Avaya Failover Timers for IP-Endpoints and H.248 Gateways
4 Steps to help you understand what can cause issues with registrations and resource utilization.
In this post, “Avaya Failover Timers for IP-Endpoints and H.248 Gateways”, learn how the system uses its recovery algorithm together with the configured timers so that H.248 Gateways and IP-Endpoint registrations can maintain business continuity. Active calls are preserved, interruptions are kept to a minimum, and each endpoint can determine when its session ends.
A ticket came in where the customer explained that their phones showed a normal working display, but any attempt to place a call failed. They said there was no dial tone on the handset and that the phones would reboot at times.
This issue affected a DR location and two other branches that had no gateways, just IP-Endpoints connected over an MPLS connection.
Here are the steps I took to resolve 80% of the problem in less than 15 minutes and, a day later, fix the two other branches.
- 1.- Asking the right questions
- 2.- Get to know the system environment
- 3.- Review and Test
- 4.- Apply the solution
1.- Asking the right questions.
It is a good idea to have a set of questions at your disposal so that when requests like this one are assigned to you, you are ahead of the game. Start by asking basic questions, then move to more complex ones.
It helps to know when they first experienced this behavior. You’ll be surprised how often customers tell you they have seen it happen before, but never at this magnitude. You also need to know whether any changes were made to the VoIP network topology or infrastructure.
When investigating IP-Endpoint registration, you must know which server is distributing IP addresses and services to the IP-Endpoints. Once you have reviewed the scope options and their respective strings, you can move on to troubleshooting the gatekeepers specified in the MCIPADD field.
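To make the scope-option review concrete, here is a minimal Python sketch that pulls the gatekeeper list out of a scope option string. It assumes the string is a comma-separated list of NAME=value settings in which a multi-valued setting such as MCIPADD continues until the next NAME= token. The sample string, its addresses, and the MCPORT and HTTPSRVR entries are made up for illustration and do not come from the customer's system.

```python
def parse_scope_option(option_string):
    """Parse a comma-separated NAME=value scope option string.

    Returns a dict mapping each setting name to a list of values.
    A token without "=" is treated as an extra value of the
    previous setting (for example, more MCIPADD addresses).
    """
    settings = {}
    current = None
    for token in option_string.split(","):
        token = token.strip()
        if not token:
            continue
        if "=" in token:
            name, value = token.split("=", 1)
            current = name.strip().upper()
            value = value.strip()
            settings[current] = [value] if value else []
        elif current is not None:
            settings[current].append(token)
    return settings


# Hypothetical option string; the addresses are documentation-range examples.
example = "MCIPADD=192.0.2.10,192.0.2.11,MCPORT=1719,HTTPSRVR=192.0.2.20"
gatekeepers = parse_scope_option(example).get("MCIPADD", [])
print(gatekeepers)  # ['192.0.2.10', '192.0.2.11']
```

The resulting list is the set of gatekeepers to troubleshoot first, in the order the phones will try them.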
2.- Get to know the system environment.
If the customer has a SAL Gateway, you can easily download a list of servers to better understand the type of environment they have in place. If that is not available, you can review the IP Node Names list or just ask the customer. If you cannot retrieve that information, you can walk someone through reading the IP information shown on any IP-Endpoint.
The big players here are the PROCR and CLANs for registrations, and the Avaya Media Servers, Media Gateways and MedPros, which provide the media processors and DSPs.
Now that we have covered signaling, registration and media, let’s take a look at the elements that support H.248 (Media Gateway) registration:
If the H.248 link between the gateway and its server goes down, Link Recovery preserves any existing calls and attempts to re-establish the original link. If the gateway cannot reconnect to the original server, then Link Recovery automatically attempts to connect with an ESS server or the Local Survivable Processor (LSP) if available.
LLDT, PST and TST.
The Link Loss Delay Timer (LLDT) is the length of time active calls are retained while the Media Gateway tries to re-register to its gatekeeper.
- The H.248 Link Loss Delay Timer (LLDT) is the time that the server retains calls that were already in progress when the link failed.
- This timer starts when Communication Manager detects a loss of network connection to the media gateway.
- The LLDT should be the last timer to expire, meaning that the server holds its call control information until all other means of re-establishing the call have been exhausted. The timer should be longer than the Media Gateway Primary Search Timer and Total Search Timer.
The Primary Search Timer (PST) is the length of time the Media Gateway spends trying to connect to the primary registration server/gatekeeper, normally located at the HQ location (PROCR / CLANs).
The Total Search Timer (TST) is especially important if you have remote sites handling registration and acting as backup gatekeepers (ESS/LSPs). It is the time the Media Gateway spends looking for the alternate gatekeepers.
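The rule that the LLDT should be the last timer to expire can be sketched as a small sanity check. The Python below is only an illustration: the function name is hypothetical, the values are assumed to be in minutes, and the sample numbers are invented rather than taken from any real system.

```python
def check_failover_timers(pst_min, tst_min, lldt_min):
    """Return a list of problems with the timer relationship.

    Only encodes the rule described above: the Link Loss Delay
    Timer (LLDT) should be longer than both the Primary Search
    Timer (PST) and the Total Search Timer (TST).
    All values are assumed to be in minutes.
    """
    problems = []
    if lldt_min <= pst_min:
        problems.append(
            f"LLDT ({lldt_min} min) is not longer than PST ({pst_min} min)"
        )
    if lldt_min <= tst_min:
        problems.append(
            f"LLDT ({lldt_min} min) is not longer than TST ({tst_min} min)"
        )
    return problems


# Hypothetical values, for illustration only.
for pst, tst, lldt in [(1, 4, 5), (1, 30, 5)]:
    issues = check_failover_timers(pst, tst, lldt)
    status = "OK" if not issues else "; ".join(issues)
    print(f"PST={pst} TST={tst} LLDT={lldt}: {status}")
```

In the second sample the server would drop its call control information while the gateway is still searching for an alternate gatekeeper, which is the situation the rule is meant to avoid.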
3.- Review and Test
Now that you understand all the elements participating in this VoIP topology, you can start by asking the end user to pick up the phone’s handset while you run “list trace” to see which denial events come up. In my case these phones were located at an ESS that had no media processors available to provide dial tone.
To verify where these phones are registered, you can run a “stat socket” or “list reg” command to see whether the 9600s are registered to the Core or to the ESS. In this case they came up on the ESS, which explains why they couldn’t get dial tone.
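If you have many phones to check, the same comparison can be scripted. The following Python sketch groups phones by registration point. It assumes you have already reduced the registration listing to simple (extension, gatekeeper IP) pairs. That record format, the function name, and all addresses and extensions are hypothetical and are not the actual output format of any CM command.

```python
from collections import defaultdict


def group_by_registration_point(registrations, core_ips, ess_ips):
    """Group extensions by where they are registered.

    registrations: iterable of (extension, gatekeeper_ip) pairs
    core_ips / ess_ips: sets of gatekeeper addresses for each site
    """
    groups = defaultdict(list)
    for extension, gatekeeper_ip in registrations:
        if gatekeeper_ip in core_ips:
            groups["core"].append(extension)
        elif gatekeeper_ip in ess_ips:
            groups["ess"].append(extension)
        else:
            groups["unknown"].append(extension)
    return dict(groups)


# Hypothetical data, for illustration only.
core = {"192.0.2.10"}
ess = {"198.51.100.10"}
phones = [
    ("4001", "192.0.2.10"),
    ("4002", "198.51.100.10"),
    ("4003", "198.51.100.10"),
]
print(group_by_registration_point(phones, core, ess))
```

Any extension that lands in the "ess" group while the Core is healthy is a candidate for the fix described in step 4.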
The customer informed me that an upgrade had affected the voice equipment, causing a disruption to the Avaya Core. Some phones were registered to the Core, while others had registered to their respective failover gatekeeper.
A quick test I had them do was to reboot one of the IP phones and observe that it registered to the primary registration point (Core/PROCR). Another thing you can quickly run is “list trace ras forced_urqs”, which shows whether CM is sending Forced Unregistration Requests in high volume.
Collecting the 9600 IP Deskphone configuration through an SNMP walk (MIB walk) helps you review what is configured on the IP-Endpoints. The MIB walk can be run from the PROCR or from whichever LSP/ESS is closest to the IP phones. The information you can pull includes the AGL (Alternate Gatekeeper List), boot-up and registration messages, DHCP and file server settings, and other valuable details.
4.- Apply the solution
The quickest way to fix this is to perform a “reset system 4” on the ESS, which forces the 9600 IP Deskphones to register to the Core (PROCR), their primary gatekeeper.
Resetting the IP network region is another option; it forces the phones in that region to reboot. Run “reset ip-phone ip-network-region x”, where x is the network region number.