Incident Postmortem: Extraordinarily Long Queues
Issue Description: During the weekend and yesterday morning, gatewayapi.com experienced extraordinarily long queues, causing significant delays in message delivery.
Root Cause: The root cause was the internal number lookup service used for routing. This service was not running efficiently, which created a bottleneck for all messages and severely slowed delivery times.
Actions Taken to Resolve:
- Increased Service Instances: To address the issue promptly, we increased the number of always-running service instances. This distributed the load more evenly across our infrastructure, relieving the bottleneck and improving message delivery times.
- Timeout Limit Adjustment: We also lowered the timeout limits for lookups in the internal number lookup service, so that a similar issue in the future would have a less severe impact on message delivery times.
- Database Upgrade: In addition, we upgraded the database to further improve overall system performance and reliability.
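To illustrate why a lower lookup timeout limits the damage from a slow lookup service, here is a minimal Python sketch. It is not the actual gatewayapi.com implementation: the function names, the timeout value, and the idea of falling back to a default route are all assumptions made for the example.

```python
import concurrent.futures
import time

# Hypothetical sketch; names and values are illustrative assumptions only.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def lookup_with_timeout(lookup_fn, number, timeout_s, fallback):
    """Run lookup_fn(number); return fallback if it takes longer than timeout_s."""
    future = _executor.submit(lookup_fn, number)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only prevents the call if it has not started running yet.
        future.cancel()
        return fallback


def fast_lookup(number):
    # Stand-in for a healthy lookup service.
    return "route-a"


def slow_lookup(number):
    # Stand-in for a degraded lookup service.
    time.sleep(0.5)
    return "route-a"


healthy = lookup_with_timeout(fast_lookup, "4512345678", 0.2, "default-route")
degraded = lookup_with_timeout(slow_lookup, "4512345678", 0.05, "default-route")
```

With a short timeout, a message waits at most `timeout_s` for its lookup before moving on, so a degraded lookup service delays each message by a bounded amount instead of stalling the whole queue.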
Preventative Measures: To prevent similar incidents in the future, we will undertake the following measures:
- Continuous Monitoring: We will implement more robust monitoring to promptly detect any anomalies in the internal number lookup service or related components.
- Redundancy and Scaling: We will explore redundancy options and scaling strategies to ensure the system can handle peak loads without disruptions.
- Automated Testing: We will run regular automated tests of critical services to identify potential issues before they impact the production environment.
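As a rough illustration of the kind of check the monitoring measure could involve, the Python sketch below keeps a rolling window of lookup latencies and flags an anomaly when too many recent lookups are slow. The class name, window size, and thresholds are hypothetical and are not taken from our production setup.

```python
from collections import deque


class LookupLatencyMonitor:
    """Hypothetical monitor: flags when too many recent lookups are slow."""

    def __init__(self, window=100, slow_threshold_s=1.0, max_slow_fraction=0.2):
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)
        self.slow_threshold_s = slow_threshold_s
        self.max_slow_fraction = max_slow_fraction

    def record(self, latency_s):
        """Store the latency of one completed lookup, in seconds."""
        self.samples.append(latency_s)

    def is_anomalous(self):
        """Return True if the share of slow lookups exceeds the allowed fraction."""
        if not self.samples:
            return False
        slow = sum(1 for s in self.samples if s > self.slow_threshold_s)
        return slow / len(self.samples) > self.max_slow_fraction
```

A check like this could feed an alert, so that a slowdown in the lookup service is noticed before queues build up.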
Timeline: The issue began at 10:05 and was resolved at 10:45 on 5th February.
We apologize for any inconvenience this incident may have caused our customers. Our team remains committed to ensuring the reliability and performance of gatewayapi.com. If you have any further questions or concerns, please do not hesitate to reach out to us.