GatewayAPI.EU incident
Incident Report for GatewayAPI
Postmortem

Incident Summary: On Saturday, June 1st, at 10:31 AM CEST, GatewayAPI.EU experienced an issue that affected the functionality of our REST API and left the platform inaccessible. The incident resulted from a configuration issue that became apparent following an unexpected restart of the EU cluster.

Root Cause: The incident was traced to a code cleanup that removed an integration. This change inadvertently left our internal system API with a broken configuration. The issue was not immediately apparent and only surfaced when the pods restarted. For reasons that are still unknown, the entire EU cluster had to restart, which exposed the broken configuration and triggered a cascade of failures across our interconnected services.

Impact:

  • Users Affected: All
  • Duration of Impact: 10:30 - 11:32 CEST
  • Services Affected: REST API, message delivery

Resolution and Recovery: Our engineering team corrected the configuration issue in the internal system API and successfully restarted the affected services. The REST API functionality was restored, and message delivery resumed normal operations.

Preventive Measures: To prevent a recurrence of this issue, we are implementing the following measures:

  1. Configuration Management: Enhance our configuration validation processes to catch such issues before deployment.
  2. Pod Restart Protocols: Investigate and address the cause of the unexpected EU cluster restart.
  3. Service Dependency Monitoring: Improve monitoring of service dependencies to quickly identify and resolve cascading failures.
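
As an illustration of the first measure, the following is a minimal sketch of a pre-deployment check that rejects a configuration still referencing a removed integration. The configuration layout (an "integrations" list), the integration names, and both function names are hypothetical; this postmortem does not describe the actual configuration format or tooling of the internal system API.

```python
def find_dangling_integrations(config, known_integrations):
    """Return integration names referenced in config but no longer registered.

    `config` is assumed (hypothetically) to be a dict with an optional
    "integrations" list; `known_integrations` is the set of integrations
    that still exist in the codebase.
    """
    referenced = config.get("integrations", [])
    return sorted(set(referenced) - set(known_integrations))


def validate_config(config, known_integrations):
    """Raise ValueError if the config references a removed integration."""
    dangling = find_dangling_integrations(config, known_integrations)
    if dangling:
        raise ValueError(
            "config references removed integrations: " + ", ".join(dangling)
        )


if __name__ == "__main__":
    # Hypothetical example: "legacy_push" was removed in a code cleanup,
    # but the configuration still refers to it.
    known = {"sms", "email"}
    config = {"integrations": ["sms", "legacy_push"]}
    try:
        validate_config(config, known)
    except ValueError as err:
        print(f"validation failed: {err}")
```

Running a check of this kind in the deployment pipeline, rather than at pod startup, would be one way to surface a broken configuration when the change is made instead of at the next restart.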

We apologize for any inconvenience this incident may have caused. Our team is committed to ensuring the reliability and robustness of our services.

Posted Jun 06, 2024 - 13:06 CEST

Resolved
All systems on GatewayAPI.EU are now fully operational again.

A postmortem will follow.

We are sorry for any inconvenience this may have caused.
Posted Jun 01, 2024 - 11:37 CEST
Investigating
We are experiencing issues with GatewayAPI.EU and are now investigating the root cause.

All systems on GatewayAPI.EU are offline for the time being, and we encourage you to stop all traffic to gatewayapi.eu until further notice.

We are currently working on resolving the issue.

The .COM setup is still working as intended.

We will keep you updated.
Posted Jun 01, 2024 - 10:44 CEST
This incident affected: Dashboards - Commercial & Europe (Dashboard - Europe) and APIs - Commercial & Europe (API - Europe).