Admin Portal and Control Plane API down

Incident Report for Firezone Production

Postmortem

On April 24, 2025 at 01:38 UTC, we detected issues with our application servers coming back online after a recent deploy. Deploys typically cause zero downtime and go unnoticed by our customers.

This time, however, the deploy caused an outage. A database migration hung and prevented the other app servers from successfully issuing queries against the database.

After identifying the issue, we immediately began remediation. Our first step was to reap the problematic app servers that were timing out.

We then increased the healthcheck timeout on our app servers to give the hung migration enough time to complete successfully.

At 02:01 UTC on April 24, the migration completed successfully. All app servers then began returning to a healthy state, which resolved the incident.

Posted Apr 24, 2025 - 03:38 UTC

Resolved

The incident has been resolved.
Posted Apr 24, 2025 - 02:07 UTC

Update

We are continuing to investigate this issue.
Posted Apr 24, 2025 - 01:38 UTC

Investigating

We are currently investigating continued alerts indicating that the Admin Portal and Control Plane API are down.
Posted Apr 24, 2025 - 00:31 UTC
This incident affected: Control Plane API and Admin Portal.