Informz Integration Issue
Incident Report for Higher Logic Platform
Postmortem

Incident Root Cause Analysis

Date: 11/22/2021

What Happened
On November 21, 2021, Higher Logic became aware that a number of its customers’ IPSEC tunnels, which support various integrations, were down and could not be started.

Timeline
All times Eastern unless otherwise noted.
• Nov 19, 2021 – 11:45pm – As part of a scheduled quarterly maintenance window, Higher Logic staff upgrade the operating system on customer-facing ASA firewall devices.
• Nov 20, 2021 – 1:00am – Normal testing of these devices appears to show that they are functioning as expected.
• Nov 21, 2021 – 4:32pm – Higher Logic customer support staff are alerted that several customer tunnels are down, and have been for approximately 1 day.
• Nov 21, 2021 – 5:30pm – Root cause is unknown, escalation occurs.
• Nov 21, 2021 – 6:44pm – Root cause is still unknown, escalation occurs.
• Nov 21, 2021 – 7:54pm – Root cause is determined; a rollback of the upgraded operating system is proposed, and validation begins.
• Nov 21, 2021 – 8:27pm – Rollback initiated (US).
• Nov 21, 2021 – 8:36pm – Rollback complete (US).
• Nov 21, 2021 – 8:43pm – Rollback initiated (CA).
• Nov 21, 2021 – 8:50pm – Rollback complete (CA).
• Nov 21, 2021 – 9:40pm – Incident closed.

Root Cause
As part of a scheduled quarterly maintenance window, Higher Logic staff upgraded the operating system on customer-facing ASA firewall devices. The new operating system changed the set of supported IPSEC DH groups, which caused critical compatibility issues for production tunnels configured to use DH groups that were no longer supported. Although the change was successfully staged in an RC environment prior to deployment, it was not possible to validate every possible tunnel configuration there, so test coverage was insufficient to detect the effects of these changes in advance. When the upgrade was deployed, these tunnels began to fail.
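To illustrate the kind of pre-upgrade check that could surface this class of failure, here is a minimal Python sketch. It is not Higher Logic's tooling: the TunnelConfig type, the tunnel names, and the DH group numbers are all hypothetical, and the supported set for a target OS version would need to come from the vendor's release notes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TunnelConfig:
    """Hypothetical record of one configured tunnel."""
    name: str
    dh_group: int


def find_incompatible_tunnels(
    tunnels: Iterable[TunnelConfig],
    supported_groups: Iterable[int],
) -> List[TunnelConfig]:
    """Return the tunnels whose DH group is not in the target OS's supported set."""
    supported = frozenset(supported_groups)
    return [t for t in tunnels if t.dh_group not in supported]


if __name__ == "__main__":
    # Illustrative values only; not taken from the incident.
    configured = [
        TunnelConfig("customer-a", 2),
        TunnelConfig("customer-b", 14),
    ]
    supported_after_upgrade = {14, 19, 20}
    for tunnel in find_incompatible_tunnels(configured, supported_after_upgrade):
        print(f"{tunnel.name}: DH group {tunnel.dh_group} unsupported after upgrade")
```

Running a check like this against the production tunnel inventory, rather than only against the RC environment, would flag affected tunnels before the upgrade is deployed.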

Next Steps
• Ensure adequate test coverage in RC for the most common and recommended tunnel configurations.
• Ensure that monitoring is in place for customer-facing tunnels.
• Modify maintenance protocols to include validation of the up/down status of tunnels after firewall patching.
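
The post-patching validation step above could be sketched as a comparison of tunnel status snapshots taken before and after maintenance. This is a minimal Python sketch under stated assumptions: the snapshot format (tunnel name mapped to an up/down boolean) and the function name are hypothetical, and collecting the snapshots from the firewalls is out of scope here.

```python
from typing import Dict, List


def tunnels_down_after_patch(
    before: Dict[str, bool],
    after: Dict[str, bool],
) -> List[str]:
    """Return names of tunnels that were up before patching but are not up after.

    A tunnel missing from the 'after' snapshot is treated as down.
    Tunnels that were already down before patching are not reported.
    """
    return sorted(
        name
        for name, was_up in before.items()
        if was_up and not after.get(name, False)
    )


if __name__ == "__main__":
    # Illustrative snapshots only; True means the tunnel is up.
    before = {"customer-a": True, "customer-b": True, "customer-c": False}
    after = {"customer-a": True, "customer-b": False}
    for name in tunnels_down_after_patch(before, after):
        print(f"Tunnel regressed after patching: {name}")
```

A non-empty result at the end of a maintenance window would be a signal to investigate or roll back before closing the window.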

Posted Nov 22, 2021 - 12:43 EST

Resolved
This incident has been resolved.
Posted Nov 22, 2021 - 11:41 EST
Monitoring
Our Engineering team has implemented a fix and is monitoring the issue. Previously failed mailings are starting to send.
Posted Nov 22, 2021 - 11:04 EST
Investigating
We are currently investigating issues related to some of the Informz Integrations. This issue is causing mailings that were pre-scheduled or set to send now to fail. We recommend holding off on resending these mailings, or using the "Bypass Sync" option within the mailing.
Posted Nov 22, 2021 - 09:08 EST