Marketing Enterprise (Real Magnet) - Campaign Deployment/Sending Delays

Incident Report for Higher Logic Platform

Postmortem

Incident Root Cause Analysis
Date: September 25, 2023

What Happened

A brief interruption in communications between the central database and a database
containing a specific set of customers (Shard 4) resulted in an incorrect attempt to
reprocess a batch of records. The attempt to re-insert records with the same unique
identifiers failed and blocked the ability to process any further campaigns for the
customers on Shard 4.

Timeline (all times EDT)

September 21, 2023

  • 10:30 AM – The first issue was reported to customer support.
  • 4:06 PM – The issue was escalated to development after multiple customers reported
    the same problem with their campaigns.
  • 5:33 PM – An attempt was made to resolve the issue by removing one of the records
    that seemed to be the cause of the blocked campaigns.
  • 8:59 PM – Development was notified that deleting the one record did not allow the
    campaigns to process again. Additional work by development proceeded.

September 22, 2023

  • 12:57 AM – All records in the duplicate batch were removed, and campaigns began
    processing normally.

Root Cause
A loss of communication between the central database and the customer database (Shard
4) caused a batch of records to be reprocessed, which attempted to re-insert records
whose unique identifiers already existed.

Details
The campaigns were blocked because the database was trying to insert duplicate
records. Once these duplicate records were addressed, the campaigns began
processing again.
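
The failure mode described above can be illustrated with a minimal sketch. The report does not name the database engine or the schema, so the use of SQLite, the table name campaign_records, and the function insert_batch are all assumptions made for illustration only. The sketch shows how replaying a batch against a table with a unique key fails as a whole rather than row by row.

```python
import sqlite3

# Hypothetical schema: the real table and engine are not named in the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_records (record_id INTEGER PRIMARY KEY, payload TEXT)"
)

def insert_batch(conn, batch):
    """Insert a batch in one transaction; any duplicate id aborts the whole batch."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO campaign_records (record_id, payload) VALUES (?, ?)",
            batch,
        )

batch = [(1, "first"), (2, "second")]
insert_batch(conn, batch)  # initial processing succeeds

try:
    insert_batch(conn, batch)  # replay of the same batch
    replay_failed = False
except sqlite3.IntegrityError:
    # The unique key rejects the replayed rows and the transaction rolls back.
    replay_failed = True

row_count = conn.execute("SELECT COUNT(*) FROM campaign_records").fetchone()[0]
```

If a worker keeps retrying such a batch before moving on, every later batch waits behind it, which is one plausible way a single failed insert could block other campaigns.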

Corrective Actions
The duplicate records were removed, allowing normal processing of other campaigns to
resume.

  • Update the database table design to gracefully handle when there is an attempt to
    insert a duplicate record.
  • Add monitoring alerts to detect this error proactively and provide for more timely
    remediation.
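
As a rough sketch of what the two planned actions could look like together, the example below makes the insert idempotent and reports how many duplicates were skipped so that an alert can be raised. It is not the actual fix: SQLite, the table name campaign_records, the function insert_batch_idempotent, and the logger name are hypothetical stand-ins, since the report does not describe the real schema or tooling.

```python
import logging
import sqlite3

log = logging.getLogger("batch_loader")  # hypothetical logger name

def insert_batch_idempotent(conn, batch):
    """Insert rows, skipping ids that already exist; return the number skipped."""
    skipped = 0
    with conn:  # one transaction for the whole batch
        for record_id, payload in batch:
            cur = conn.execute(
                "INSERT OR IGNORE INTO campaign_records (record_id, payload) "
                "VALUES (?, ?)",
                (record_id, payload),
            )
            if cur.rowcount == 0:  # row was ignored because the id already exists
                skipped += 1
    if skipped:
        # A monitoring system could alert on this message or on the returned count.
        log.warning("skipped %d duplicate record(s) in batch", skipped)
    return skipped

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_records (record_id INTEGER PRIMARY KEY, payload TEXT)"
)

batch = [(1, "first"), (2, "second")]
first_run = insert_batch_idempotent(conn, batch)   # nothing skipped
replay_run = insert_batch_idempotent(conn, batch)  # every row skipped
total_rows = conn.execute("SELECT COUNT(*) FROM campaign_records").fetchone()[0]
```

Note that in SQLite, INSERT OR IGNORE also suppresses other constraint failures such as NOT NULL violations, so a production design would likely target the unique key specifically.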
Posted Sep 26, 2023 - 11:01 EDT

Resolved

We have confirmed that this incident is fully resolved. We have not seen further issues in the past 12 hours. We plan to have a root cause analysis (RCA) ready for distribution in the next 3 business days.

Thank you for your patience as we worked to resolve this issue.
Posted Sep 22, 2023 - 12:24 EDT

Monitoring

We implemented a fix around 1 AM ET last night and are monitoring to confirm that it has fully resolved the issue. We will provide another update once we have confirmed that the fix is effective.
Posted Sep 22, 2023 - 08:00 EDT

Update

We're still experiencing issues with deploying campaigns and sending messages. So far, our attempts to resolve those issues have been unsuccessful. We are continuing to investigate and troubleshoot and will share further updates. We do not have a timeline for resolution at this time.
Posted Sep 21, 2023 - 21:18 EDT

Investigating

A subset of customers are experiencing issues deploying a campaign and/or delays with messages being sent from the campaign module. Our Engineering team has been notified and they are investigating the issue.

In the meantime, please leave your campaigns as they are and they will deploy or send the message once the issue has been resolved.

We apologize for the inconvenience and appreciate your patience.
Posted Sep 21, 2023 - 17:10 EDT
This incident affected: Marketing Enterprise (Real Magnet).