Incident Root Cause Analysis
Date: September 25, 2023
What Happened
A brief interruption in communication between the central database and the database
holding a specific set of customers (Shard 4) caused a batch of records to be
reprocessed in error. The attempt to re-insert records with the same unique
identifiers failed, which blocked processing of any further campaigns for the
customers on Shard 4.
Timeline (all times EDT)
September 21, 2023
September 22, 2023
Root Cause
A loss of communication between the central database and the customer database (Shard
4) caused an already-processed batch to be reprocessed, producing duplicate records.
Details
Campaign processing for Shard 4 was blocked because the database was attempting to
insert records whose unique identifiers already existed. Once these duplicate records
were addressed, the campaigns resumed processing.
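This report does not name the database engine, schema, or processing pipeline. As a rough illustration only, the Python sketch below uses the standard-library sqlite3 module with a hypothetical records table keyed on a unique record_id. It also assumes, which this report does not state, that batches are processed strictly in order, so a failing batch holds up every campaign queued behind it. The names make_shard, insert_batch, process_queue and remove_duplicates are illustrative, not the production code.

```python
import sqlite3
from collections import deque

# Hypothetical record shape: (record_id, campaign). record_id is the unique identifier.
Record = tuple[str, str]


def make_shard() -> sqlite3.Connection:
    """In-memory stand-in for the Shard 4 customer database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (record_id TEXT PRIMARY KEY, campaign TEXT)")
    return conn


def insert_batch(conn: sqlite3.Connection, batch: list[Record]) -> None:
    """Insert a batch in one transaction; any duplicate identifier rolls back the whole batch."""
    with conn:  # commits on success, rolls back and re-raises on exception
        conn.executemany(
            "INSERT INTO records (record_id, campaign) VALUES (?, ?)", batch
        )


def process_queue(conn: sqlite3.Connection, queue: deque) -> None:
    """Process batches in order (assumption); a failing batch stays at the head and blocks the rest."""
    while queue:
        insert_batch(conn, queue[0])  # raises sqlite3.IntegrityError on a duplicate record_id
        queue.popleft()


def remove_duplicates(conn: sqlite3.Connection, batch: list[Record]) -> list[Record]:
    """Drop records whose identifiers already exist on the shard."""
    existing = {row[0] for row in conn.execute("SELECT record_id FROM records")}
    return [rec for rec in batch if rec[0] not in existing]


shard4 = make_shard()
batch = [("r1", "spring"), ("r2", "spring")]
process_queue(shard4, deque([batch]))  # first, successful pass

# After the interruption, the same batch is queued again ahead of a new campaign.
queue = deque([batch, [("r3", "fall")]])
pending_after_failure = None
try:
    process_queue(shard4, queue)
except sqlite3.IntegrityError:
    pending_after_failure = len(queue)  # both batches still pending: the new campaign is blocked

queue[0] = remove_duplicates(shard4, queue[0])  # address the duplicate records
process_queue(shard4, queue)                    # the "fall" campaign now processes
```

In this sketch the duplicate batch is never partially applied, because each batch runs in its own transaction. Filtering out identifiers that already exist is enough to unblock the queue.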
Corrective Actions
The duplicate records were removed, allowing normal processing of other campaigns to
resume.