Atlassian says this month’s two-week cloud outage affected nearly twice as many customers as the company estimated when it first learned of the incident.
As revealed by the company’s chief technology officer, Sri Viswanath, on April 14, nine days after the incident began, a maintenance script accidentally wiped hundreds of customer sites due to communication problems between two Atlassian teams working to disable a legacy application.
Instead of receiving the ID of the app to be disabled, the deactivation team received the IDs of the cloud sites where the app was installed.
The script was also started in the wrong execution mode (i.e., permanently deleting data instead of deleting it in a recoverable, failsafe mode).
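To illustrate the two failure modes described above, here is a minimal sketch of guardrails a deletion script could apply. It is not Atlassian's script: the names `AppId`, `SiteId`, `DeleteMode`, `deactivate_apps`, and `confirm_permanent` are all hypothetical, and the function only returns a plan of actions rather than deleting anything.

```python
from dataclasses import dataclass
from enum import Enum


class DeleteMode(Enum):
    SOFT = "soft"            # mark for deletion; recoverable
    PERMANENT = "permanent"  # irreversible


# Distinct wrapper types (hypothetical) so an app ID and a site ID
# cannot be passed interchangeably as plain strings.
@dataclass(frozen=True)
class AppId:
    value: str


@dataclass(frozen=True)
class SiteId:
    value: str


def deactivate_apps(ids, mode=DeleteMode.SOFT, confirm_permanent=False):
    """Return the planned (mode, id) actions; refuse anything that is not an AppId."""
    wrong = [i for i in ids if not isinstance(i, AppId)]
    if wrong:
        raise TypeError(f"expected AppId, got {len(wrong)} other ID(s); refusing to run")
    # Recoverable deletion is the default; the irreversible mode needs an explicit opt-in.
    if mode is DeleteMode.PERMANENT and not confirm_permanent:
        raise ValueError("permanent deletion requires confirm_permanent=True")
    return [(mode.value, i.value) for i in ids]
```

Under these assumptions, calling `deactivate_apps([SiteId("site-1")])` raises a `TypeError` instead of proceeding, and requesting `DeleteMode.PERMANENT` without `confirm_permanent=True` raises a `ValueError`.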
The 14-day outage affected a very small group of Atlassian customers between April 5 and April 18. The first set of affected sites was restored by April 8, and the rest of the affected customer sites by April 18.
During the incident, the following Atlassian products were not available to affected customers: the entire Jira family of products, Confluence, Atlassian Access, Opsgenie, and Statuspage.
We have now restored our customers affected by the outage and have reached out to key contacts for each affected site. https://t.co/ZvAFZ2pq8A
– Atlassian (@Atlassian) April 17, 2022
The outage affected a total of 775 customers.
When we first reported this outage, Atlassian told us that the sites of about 400 of its more than 200,000 cloud customers had been wiped. Viswanath revealed on Friday that the actual number was nearly twice that.
After Atlassian analyzed data collected during the incident investigation, its estimate changed to include affected inactive, free, or small accounts with few active users.
“The result was an immediate removal of 883 sites (representing 775 customers) between 07:38 UTC and 08:01 UTC on Tuesday, April 5, 2022,” Viswanath said.
“Although this was a major incident, no customer lost more than five minutes of data. Additionally, over 99.6% of our customers and users continued to use our cloud products without any interruption during restoration activities.”
A small number of customers whose Confluence or Insight databases were restored lost up to five minutes of data, but Atlassian says it has since recovered that data and is working to restore all of it.
“We have since recovered the rest of the data, contacted customers affected by this, and are helping them apply changes to further restore their data,” Viswanath added.
Not the result of a cyber attack
Atlassian initially estimated that restoration efforts would take no more than several days. It also confirmed to BleepingComputer that there was no unauthorized access to customer data, as the outage was not caused by a cyberattack or a malicious insider.
“Wider public communications around the outage, coupled with repeating the critical message that there was no data loss and this was not the result of a cyberattack, would have been the right approach,” Viswanath said.
“Instead of waiting until we had a complete picture, we should have been transparent about what we knew and what we didn’t know.
“Providing overall restoration estimates (even if directional) and being clear about when we expected to have a more complete picture would have allowed our customers to better plan for the incident.”
The outage affected customers using the company’s cloud products and came after Atlassian announced in October 2020 that it would no longer sell licenses for on-premises products as of February 2021.
One of Atlassian’s co-founders and co-CEOs, Scott Farquhar, also added that support for already active licenses would be discontinued three years later, on February 2, 2024.