zkSync Outage: The Reason Behind It and Its Solution

On April 2, the zkSync team explained the cause of the outage on Twitter. Block production stopped because of a failure in the block queue database. The server API was not affected: transactions continued to be added to the mempool (memory pool), and the query service operated normally. Although all components had comprehensive monitoring, logging, and alerts, none of the alerts fired, because the API was still operating normally. The entire team was offline when the incident occurred. The fix itself took 5 minutes to implement. To catch similar issues in the future, zkSync has assigned a dedicated role to its database monitoring agents, enabling them to connect to the database and continuously collect metrics. The team also introduced an alert that fires when a database monitoring agent fails or cannot establish a connection to the database. In addition, if the situation escalates significantly, the on-call team will be notified immediately through multiple channels. According to the team, however, the only long-term solution is decentralization.

zkSync: A database failure led to downtime, and decentralization is the only long-term solution

Introduction

On April 2, the zkSync team announced on Twitter that an outage had been caused by a failure in the block queue database. This article looks at what caused the outage, what the zkSync team did to fix it, and how the team plans to prevent a repeat.

Outline

1. What is zkSync?
2. The Cause of the Outage
3. The zkSync Team’s Response
4. The Long-Term Solution
5. Conclusion
6. FAQs

What is zkSync?

Before diving into the cause of the outage, let’s first define what zkSync is. zkSync is a Layer 2 scaling solution for Ethereum: it processes transactions off-chain and regularly submits them to the Ethereum mainnet in batches, which makes transactions faster and cheaper for its users.

The Cause of the Outage

According to the zkSync team’s announcement on April 2, block production stopped because of a failure in the block queue database. The server API, however, was not affected. Transactions continued to be added to the mempool (memory pool), and the query service operated normally.
Although every component had comprehensive monitoring, logging, and alerts, the team did not receive any alerts, because the API was still operating normally and nothing was watching for stalled block production. It is also worth noting that the entire team was offline when the incident occurred.
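
The monitoring gap described above can be illustrated with a minimal sketch. It is not zkSync's actual tooling: the names `NodeStatus`, `api_only_check`, and `block_liveness_check`, as well as the 60-second threshold and the 30-minute stall, are assumptions chosen for illustration. The sketch shows that a check based only on API health reports "healthy" during a stall, while a check on the age of the latest block does not.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of what a monitor can observe."""
    api_responding: bool    # does the server API answer queries?
    mempool_size: int       # transactions waiting to be included in a block
    last_block_time: float  # Unix timestamp of the latest produced block


def api_only_check(status: NodeStatus) -> bool:
    """Healthy as long as the API answers; blind to stalled block production."""
    return status.api_responding


def block_liveness_check(status: NodeStatus, now: float,
                         max_block_age: float = 60.0) -> bool:
    """Healthy only if a block was produced within the last max_block_age seconds."""
    return (now - status.last_block_time) <= max_block_age


# Illustrative scenario (made-up numbers): the API is up and the mempool is
# filling, but the last block was produced 30 minutes ago.
now = 1_000_000.0
stalled = NodeStatus(api_responding=True, mempool_size=500,
                     last_block_time=now - 1800)

api_says_healthy = api_only_check(stalled)                  # True: no alert fires
liveness_says_healthy = block_liveness_check(stalled, now)  # False: alert fires
```

Under these assumptions, the API-only check stays green throughout the stall, which matches the failure mode the team described.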

The zkSync Team’s Response

Once the zkSync team learned of the outage, they implemented a fix, which took only 5 minutes. To catch similar issues in the future, zkSync has assigned a dedicated role to its database monitoring agents, which allows the agents to connect to the database and continuously collect metrics. The team has also introduced an alert that fires when an agent fails or cannot establish a connection to the database. If the situation escalates significantly, the on-call team will be notified immediately through multiple channels.
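
The alerting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: `sqlite3` stands in for whatever database zkSync actually runs, and `probe_database`, `AgentWatchdog`, the channel names, and the three-failure escalation threshold are all hypothetical.

```python
import sqlite3
from typing import Callable, List


def probe_database(connect: Callable[[], sqlite3.Connection]) -> bool:
    """Return True if a connection can be opened and a trivial query succeeds."""
    try:
        conn = connect()
        try:
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


class AgentWatchdog:
    """Counts consecutive failed probes and decides which channels to notify."""

    def __init__(self, escalate_after: int = 3) -> None:
        self.escalate_after = escalate_after  # hypothetical threshold
        self.consecutive_failures = 0

    def record(self, probe_ok: bool) -> List[str]:
        """Return the notification channels to use for this probe result."""
        if probe_ok:
            self.consecutive_failures = 0
            return []
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.escalate_after:
            # Escalation: reach the on-call team through several channels.
            return ["chat", "email", "phone"]
        return ["chat"]


def broken_connect() -> sqlite3.Connection:
    """Simulates a database that refuses connections."""
    raise sqlite3.OperationalError("unable to open database")


def healthy_connect() -> sqlite3.Connection:
    """Simulates a reachable database using an in-memory SQLite instance."""
    return sqlite3.connect(":memory:")
```

In a real deployment, something like `watchdog.record(probe_database(connect))` would run on a timer, and the returned channel list would be handed to a paging system. The key design point is that a failed or unreachable agent is itself treated as an alert condition, so silence from the monitor is never mistaken for health.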

The Long-Term Solution

Despite the quick fix and the measures taken to prevent another outage, the zkSync team acknowledges that the only long-term solution is decentralization. By decentralizing its infrastructure, the team can reduce reliance on centralized points of failure, such as the single block queue database involved in this outage, and in turn offer its users greater reliability.

Conclusion

The zkSync outage announced on April 2 was caused by a failure in the block queue database, and the fix took just 5 minutes to implement. To catch similar issues in the future, the database monitoring agents have been assigned a dedicated role, and an alert now fires if an agent fails or cannot establish a connection to the database. According to the team, decentralization is the only long-term solution.

FAQs

1. Was any user data lost during the outage?
– No. The server API was not affected, and transactions continued to be added to the mempool.
2. How frequently does zkSync perform database monitoring?
– Continuously. The database monitoring agents stay connected to the database and collect metrics on an ongoing basis.
3. Will the zkSync team implement additional measures to prevent similar outages?
– Yes. Beyond the new monitoring role and alerts, the team regards decentralization as the long-term way to reduce reliance on centralized points of failure.
