Erigon V1.4.0: Troubleshooting High Disk Reads And Slow Sync
Experiencing high disk read volume with Erigon v1.4.0? You're not alone! This article digs into the high disk reads reported on v1.4.0 and the slow synchronization that comes with them. We'll walk through the likely causes, analyze the provided logs, and look at concrete fixes to get your node back on track. Let's get started, guys!
Understanding the Problem
The primary concern is the high disk read volume encountered after upgrading to or running Erigon v1.4.0. This is manifested in two key ways:
- Slower Synchronization: The node takes significantly longer to synchronize with the blockchain compared to previous versions. Even after a resynchronization, the speed at which it catches up with the latest blocks is noticeably reduced.
- Prolonged Catch-Up Time: Following a snapshot download, the node requires an extended period (more than a day in some cases) to fully catch up to the current block height. This can be a major bottleneck for those needing to quickly spin up or recover a node.
According to the report, both the new and the previous version have recently shown high disk read volumes. After a resynchronization, sync speed is slower than before, and catching up to the latest blocks takes a long time: more than a day after downloading the snapshot. This points to a performance regression or a configuration issue, so let's delve into the possible causes and solutions.
Analyzing the Provided Information
To effectively troubleshoot, let's dissect the system information and logs provided:
- Erigon Version: v1.4.0
- Operating System: Windows/Linux/OSX (the report doesn't narrow this down, so the issue doesn't appear to be tied to a specific platform)
- Chain/Network: (Not specified, but important to consider. Is it Mainnet, a testnet, or a private network?)
Log Snippet Breakdown
The log snippet offers some clues:
- `PruneSmallBatches finished`: This indicates that the pruning process, which aims to reduce disk space usage, is completing. However, the `took=2m7.590609906s` part shows it is taking a significant amount of time. The statistics within this log line show how much data is being processed during pruning for the different components (accounts, code, commitment, receipt, storage, etc.).
- `Committed`: This log line confirms the commitment of a block (`block=63186902`). The `time=7m23.123750875s` value shows the time taken for this commitment, which seems high. The breakdown of time spent on tasks like `flush`, `compute commitment`, `tx.commit`, and `prune` is valuable for identifying bottlenecks.
- `[6/8 Execution]`: These lines relate to the execution stage of block processing. The `blk/s` (blocks per second) and `tx/s` (transactions per second) values give an idea of the processing speed. `stepsInDB=0.10` suggests that only a small fraction of steps is being read from the database, indicating that the majority of data is in memory.
- `GoodPeers`: This confirms that the node has a healthy number of peers connected.
- `txpool stat`: This provides information about the transaction pool, indicating the number of pending, baseFee, and queued transactions.
- `memory stats`: These statistics provide a snapshot of the node's memory usage, including RSS (Resident Set Size), PSS (Proportional Set Size), and other memory-related metrics, which can help diagnose memory-related performance issues.
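If you want to track these timings over time rather than eyeballing individual lines, a rough log-scraping sketch like the one below can help. It assumes Erigon's output is captured to a file (the `erigon.log` path is a placeholder) and that the lines use the key=value layout shown in the snippet; the regular expressions are loose guesses based on that snippet, not an official log format.

```python
# Rough sketch: scan an Erigon log file and collect the durations discussed
# above (PruneSmallBatches took=..., Committed time=...) plus the Execution
# blk/s figures, so a slowdown can be quantified instead of guessed at.
import re
from collections import defaultdict

LOG_PATH = "erigon.log"  # assumption: wherever your Erigon output is captured

# Loose patterns based on the snippet above; tweak them to your log format.
PATTERNS = {
    "prune_took": re.compile(r"PruneSmallBatches finished.*?took=([\w.]+)"),
    "commit_time": re.compile(r"Committed.*?time=([\w.]+)"),
    "exec_blk_per_s": re.compile(r"Execution.*?blk/s=([\d.]+)"),
}

def scan(path: str) -> dict:
    """Collect the raw duration / throughput values found for each metric."""
    found = defaultdict(list)
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for name, pattern in PATTERNS.items():
                match = pattern.search(line)
                if match:
                    found[name].append(match.group(1))
    return found

if __name__ == "__main__":
    for name, values in scan(LOG_PATH).items():
        print(f"{name}: {len(values)} samples, last={values[-1]}")
```

Comparing the `commit_time` and `prune_took` values before and after the v1.4.0 upgrade (or across configuration changes) makes it much easier to see whether a tweak actually helped.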
Possible Causes and Solutions
Several factors could contribute to the high disk reads and slow synchronization:
1. Disk I/O Bottleneck
- Problem: The most likely culprit is that your storage device (HDD or SSD) cannot keep up with the read/write demands of Erigon. HDDs are significantly slower than SSDs. Even with an SSD, performance can vary greatly.
- Solution:
- Use a High-Performance SSD: If you're using an HDD, upgrading to a high-performance NVMe SSD is highly recommended. This will drastically improve I/O speeds. Make sure the SSD has sufficient capacity for the entire blockchain data.
- Check Disk Usage: Monitor your disk I/O using tools like `iostat` (Linux) or Resource Monitor (Windows) to confirm whether the disk really is the bottleneck; sustained disk utilization close to 100% indicates a problem (see the monitoring sketch after this list).
- RAID Configuration: Consider using a RAID configuration (e.g., RAID 0) with multiple SSDs to further increase I/O performance, keeping in mind that RAID 0 offers no redundancy.
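To confirm the diagnosis, here is a minimal monitoring sketch built on the third-party psutil library (`pip install psutil`). The device name `nvme0n1` is an assumption; use `lsblk` or `iostat` to find the disk that actually holds your Erigon datadir.

```python
# Minimal sketch: sample system-wide read throughput for one disk every few
# seconds to see whether the drive backing the Erigon datadir is saturated.
import time
import psutil

DEVICE = "nvme0n1"   # assumption: substitute the device that holds your datadir
INTERVAL_S = 5

def sample_read_rate(device: str, interval: float) -> float:
    """Return the average read throughput in MiB/s over the interval."""
    before = psutil.disk_io_counters(perdisk=True)[device].read_bytes
    time.sleep(interval)
    after = psutil.disk_io_counters(perdisk=True)[device].read_bytes
    return (after - before) / interval / (1024 * 1024)

if __name__ == "__main__":
    while True:
        print(f"{DEVICE}: {sample_read_rate(DEVICE, INTERVAL_S):.1f} MiB/s read")
```

If the reported throughput sits near your drive's rated limit while Erigon is syncing, the storage device is the bottleneck and a faster SSD (or RAID setup) is the fix.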
2. Insufficient RAM
- Problem: Erigon is a memory-intensive application. If your system doesn't have enough RAM, the operating system will start swapping data to disk, leading to increased disk reads and slower performance.
- Solution:
- Increase RAM: The logs show `Rss=64.8GB`, so the machine needs comfortably more than 64GB of RAM just for Erigon's resident set; more RAM can significantly improve performance.
- Monitor Memory Usage: Use tools like `top` (Linux) or Task Manager (Windows) to monitor memory usage. If the system is constantly swapping, it indicates a RAM shortage (a small monitoring sketch follows this list).
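Here is a small companion sketch, again using psutil, that reports overall RAM and swap pressure plus the Erigon process RSS. Matching on a process name containing "erigon" is an assumption; adjust it if you run the binary under a different name.

```python
# Quick sketch: report system memory/swap pressure and the Erigon RSS,
# mirroring the Rss=64.8GB figure and the swapping discussion above.
import psutil

def memory_report() -> None:
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(f"RAM used: {vm.percent:.0f}%  swap used: {swap.percent:.0f}%")
    if swap.percent > 10:
        print("Warning: the system is swapping; Erigon likely needs more RAM.")

    # assumption: the process name contains "erigon"
    for proc in psutil.process_iter(["name", "memory_info"]):
        if "erigon" in (proc.info["name"] or "").lower():
            rss_gb = proc.info["memory_info"].rss / 1024**3
            print(f"Erigon pid={proc.pid} RSS={rss_gb:.1f} GB")

if __name__ == "__main__":
    memory_report()
```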
3. Erigon Configuration
- Problem: Suboptimal Erigon configuration can also lead to performance issues.
- Solution:
- Optimize Configuration: Review your Erigon configuration file (`erigon.toml`) and adjust parameters like cache sizes, database settings, and other performance-related options. Refer to the Erigon documentation for recommended settings.
- Enable Compression: Ensure that compression is enabled for the database. This can reduce the amount of data that needs to be read from disk.
- Experiment with `--batchsize`: Adjust the `--batchsize` flag to optimize the batch size for database writes. Experiment with different values to find the optimal setting for your hardware (see the sketch after this list).
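If you want to compare runs with different batch sizes, a launcher like the sketch below starts Erigon with an explicit `--batchsize` and captures its log so the `blk/s` figures can be compared (for example with the log-scraping script shown earlier). The binary path, datadir, and the `512M` value are placeholders rather than recommendations; check `erigon --help` for the defaults on your version.

```python
# Hedged sketch: launch Erigon with a chosen --batchsize and log to a file so
# that runs with different values can be compared afterwards.
import subprocess

ERIGON_BIN = "/usr/local/bin/erigon"   # assumption: path to your Erigon binary
DATADIR = "/data/erigon"               # assumption: path to your datadir

def start_erigon(batch_size: str, log_path: str) -> subprocess.Popen:
    """Start Erigon with the given execution batch size, logging to log_path."""
    cmd = [ERIGON_BIN, "--datadir", DATADIR, "--batchsize", batch_size]
    log_file = open(log_path, "w")
    return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)

if __name__ == "__main__":
    # One run per candidate value; compare the blk/s numbers in the logs.
    proc = start_erigon("512M", "erigon-batchsize-512M.log")
    proc.wait()
```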
4. Network Issues
- Problem: Although the logs show