Damn you 😡 It is still going. You were right and I remembered it wrong. I dug out 15-month-old results of a full sync and it was running for over 37 hours up to 72M+ blocks. Compared to that, the current version appears to be slightly faster, but still a couple of times slower than I thought it would be.
To be honest it smells like a bug (or, more optimistically, like an optimization opportunity). There are a couple of hiccups when the node is not receiving blocks fast enough, but for the most part block processing is reported at close to 100% of the time. On the other hand, the computer seems to be sleeping, using only around a single core, which is weird, since decomposing signatures - the thing that used to make sync 7 times slower than replay - is since HF26 supposedly done on multiple threads and preemptively, as soon as a block arrives, so I'd expect at least some bursts of higher CPU activity. Maybe I should use some config option for that?
It would be nice to have a comparison on the same machine: pure replay vs replay with full validation vs sync.
Signatures are checked ahead of time in separate threads, and a sufficient number of threads is allocated by default.
Whenever you see block processing at 100%, the bottleneck is the single-core speed of your system (it's processing operations and updating state).
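For intuition, here is a minimal sketch of that "check ahead of time" pattern. It is not the actual hived code - the types, names, and caching scheme are assumptions for illustration only. The idea is that the p2p/sync thread schedules key recovery for incoming blocks on background threads, so the single-threaded apply loop only needs a cache lookup:

```cpp
// Illustrative sketch of preemptive signature checking; NOT the actual hived
// implementation. All names and structures below are stand-ins.
#include <future>
#include <map>
#include <mutex>
#include <string>
#include <vector>

struct signed_transaction { std::string id; std::string sig; };
struct full_block { std::vector<signed_transaction> txs; };

// Stand-in for the expensive secp256k1 public-key recovery.
std::string recover_public_key( const std::string& sig ) { return "STM" + sig; }

class signature_prevalidator
{
public:
  // Called from the p2p/sync thread as soon as a block is received:
  // kick off key recovery for every transaction in the block.
  void schedule( const full_block& block )
  {
    for( const auto& tx : block.txs )
      _pending.emplace_back( std::async( std::launch::async, [this, tx]()
      {
        auto key = recover_public_key( tx.sig ); // CPU-heavy, runs in parallel
        std::lock_guard<std::mutex> lock( _mutex );
        _cache[ tx.id ] = key;
      } ) );
  }

  // Called from the single-threaded block application loop.
  std::string get_or_recover( const signed_transaction& tx )
  {
    {
      std::lock_guard<std::mutex> lock( _mutex );
      auto it = _cache.find( tx.id );
      if( it != _cache.end() )
        return it->second; // hit: recovery already done in the background
    }
    return recover_public_key( tx.sig ); // miss: fall back to inline recovery
  }

private:
  std::mutex _mutex;
  std::map<std::string, std::string> _cache;
  std::vector<std::future<void>> _pending;
};
```

If something like this is active, bursts of multi-core CPU usage should show up whenever blocks with many signed transactions arrive.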
The results are in:
4921cb8c4abe093fa173ebfb9340a94ddf5ace7a
- replay with validation (up to `Performance report (total)`) - 124225649 ms, which is 34.5 hours; avg. block processing time (from `Performance report at block`) is 1.423 ms/block
- sync (up to `entering live mode`) - 143988777 ms, which is 40 hours; avg. block processing time (from `Syncing Blockchain`) is 1.649 ms/block
I'm curious how @gtg's measurements will look in comparison.
The sync-to-replay ratio shoots up the most in areas of low blockchain activity, which is understandable, since small blocks are processed faster than they can be acquired from the network, but in other areas sync is still 10-20% slower.
And the likely reason I remembered sync as being faster is the difference in computer speed - my home computer appears to be over 60% faster than the one I ran the above experiments on, which means it should almost fit the sync inside 24 hours.
For now I have results for the first 50M blocks:

| run | total time to 50M | last 100k blocks: real | last 100k blocks: cpu | cpu/real |
|---|---|---|---|---|
| pure replay | 6:32:45 | 43.466s | 61.132s | x1.4064 |
| replay with validation | 11:03:00 | 84.337s | 395.575s | x4.6904 |
| sync | 14:31:33 | 103.266s | 182.288s | x1.7652 |
I just counted the last 100k block times (cpu / real), so it's not a great measurement. I can have better numbers once I complete those runs. But it seems that replay with validation can somehow make better use of multiple threads than validation during resync.
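As an aside, a cpu/real measurement like that can be reproduced with something as simple as the generic sketch below (my own illustration, not from the hived codebase). On Linux, std::clock() accounts for CPU time of all threads in the process, so cpu/real above 1 means more than one core was busy on average:

```cpp
// Generic wall-clock vs. process-CPU-time measurement over a stretch of work.
#include <chrono>
#include <cstdio>
#include <ctime>

struct interval_timer
{
  std::chrono::steady_clock::time_point wall_start = std::chrono::steady_clock::now();
  std::clock_t cpu_start = std::clock();

  void report( const char* label ) const
  {
    double real_s = std::chrono::duration<double>(
      std::chrono::steady_clock::now() - wall_start ).count();
    double cpu_s = double( std::clock() - cpu_start ) / CLOCKS_PER_SEC;
    std::printf( "%s: real %.3fs, cpu %.3fs, cpu/real x%.4f\n",
                 label, real_s, cpu_s, cpu_s / real_s );
  }
};

int main()
{
  interval_timer t;
  // ... process e.g. the next 100k blocks here ...
  t.report( "last 100k blocks" );
}
```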
It might be the state undo logic slowing down blockchain processing in a sequential manner (this computation is probably skipped for replay+validate). But I doubt there is a way to disable it to check that, short of modifying the code for the test.
Probably we should modify the code dealing with checkpoints to skip the undo logic up to the checkpoint. That would let us confirm whether it is the bottleneck, and if it is, it would also give us a speedup whenever checkpoints are set.
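A rough sketch of the idea, purely for illustration - the types, the `last_checkpoint` field, and the session handling below are stand-ins rather than the actual `database::apply_block_extended` code:

```cpp
// Hypothetical sketch (not the actual hived code): skip the undo session for
// blocks at or below the configured checkpoint, since they can never be popped.
#include <cstdint>

struct full_block_type { uint32_t block_num; uint32_t get_block_num() const { return block_num; } };

struct undo_session
{
  void push() {} // stand-in: keeps the undo data so the block can later be undone/squashed
};

class database_sketch
{
public:
  uint32_t last_checkpoint = 0; // highest configured checkpoint block number

  void apply_block_extended( const full_block_type& block )
  {
    if( block.get_block_num() <= last_checkpoint )
    {
      // Below the checkpoint: apply the block without an undo session.
      // State changes are irreversible, which is safe only because blocks
      // up to a checkpoint cannot be reverted by a fork switch.
      apply_block( block );
    }
    else
    {
      // Normal path: wrap block application in an undo session so the block
      // can be popped if a fork switch is needed.
      undo_session session = start_undo_session();
      apply_block( block );
      session.push();
    }
  }

private:
  undo_session start_undo_session() { return undo_session{}; } // stand-in
  void apply_block( const full_block_type& ) {}                // stand-in
};
```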
It should be easy to test - just cut out the two lines with `session` in `database::apply_block_extended` (I'm actually assuming that out-of-order blocks won't reach that routine during sync, but if they do, that would be a source of slowdown).

I'd be surprised if undo sessions were the problem. They are relatively slow and worth optimizing, but only relative to simple transactions, mostly custom_jsons, so their cost becomes significant when there are many of them - during block production, reapplication of pending transactions, or in extreme stress tests with `colony` + `queen`. During sync we only have one session per block.

Yes, your assumption is correct: blocks are strictly processed in order during sync, the P2P code ensures this. If it's easy to test, let me know what you find out. I suggest working with @gandalf so that the test is performed on the same machine as the previous measurements.