How I replayed our full nodes in 18-19 hours instead of 2.5 days


With the recent update to steemd version 0.19.3, I had to do a lot of full node replaying, which can take anywhere from 1 to 30 days depending on the hardware and the plugins enabled.

The full nodes I manage typically take somewhere between 2 and 3 days to fully replay with account history and all other plugins. The hardware isn't cheap either; it is about the best you can get without building your own configuration and self-hosting.

Full Node Configuration

AMD EPYC 24-core CPU
512 GB RAM
2x 960 GB NVMe Gen 3 data center drives
1 Gbit/s Internet

This means anything that causes a replay is extremely time consuming and frustrating to resolve. Luckily, we have two similar servers in a cluster; when one fails, I can easily remove it and replay it while the other picks up the load. If both have to be upgraded, they can be staggered and upgraded one at a time without downtime.

In the case of the recent emergency patch, both full nodes had to be replayed, which causes a lot of difficulty when trying to maintain service while also patching quickly. Not only do both nodes have to be updated quickly, but most public full nodes are doing the same thing at the same time. A common practice with full nodes is to redirect queries to another node during a replay; this maintains service while allowing you to pull your node out of use. That works during a failure, or when you have other nodes you maintain, but when all full nodes are patching at the same time it becomes ineffective.
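For illustration, a minimal sketch of that redirect pattern using nginx as a reverse proxy in front of the node; the upstream name, port, and fallback hostname here are all hypothetical, and any load balancer (haproxy, etc.) can do the same job:

upstream steemd_api {
    # Local full node; remove or mark down while it replays
    server 127.0.0.1:8090;
    # Another node you maintain; only used when the primary is unavailable
    server fallback.example.com:8090 backup;
}

server {
    listen 80;
    location / {
        proxy_pass http://steemd_api;
    }
}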

During this wave of replays, I tested two new tricks that dramatically sped up my replay times and allowed me to patch and bring all four full nodes I maintain online much quicker than normal.

Follow Feed Calculation

The first is a trick I picked up from @gtg (thanks!): specifying a start time from which steemd populates follow plugin feeds. Follow feeds are primarily used by condenser (front ends like steemit.com) and are not used by most services that rely on public full nodes.

In my case, upon replaying my nodes, I told steemd not to populate feeds earlier than 7 days ago. This is done with the --follow-start-feeds= parameter to steemd. It takes a Unix timestamp, which is easily calculated using UnixTimeStamp.com.

Example

~/bin/steemd -d data --replay-blockchain --follow-start-feeds=1524355200
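If you prefer the shell to a website, the same timestamp can be computed inline; a minimal sketch assuming GNU date (BSD/macOS date uses different flags):

# Unix timestamp for 7 days ago (GNU coreutils date)
date -d '7 days ago' +%s

# Or substitute it directly into the replay command
~/bin/steemd -d data --replay-blockchain --follow-start-feeds=$(date -d '7 days ago' +%s)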

Combined with the next suggestion, my replay times went from around 2.5 days to about 18-19 hours. While replays are still extremely painful, this is a huge improvement in recovery time from a node failure or patch that requires a replay. Every day that passes increases the potential replay time of a full node (including witness and seed nodes).

Remove block_log from OS cache

The other trick, a suggestion from GitHub user theoreticalbts, is to drop the block_log file from the OS cache, leaving more RAM for the shared memory file. The block_log is currently 101GB and growing very rapidly, so it eats up a significant portion of RAM even on a 512GB server.
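If you want to see how much of the block_log is actually resident in the page cache at any given moment, the vmtouch utility (if installed) will report it; the path below assumes you are in the data directory:

# Show how many pages of block_log are currently held in the OS page cache
vmtouch blockchain/block_log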

There are two ways to accomplish this, and they have different implications. Further testing needs to be done to see which one is better overall.

During a replay, run the following script to purge the block_log from the OS cache. This preserves the maximum amount of RAM for steemd and its shared memory file.

while :
do
   # count=0 copies no data; iflag=nocache asks the kernel to drop
   # block_log's pages from the page cache
   dd if=blockchain/block_log iflag=nocache count=0
   sleep 60
done

You will need to adjust the path to your block_log if you do not run the script from the data directory. Leave it running while you do a replay. I am not entirely sure how much of an improvement this step alone made; further testing is needed to isolate its effect.

There is another approach @bhuz uses to accomplish the same thing; in addition to purging the block_log, it also purges the shared memory file. This may provide additional benefits even outside of replays, but further testing is needed.

This can be done with the following command:

echo 3 | sudo tee /proc/sys/vm/drop_caches
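For context, the value written selects what gets dropped: 1 frees the page cache only, 2 frees reclaimable slab objects (dentries and inodes), and 3 frees both. Since drop_caches only releases clean pages, it is common to flush dirty pages to disk first; a minimal sketch:

# Flush dirty pages to disk, then drop the page cache and slab objects
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches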

Conclusion and failures

In the recent replays, I used both tricks to drastically reduce how long the replays took.

Unfortunately, I screwed up and accidentally built some of the nodes with the low memory setting (a setting I only use on witness nodes), as I was upgrading 9 nodes over the course of 24 hours. Upon realizing this, I fixed it and restarted the replay.

What I didn't realize is that the low memory setting persists even after turning it off in a new build (thanks @anyx for pointing this out), which caused some weird issues with the node until I realized it was in low memory mode even though I had built without it. The solution is to delete CMakeCache.txt prior to rebuilding.
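For reference, a minimal sketch of a clean rebuild; the build directory path is illustrative, and I'm assuming LOW_MEMORY_NODE is the CMake option your build toggles:

cd ~/steem/build
# Delete the cached configuration so stale options (like low memory) don't persist
rm -f CMakeCache.txt
cmake -DLOW_MEMORY_NODE=OFF ..
make -j$(nproc) steemd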

It's been a trying few days, but I learned a few interesting things and now have some new tools to dramatically speed up recovery in future replays. While most people won't find this very interesting, those running full nodes may save some sanity with this advice.

Thanks again @gtg, @anyx, @bhuz, and GitHub user theoreticalbts (I suspect there is a Steem user behind this account).


Why you should vote me as witness

Witness & Administrator of four full nodes


My recent popular posts

STEEM, STEEM Power, Vests, and Steem Dollars. wtf is this shit?
The truth and lies about 25% curation, why what you know is FAKE NEWS
WTF is a hardware wallet, and why should you have one?
GINABOT - The Secret to your Sanity on Steemit
How to calculate post rewards
Use SSH all the time? Time for a big boy SSH Client
How to change your recovery account
How curation rewards work and how to be a kick ass curator
Markdown 101 - How to make kick ass posts on Steemit
Work ON your business, not in your business! - How to succeed as a small business
You are not entitled to an audience, you need to earn it!
How to properly setup SSH Key Authentication - If you are logging into your server with root, you are doing it wrong!
Building a Portable Game Console


It's a pity that only spammers are commenting under posts like these. I would like to see other witnesses/programmers/developers discussing what you did, or at least congratulating you on your success or something. The main thing I'd like to see is the community learning from each other. Anyway, well done sir!

The audience for this type of post is tiny, so I'm not super surprised. But for those who find it helpful, it is extremely valuable.

I'm trying out --follow-start-feeds=1524355200 for a node replay now, thanks for the tip

Let me know how it goes!

It just finished now, and it was definitely faster. I didn't time it properly, but you could get a pretty close estimate from my comment timestamps (~31 hrs this time).

How long did it take previously? Did you use the block_log cache clearing as well?

At least 2 full days on the previous replay. This time I rebuilt steemd with -DSKIP_BY_TX_ID=OFF, so that may have affected how long it took to replay. And I used the /proc/sys/vm/drop_caches trick.

Great story, I just loved it.

Great post!

Well done, guy.

Keep it up, boss. Your posts are always encouraging. I have learnt a lot from them and I hope to see more from you again @markymark

Good post!!!

I suspect theoreticalbts is one and the same person as @theoretical, who is one of the main authors of the SMT whitepaper.

Technical question (I'm trying to up my game and want to learn how to set up and run a witness): the Steem blockchain produces 28800 blocks per day of up to 65 KB each. That means that theoretically the blockchain could grow by up to 1.78 GB per day (see the arithmetic sketched below), which means that by now, after 2 years, the full Steem blockchain could (in theory, at maximum space utilization) be occupying 1.2 TB of disk space.

Now it is probably a lot smaller than that in reality, but I was wondering: is there some kind of archiving solution already in place, or is it considered too early to start thinking about that?
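For reference, that back-of-envelope arithmetic as a shell sketch; both figures assume every block hits the 65536-byte cap, which real blocks rarely do:

# Maximum daily growth: 28800 blocks/day at the 65536-byte block size cap
echo $(( 28800 * 65536 / 1024 / 1024 ))               # => 1800 MiB, ~1.76 GiB/day
# Upper bound over two years (730 days)
echo $(( 28800 * 65536 * 730 / 1024 / 1024 / 1024 ))  # => 1283 GiB, ~1.25 TiB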

Oh really, such amazing news. Well done my friend, great job.

Your posts are always different; I follow your blog all the time, and your posts are so helpful. They always inspire my work on Steem. Thank you for sharing @themarkymark

Follow my blog @powerupme

Hey Marky Mark, any interest in doing a slightly fluffy, slightly educational witness interview? I've done about 20-25 with some of our top witnesses in the past several months and I'm looking for some other witnesses to take part, in an attempt to increase witness transparency.