I have a question about Block Producers that has been rattling my brain for a few days. I have read more than 50 BP candidate announcements and have been following several Telegram groups.
The question is, why are so many BPs proposing that they will have block producing nodes (servers) running at multiple data center locations? Now, before you just spit out the answer “redundancy and failover”, let me explain why this is perplexing to me.
Each location that a BP operates will likely be completely redundant. That means it will be hosted at a Tier 3 or better facility, which has redundant internet links, power sources, security, etc. Then, within that facility, each BP will likely have multiple active servers with load balancing, and even hot-swappable backup servers.
As a result, these nodes will have uptime measured in “nines” of reliability, between 99% and 99.999%. When built correctly, the only things that should bring a node down, outside of human error, will be natural disaster, government intervention, war, etc. (and obviously, we should always vote for locations where these types of risks are minimized).
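To put those “nines” in perspective, here is a rough sketch of the arithmetic. It assumes a 365-day year and treats the availability figure as a simple long-run average. The function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Expected hours of downtime per year at a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


# Walk through the range mentioned above, from two nines to five nines.
for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

So 99% works out to roughly three and a half days of downtime a year, while 99.999% is only a few minutes.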
Now, when you look at the EOS network, there will be 21 nodes that produce blocks at any given time. And there will be another 50 – 80 standby nodes, ready to step in at any moment. The sum of all 21 primary locations might only be down 1% of the time. In the big picture, this is of zero concern, because the standby nodes will step in without issue.
So, the only reason a BP would run multiple locations is self-serving. It will not improve the reliability of the EOS network. What it would do is keep that specific BP producing blocks (and money) if it were their location that suffered the disaster.
But at what cost? A heavy one. And who is really paying for it? EOS coin holders. For each BP to have second and third locations, it will add 2x-3x to the cost of equipment and recurring data center fees. This is a huge waste of money. Why would an EOS coin holder want to fund BPs who have 2x-3x the costs, when it brings zero advantage?
To me, any BP that is proposing to run servers at multiple locations is either greedy or does not understand that the real value of the EOS network is server decentralization, with servers run by different BPs. If there is a third reason, I need someone to explain it to me. But before you answer, I have one more point.
I get that BPs are going to spend a lot of money to get up and running, and that if their only location goes down due to a major event like natural disaster, it will affect that BP financially. And if they are down for an extended amount of time, it could cripple that company into bankruptcy, which no one in the EOS ecosystem would want. So, I get the logic behind BPs needing to protect their investment. But there are much better options than funding EVERY BP at 2x-3x the required operational cost.
One option would be proper business interruption insurance. A second option would be to self-insure. The BP community and coin holders could just take 0.1% of all BP rewards and place it into a disaster fund. Then, this fund could be used to subsidize any BP that endures such a natural disaster until they get back up and running, even if that must be at a new location.
In my opinion, both of the options that I proposed would save tens of millions of dollars annually, compared to the design that almost all top BP candidates are proposing: running servers at multiple locations per BP.
Ok, so, now you have my brain dump, tell me where I am going wrong, and why we should fund individual BPs to have servers at multiple locations?
So you kind of answered the reasoning behind it perfectly:
"I get that BPs are going to spend a lot of money to get up and running, and that if their only location goes down due to a major event like natural disaster, it will affect that BP financially. And if they are down for an extended amount of time, it could cripple that company into bankruptcy, which no one in the EOS ecosystem would want. So, I get the logic behind BPs needing to protect their investment."
The fact is, once you are out for an extended period of time, you lose your BP seat at the table. If you cannot keep up, you are voted out.
So recovery needs to be fast. I do think that collaboration and leasing resources between BPs would be a good option too.
I have learned a few things since my original post. So, I will respond to myself, since I was operating with a misconception.
I thought that the EOS network could seamlessly switch between block producing companies if blocks are missed by an active BP. EOS will seamlessly replace a BP with a standby BP if the vote warrants it, but it will not automatically swap in a standby BP just because an active BP misses a few blocks. This won’t happen until the active BP has been offline for 24 hours. Thus, if a BP’s primary location goes down, they can switch to another of their own active servers faster and more efficiently than a standby can be promoted.
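As a back-of-the-envelope sketch of why that matters, the snippet below compares a full 24-hour outage against a quick failover to a second site. It rests on assumptions that are not established in this thread: a 0.5-second block interval, production slots shared evenly among the 21 active BPs, and a purely hypothetical 5-minute failover time. The function name is made up for illustration.

```python
def missed_block_slots(outage_seconds: float,
                       block_interval_seconds: float = 0.5,  # assumed block time
                       active_producers: int = 21) -> float:
    """Approximate block slots one BP misses while it is offline.

    Assumes slots are shared evenly among the active producers, so the
    offline BP misses 1/active_producers of all slots during the outage.
    """
    total_slots = outage_seconds / block_interval_seconds
    return total_slots / active_producers


# Single site: offline for the full 24 hours before a standby takes over.
single_site = missed_block_slots(24 * 60 * 60)

# Second site available: hypothetical 5-minute switch to the backup.
with_failover = missed_block_slots(5 * 60)

print(f"24-hour outage:    ~{single_site:,.0f} missed slots")
print(f"5-minute failover: ~{with_failover:,.0f} missed slots")
```

Under these assumptions the gap is thousands of missed slots versus a few dozen, which is the case for a BP keeping its own backup site.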
Additionally, p2p communication, and thus network performance, is more efficient with fewer nodes.
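One way to see the scaling, as a simplified sketch: if every node kept a direct connection to every other node (a full mesh, which is an assumption for illustration and not necessarily how EOS nodes actually peer), the number of links would grow roughly with the square of the node count.

```python
def full_mesh_links(node_count: int) -> int:
    """Number of point-to-point links if every node connects to every other node."""
    return node_count * (node_count - 1) // 2


# Compare a small network with progressively larger ones.
for nodes in (21, 50, 100):
    print(f"{nodes} nodes -> {full_mesh_links(nodes)} links")
```

Real p2p networks usually connect each node to only a subset of peers, so treat this as an upper bound on how quickly the overhead can grow.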
These two points have led me to now believe it is best to have 50-75 block producers (companies), each with 2-3 layers of redundancy within their networks, and then maybe 25-50 emergency block producers (companies) that each run 1-2 servers as a low-cost operation. This setup minimizes the number of missed blocks when an active block producer loses a site.