Steem Blockchain Patch Issued

in #steem, 7 years ago (edited)


The Steem blockchain recently stalled at block 23847548 due to an invalid transaction that was allowed into a previous block. The Steemit development team quickly identified the cause and issued a patch, which was then deployed by a majority of the witnesses. At this time, the Steem blockchain has resumed normal operation. At no point during the event were user accounts or tokens at risk.

It is important to emphasize that what took place was actually the result of a protection mechanism built into the blockchain preventing the invalid transaction from doing any real harm. While it is unfortunate that operations were suspended during this time, it is these protection mechanisms that help ensure that accounts remain safe and secure even in the face of unforeseen events.

Cause

Seven days ago, an account (@nijeah) submitted a power-down transaction that would have resulted in a negative amount of STEEM being powered down from their account. The blockchain has safety rules that forbid such a power down from executing, but the checks applied when the transaction was submitted did not reject it. As a result, the invalid request was accepted into a block even though its execution, scheduled for seven days later, would not be allowed by the blockchain.
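
To make the gap concrete, here is a minimal Python sketch of a rule that is enforced at execution time but missing at submission time. It is a simplified model, not steemd's actual code (which is written in C++): the names `PowerDown`, `validate_on_submit`, and `execute_withdrawal`, the integer amounts, and the `patched` flag are all hypothetical.

```python
from dataclasses import dataclass


class InvalidOperation(Exception):
    """Raised when an operation breaks a validation or safety rule."""


@dataclass
class PowerDown:
    account: str
    amount: int  # hypothetical: amount to power down, in integer units


def validate_on_submit(op: PowerDown, balance: int, patched: bool) -> None:
    """Checks applied when the request is submitted (hypothetical model)."""
    if op.amount > balance:
        # Powering down more than you hold was already rejected up front.
        raise InvalidOperation("cannot power down more than the balance")
    if patched and op.amount < 0:
        # The rule that was missing before the patch.
        raise InvalidOperation("power-down amount must not be negative")


def execute_withdrawal(op: PowerDown, balance: int) -> int:
    """Safety rule enforced when the scheduled withdrawal runs (hypothetical)."""
    if op.amount < 0:
        raise InvalidOperation("negative withdrawal refused at execution time")
    return balance - op.amount


# Before the patch: the request is accepted, then refused when it comes due.
bad = PowerDown("example-account", -1000)
validate_on_submit(bad, balance=5000, patched=False)  # no error: the gap
try:
    execute_withdrawal(bad, balance=5000)
except InvalidOperation as err:
    print("refused at execution:", err)
```

In this model the fix amounts to enforcing the same rule at submission time, so the bad request is rejected before it ever enters a block.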

When the scheduled power down came due, witness nodes were unable to process the transaction, and all subsequent transactions, because of the aforementioned rules. This is what we refer to as “halting” (as opposed to something like “forking”). A code change was needed to define how this case should be handled.
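
A simplified model of why one failing operation stops the whole chain, as a Python sketch. It assumes, for illustration only, that a block is applied all-or-nothing and that actions coming due at a given height (like the scheduled power down) are processed as part of whichever block is produced at that height. All names are hypothetical; this is not steemd's code.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, int]
Action = Callable[[State], None]


class BlockRejected(Exception):
    """Raised when any action in a block fails; the block is not applied."""


def apply_block(state: State, scheduled: List[Action], txs: List[Action]) -> State:
    """Apply a block all-or-nothing: due scheduled actions first, then transactions.

    Works on a copy, so a failure leaves the caller's state untouched.
    """
    new_state = copy.deepcopy(state)
    try:
        for action in scheduled + txs:
            action(new_state)
    except ValueError as err:
        raise BlockRejected(str(err)) from err
    return new_state


def due_negative_withdrawal(state: State) -> None:
    """The scheduled power down that comes due; it can never succeed."""
    amount = -1000
    if amount < 0:
        raise ValueError("negative withdrawal")
    state["alice"] -= amount


def ordinary_transfer(state: State) -> None:
    """A perfectly valid transaction."""
    state["alice"] -= 1
    state["bob"] += 1


state = {"alice": 5000, "bob": 0}
# Every candidate block at this height must process the due withdrawal,
# so every candidate fails, whatever transactions it contains.
for txs in ([], [ordinary_transfer]):
    try:
        state = apply_block(state, [due_negative_withdrawal], txs)
    except BlockRejected as err:
        print("block rejected:", err)
```

Because the rejection does not depend on a block's transactions, no witness can produce a valid next block until the code changes, which is a halt rather than a fork.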

Solution

The Steemit development team, with the assistance of several witnesses, quickly identified the root cause of the problem. As soon as the cause was identified, a patch was issued and its rollout was coordinated with the top witnesses.

Within only a few hours of the issue occurring, the patch was applied by a majority of the witnesses, and the Steem blockchain resumed normal operation.

Instructions for node operators

This section contains instructions for node operators who still need to apply the patch.

All nodes running 0.19.3 should update to release version 0.19.5 to start receiving blocks again. The patch will not require a replay.

If you were running the AppBase release candidate (0.19.4), a new release candidate (0.19.10) will be made shortly. Alternatively, you can run the branch 20180702-fix-vesting-withdrawals-steemd to get the patch now.
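
A pitfall when checking these release numbers in a script: compared as text, "0.19.10" sorts before "0.19.5". The sketch below is a hypothetical Python helper that assumes plain dotted-integer version strings and compares them numerically instead.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.19.10' into (0, 19, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_patch(version: str) -> bool:
    """True for the patched 0.19 releases named above (0.19.5 and 0.19.10).

    Hypothetical helper: builds from the fix branch are not covered, since
    their version string is not given here.
    """
    return parse_version(version) >= parse_version("0.19.5")


for v in ["0.19.3", "0.19.4", "0.19.5", "0.19.10"]:
    print(v, "patched" if has_patch(v) else "needs update")
```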

Impact

Any transactions that were submitted during the time the blockchain was halted would have resulted in an error. Some pending transactions that were submitted just prior to the halt may not have been included in blocks, and would have expired. Affected transactions would need to be resubmitted, as they would not have been included in a block and are no longer valid.
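
Why pending transactions went stale: each transaction carries an expiration time, and a block can include it only before that time. During a halt no blocks are produced, so the first block after the restart carries a timestamp hours later. A minimal Python sketch; the one-hour maximum window, the `can_include` name, and the example times are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the furthest ahead an expiration may be set.
MAX_TIME_UNTIL_EXPIRATION = timedelta(hours=1)


def can_include(expiration: datetime, block_time: datetime) -> bool:
    """Can a transaction go into a block stamped `block_time`? (hypothetical)

    It must not be expired yet, and its expiration must not be set too far ahead.
    """
    return block_time < expiration <= block_time + MAX_TIME_UNTIL_EXPIRATION


utc = timezone.utc
halt_time = datetime(2000, 1, 1, 12, 0, tzinfo=utc)  # illustrative times only
expiration = halt_time + timedelta(minutes=30)       # signed just before the halt
resume_time = halt_time + timedelta(hours=4)         # first block after the halt

print(can_include(expiration, halt_time))    # valid when it was signed
print(can_include(expiration, resume_time))  # no longer includable after the halt
```

A transaction in the second situation cannot be "retried" as-is; it has to be rebuilt with a new expiration and signed again.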

Other than the period of time where no new transactions were allowed, there was no additional impact from the event. Everybody’s tokens remained safe, and accounts were not at risk of being hacked.

Conclusion

We want to thank everybody involved for their responsiveness during the event. It is a great testament to our amazing blockchain team and the Steem witnesses that we were able to get the blockchain back to operational status in such a short period of time.

Great job to everyone involved! Steem on!

Team Steemit


A wild night, excellent teamwork, and a quick summary and explanation. While halting can be scary, it's a clear and effective way to prevent transactions that could have a huge impact on funds and security.

I'd like to extend a huge, huge thank you to everyone involved, both in helping users understand that they should hold tight and that the chain remained uncompromised, and in working to have nodes ready to resume. But even more so, hefty appreciation to those up all night who may not look for, or be individually rewarded with, personal recognition for the hours of intense coordination and professionalism required to go from full stop to back on track in so little time. Thanks, truly.

I didn't even know about this one. The devs and witnesses involved acted so fast to implement and run this patch, which is definitely amazing!

What makes me curious is that nobody had tried to power down more SP than they had, at least not until now. This is one of the reasons why Steem is still in beta, and we are actually the beta testers.

So, even though he did a bad thing, I guess we should congratulate @nijeah, or whoever is behind that account, for highlighting this vulnerability in the Steem code base. It is definitely better now than later :D

Powering down more SP than you have was always checked and rejected immediately. In this case the missing check was for "negative power down" (which could also be described as attempting to use the power down command to power up). No one had been creative enough to try that yet!

Okay, I got it now, pretty intelligent, I must admit! So if I send to somebody -2 Steem, that person is actually sending me 2 Steem :))
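
The arithmetic behind this joke, as a minimal Python sketch of a hypothetical `transfer` function (not Steem's code, and not a claim about how Steem transfers behave): if nothing requires the amount to be positive, subtracting a negative number adds it, and a "no more than you have" check passes trivially.

```python
def transfer(balances: dict, sender: str, receiver: str, amount: int,
             require_positive: bool = True) -> None:
    """Move `amount` from sender to receiver (hypothetical, simplified)."""
    if require_positive and amount <= 0:
        raise ValueError("amount must be positive")
    # This check alone does not stop negatives: any balance >= 0 exceeds -2.
    if balances[sender] < amount:
        raise ValueError("insufficient balance")
    balances[sender] -= amount
    balances[receiver] += amount


balances = {"me": 10, "you": 10}
transfer(balances, "me", "you", -2, require_positive=False)
print(balances)  # {'me': 12, 'you': 8}
```

With the default `require_positive=True`, the same call raises `ValueError`; that sign check is the kind of rule the power-down path was missing.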

That was a tricky one!

Damn... that was possible up until a few days ago?

Guess we have to thank @nijeah for "finding" this bug!!

Now I get it :)


After giving it a bit of thought, I would guess that @nijeah delegated his/her steem power to another account at the same time they powered down their Steem Power, done from two different browser tabs.

One witness could have processed the Steem Power Delegation, while the next block processed by a different witness handled the Power Down before the previous block was confirmed.

I'm even more confident now that the Steem network can handle any possible "monkey wrench" that may be thrown into the mix. Great teamwork !!!

This is precisely why Proof-of-(mis)Stake is flawed.

Can you imagine any other system going down that processes monetary value for a few hours retaining its userbase?

Sure, they fixed it -- but it took a lot of manual intervention. Doesn't inspire confidence in sanity checks and consensus mechanisms.

I don’t know if it is accurate to say that this was an issue related to the DPOS algorithm. Also, the practice of stopping operations when an unexpected scenario is triggered is pretty standard; as far as I know, all of the major cryptocurrency exchanges have similar mechanisms in place.

Bitcoin experienced a similar incident early in its life and the community and block producer response was similar to the response to this incident. https://en.bitcoin.it/wiki/Value_overflow_incident

There have been comparable incidents on other proof of work blockchain networks.

Whether your argument is for Bitcoin maximalism or the superiority of proof of work over proof of stake or proof of delegated stake (what Steem uses), neither is well supported by your inference of a spotless history for them.

When the ledger and associated funds could be compromised by potentials like double spending or printing out of thin air, I do think one of the best and most reasonable responses is a temporary network stoppage that does not require the complex ethical consideration that undoing, forking out, or changing transactions would require on top of important code/patching work.

Can you imagine any other system going down that processes monetary value for a few hours retaining its userbase?

It's called lunch hour at my bank...

heaven help me, I chortled.

just a regular guy... having a regular pizza... in a regular pizza pouch

#DontJudgeme

This is now associated in my mind with bankers - thank you.

Can we resteem a comment? My 100% vote isn't enough to convey the lols you gave me.

Appreciate that...you can resteem any of my latest posts

:)

Actually, yes, this happened twice this year at my bank: operations were halted for 48 hours the first time and 6 hours the second time. The whole bloody bank stopped working for 2 days while IT people scrambled to find and correct the problem. And still the bank retained its userbase, because people are lazy and the bank compensates people who can substantiate claims that they had losses (because they couldn't buy or sell a financial instrument, repay a debt that was due, etc.).


Really? Banks and credit card companies experience security vulnerabilities frequently. The difference is if you have your keys here then just relax and steem on.

You did good, crimmy. Thanks for keeping all of us in palnet up to date with what was happening.

@crimsonclad can this happen again?

It cannot. The patch has closed this exploit and put a check in place to reject the transaction instead of freezing the chain! The fact that a patch was developed, tested, and applied, the chain restarted, and a rolling upgrade across the network begun, all in less than twelve hours, is pretty amazing. A lot of great people stayed up all night and worked hard behind the scenes to make sure this loophole was closed before it could harm anyone or the chain's function again.

Thanks for your answer!

Thank you for this update, and well done to everyone who took part in getting the blockchain back to normal operation.

With that said, I would like to submit for your consideration — because "stoppages" of one kind or another make people nervous — that you implement some kind of communication method for the general user base when things are out of sorts.

Most major multi-user sites have cloud-hosted "fallback sites" that operate completely independently of the main venue. This could be something like a separate "steemstatus.org" domain fully disconnected from Steemit and the blockchain. If something "goes awry," every request instead forwards to the contingency site (can be triggered automatically, or manually, depending on situation) where a live feed (blog style, message board style) provides anyone trying to access the main venue with a live news feed, or at least an "outage message."

eBay, for example, is really good about that. In "our" industry, Coinbase has it. It builds confidence in a system if users, rather than just finding dark pages and error messages, land on a page that simply has a message: "A faulty transaction has caused a temporary stoppage of the blockchain. Our technicians are aware of the problem and are currently working on implementing a patch to address it. Check this site for updates."

It's a relatively minor thing; could even be run from a simple WordPress blog... but the communication would build a lot of confidence in the community that "someone's working on it."
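
One way the automatic trigger could work, sketched in Python under stated assumptions: the status page watches the timestamp of the newest block and switches to an outage notice when blocks stop arriving. Steem normally produces a block every 3 seconds; the thresholds, the message text, and the function names are hypothetical, and fetching the head block time from a node is left out.

```python
from datetime import datetime, timedelta, timezone

BLOCK_INTERVAL = timedelta(seconds=3)  # Steem's normal block interval
DELAY_AFTER = 3 * BLOCK_INTERVAL       # hypothetical: a few missed blocks
HALT_AFTER = 20 * BLOCK_INTERVAL       # hypothetical: about a minute with no block

OUTAGE_MESSAGE = ("New blocks are temporarily not being produced. "
                  "Check this site for updates.")


def chain_status(head_block_time: datetime, now: datetime) -> str:
    """Classify chain health from the age of the newest block."""
    age = now - head_block_time
    if age <= DELAY_AFTER:
        return "ok"
    if age <= HALT_AFTER:
        return "delayed"
    return "halted"


def page_to_show(head_block_time: datetime, now: datetime) -> str:
    """Serve the outage notice automatically once the chain looks halted."""
    if chain_status(head_block_time, now) == "halted":
        return OUTAGE_MESSAGE
    return "main site"


now = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)  # illustrative time
print(chain_status(now - timedelta(seconds=2), now))
print(page_to_show(now - timedelta(hours=1), now))
```

Keeping a "delayed" state between "ok" and "halted" avoids flipping the whole site to an outage page over one or two missed blocks.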

Just a suggestion from a relatively small newbie (albeit with 40 years in the IT field); hope you'll consider it.

=^..^=

and this is why you're my fave cat on steemit... well, other than my kitty :D

Thanks for that idea!

Indeed, a really good idea.

Particularly important as the new account creation features under HF.20 might increase the growth of memberships.

Yes, this type of communication is needed. Otherwise rumors run wild, and that's never a good thing.

hacker.gif

Yeah I like to hack..

Hack my way into your hearts...

ladies !!

real hackers use 1 keyboard for faster hacking.

two powergloves, or death

Popup windows of spam ads...or death !!

Very nostalgic those big screens

Best meme I have seen in some time. It was like a meme, only super.

Best meme I have seen
In some time. It was like a
Meme, only super.

                 - arrowj


I'm a bot. I detect haiku.

Funds are safu.

Thanks for the detailed information on this odd incident. I'm proud of and very grateful for the robust response of all the devs and witnesses to get this patched as soon as possible. I'm left with an even greater confidence in the integrity of the STEEM blockchain.

P.S. Off topic (and nitpicking), meet my good friend the em dash: (—) It looks more streamlined and professional than using two hyphens where you want long dashes.

When the scheduled power down occurred, witness nodes were unable to process the transaction—and all subsequent transactions—due to the aforementioned rules.

A minor detail, maybe... but somebody had to be that guy ;-)

That's why I love Steemit. The Team is always ready for any problem. Keep it up Team!!

I was actually pretty impressed with how well most people handled it! Most were calm and waiting for more information.

Nice job by all of handling an issue and moving forward!

Agreed! I feel we're all growing together. We forget that 2 years ago this was all just a theoretical experiment spearheaded by a bunch of weirdos who thought we could reward social media and content in a totally new way. Who would have thought such a rag-tag group of contrarians could become such a unified and powerful community?!

Maybe they were unable to write about their concerns because the blockchain stalled.

There were some tweets from the @steemit account actually

Yes, but it would be better to have some sort of communication presented on the main site. I know it's not a priority, and Steemit is not eBay or even Coinbase. For now, it could be a page directing users to the @steemit Twitter feed for real-time details.

Looking further ahead, it would be better to have an interface that any dApp can point to when there's an issue on the blockchain.

It is a good idea. Non-trivial to implement though, even though it sounds “easy”.

I know... That's why I suggested a temporary solution limited to Steemit and a link to their Twitter feed for updates.

I'm an engineer and I worked in aerospace industry. When you have a system that has a failure, there is always a "root cause and corrective action". I see that a root cause was identified as an unknown vulnerability that existed. The exploitation of that vulnerability didn't affect the accounts, but was effective at shutting down the blockchain.

The missing piece of the explanation is the "corrective action". I see that the fix was put into place, but that is not a corrective action. A corrective action would address why the vulnerability existed for so long and was discovered and exploited by a copy-and-paste scammer. I know that code has bugs and that it can be difficult to discover every possible vulnerability, but take it from an aerospace engineer: you can go a long way toward error-proofing software.

Besides the corrective action, is there a bounty for finding bugs?

Speaking of bounty. Binance has a bounty fund to pay for those who provide information which brings hackers to justice. Will Steemit Inc do something similar?

Thank you, you are truly amazing. My witness node is currently re-indexing, but this takes hours. I notice that CPU utilization is low (basically a single core only). Can we parallelize the re-indexing process, at least partially, to speed it up?

I believe there has been some work done on that with AppBase.

Thanks for the detailed information on this strange incident. I am proud of and very grateful for the robust response of all the developers and witnesses in getting this patched as soon as possible. I am even more confident in the integrity of the STEEM blockchain.

Will there be a point where transactions such as this one are stopped in their tracks before the 7-day period goes by?

Also, did the user do that intentionally or was there something else involved?

The change that was applied will stop these now before they enter a block.

Ah, the sound of progress.

It was a wild 12 hours or so, but I'm quite proud of the community and how everyone responded. I spoke to multiple Discord channels to explain what I could as it was happening, and most people seemed to understand it was a good thing that the chain halted to protect the system. Great job, community!

and for that, @lukestokes, we thank u immensely

Amen! And great work Luke

Thanks @lukestokes for your support to the steem blockchain and educating as many people as you could!

Regards, @gold84

Thanks, Luke! Good thing I found you online on Discord this early morning (for me). :)

Thanks for your quick responsiveness!

Isn't it actually a good sign that the Steem blockchain immediately disabled the production of any further blocks as a security mechanism?

Great job on both communication and trouble shooting, Team Steemit!

I would have to say it is.

This provides me with greater comfort knowing this is written into the blockchain. Instead of just proceeding with an invalid transaction, the entire blockchain stops. While a pain in the rear, it is better to be safe than have something major happen.

Safety of tokens should be the highest priority of the blockchain industry. It is good to see that the STEEM team treats it as such. Security is an area that the industry really needs to step up.

Thanks for explaining the real cause of today's crazy downtime. It's great that the Steem blockchain development team is super fast and responsible; I am happy that I put my money in the right place.

Team, excellent work!! The proactive action taken to address this bug is admirable. Despite the momentary hardship, this increases confidence in STEEM's developers and in the blockchain.

Thanks for the explanation! Steem used to be down quite frequently but it has been quite good recently with very little downtime which was why this was so unexpected and caused so much panic haha

All hail steem! We are happy that everything is working properly now!

@eurogee of @euronation and @steemstem communities

I love this gif

You did a good job fixing it.

Does this mean it is possible to submit false transactions to the blockchain?

In pretty much all cases, they will be blocked before they are accepted into a block. There was this one specific scenario though where one was allowed through. This case was patched now, so similar ones will be blocked going forward.

OK, good work on those exceptions ;)

That's the spirit: Steem on!

Woot!

Thanks for the assiduous work and prompt response.

Thank you for the detailed and clear explanations! I translated this article to Japanese for Japanese community! https://steemit.com/steemit/@katakoto/what-happened-to-steemit-in-early-july-2018-steemit-steemit

Excellent job, I posted this link on my blog and resteemed this post because I think it's important people see this. CHEERS

What I didn't notice here was a proper apology. A lot of people lost a day of Steemit use, putting them a day behind in reaching their goals on Steemit, with the losses compounding exponentially. Add all that together and quite a bit of currency was lost... and the website as a whole lost at least 1/4 of a day, setting the community back 0.25 days, which in blockchain life is a lot of time.
What I heard here was a downplaying or whitewashing of what happened and a quick "well it coulda been worse, and it's over now, so let's move on".
But hey, it's your site, you can run it however you want. We're just the content-creators.

Thank you @steemitblog and all the witnesses who worked hard to solve this problem. You give me confidence that this platform is really safe. Of course there are always bad guys who try to disturb this platform, but there are many more good guys who will protect this community 🙏🏾

Great, transparent communication. It is really great to be informed and to know exactly what happened.


Congrats, you made the #steemitminute for today!
Click the Image Below to see the Video!

This response is exactly why I believe in Steemit. I am telling everyone... it is one of the true blockchains that are productive. When I try to explain blockchain, I simply bring up Steem Block Explorer and show them the blockchain. 100% behind you guys!!!! Resteemed!

Thanks for the update. This post is featured in today's Joy News (in Chinese :D)


Funds are Safe... steemit doing it the Dante way

Dante is here No fear


Never Doubt, Never Worry

Bitcoin needs a 51% attack to stop the chain; STEEM needs one guy who's not even at the top ;)

Good to hear this was handled so efficiently and without any real drama. I think we need a service status page so users know if there is an issue with the steemit site or the blockchain.

great job sir for fixing the problem and for maintaining our safety.. thank you sir...

I was freaking out; I thought my account had been hacked. My friends' accounts were fine, but I couldn't log in to mine.

Was wondering what happened -- Wasn't sure if it was on my end or an issue with the blockchain.

Good to know -- I just hope I didn't break my @Runburgundy project by trying to look for something that wasn't broken :S. hahahah... Ohhhh troubleshooting.

Thank you for the prompt action and the clarification.

I was curious as to why things were slightly off and peculiar.

People were somewhat negative about this, although I think it's quite impressive how quickly it was resolved and how well the witnesses cooperated. Good job, guys!

Certainly this was not an ideal scenario, but there are always going to be unforeseen issues. What matters is how the software, and the community that uses it, responds to those issues. I think we can all be proud of how this problem was solved.

Is that why it was slow yesterday?


Thanks for the update!

What concerns me is that a majority of the witnesses implemented the patch so fast. I can hardly believe they all properly reviewed the changes before implementing them. My point is: what good is an unhackable, decentralized blockchain if all it takes to affect a majority of the nodes is a centrally released piece of software?

We reviewed the patch before applying it.

That's good to hear:)

So is this @nijeah related to @haejin?

Seems like spelled backwards they are one and the same.

I'm happy that everything worked out and that nothing harmful took place to the blockchain or to the accounts. I'm also happy that things are up and running now.

Thank you for all the work troubleshooting and getting us back up and running.

What I'm not happy about is the loss of productivity for five hours, and potentially up to 12 for others depending on their time zones.

To know that one failed transaction could bring the entire blockchain to a halt, rather than it being isolated and dealt with individually does not make me happy. I know I'm not a large account, so what I potentially lost in rewards and productivity individually during those five hours is probably minimal. But what about the combination of everyone? I don't know what that amounts to, but I'm going to be conservative and say it's substantial.

I know this will be looked upon as a complaint, and it is, but I'm hoping that someone somewhere can see that it's constructive. It isn't optimum, or for that matter, an adequate solution for everything to stop anytime something untoward occurs on the blockchain. I've checked into nijeah's account, and he's currently delegating most of his SP to a deadfish account who hasn't done anything with it for at least two months.

I'm not trying to be disagreeable here. I'm trying to point out that one account doing something that occurs regularly (powering down) shouldn't stop the entire blockchain. If this one is fixed, are there other potential bugs like this one still lurking out there somewhere? Is having the entire blockchain stop the only way we have to troubleshoot it?

Look at it this way, would you rather have the blockchain stop for a few hours or have someone give himself 37,898,000 SP out of the blue? Which do you think would have a more negative impact on the network and ecosystem?

Hear, hear!

Hey @pfunk. Not at all ungrateful that the transaction didn't go through. If we're talking about what I would rather, though, I would rather neither happen. Someone shouldn't be able to give themselves nearly 38 million SP and the blockchain shouldn't shut down for all of us. The filter for such transactions shouldn't fail.

I think we can agree on that. I think we also agree that stopping the blockchain is better than the transaction going through. The questions remain, though. Is there potential for more of these 'unusual transactions' and if so, will anything be done about them before someone else accidentally or maliciously brings the blockchain to a halt again?

In software development it's impossibly optimistic to think a complex piece of software will be bug-free. Developers of course try to think of all of the ways something might be exploited but all the bases are rarely covered. There are numerous checks and transaction rejections in Steem, for some things less obvious than others. Often it's the exploitation of a bug that identifies it and drives it to be fixed.

Of course ideally the blockchain wouldn't stop and the transaction would have been rejected. But much like @timcliff said, considering the circumstances, the outcome wasn't too bad.

I have worked for a company where there was a formal requirements process with full traceability in place, we had solid DTAP environments, programming was done in Ada, code and module reviewing and testing was done in an almost religious way, and we had a very competent and creative dedicated FMECA team, and also aggressive alpha and beta testing, and guess what ...

Shit happens. Having shit happen less often is very expensive, and even then there are no guarantees. Still, my trust in the quality of the blockchain codebase did take a small hit, can't help it. Well caught and solved, though.

The transaction that caused it to stop was a very unusual one. At best, it was a mistake. There was a good chance it was actually malicious.

The majority of the time, individual bad transactions are filtered out before they enter blocks. This way, they do not impact all the good ones. In this case, it got past the initial filter. Once it was in a block, it is not possible to separate it out.

I know it is not ideal, and obviously the loss of uptime was a really big deal, but given what happened, the temporary freeze was the best possible outcome.

I appreciate the answers. Does anyone know why this very unusual transaction got past the initial filter? And is there only one filter to go through?

Every type of transaction has filters coded up to prevent invalid transactions of that type. This person found a "creative" case that was not anticipated, and therefore didn't have coding in place to stop.

That it happened (the freeze) and was fixed within 3-4 hours speaks volumes. Well, to me at least.

Thank you for the detailed information of what transpired, it was appreciated.

--

excellent post

Thanks so much for keeping us informed!

We are starting to see protection mechanisms being implemented in DPoS protocols. It also happened on the EOS blockchain, and the chain halted. Thanks to the witnesses and Steemit Inc for the quick response. We are running again. Graphene-family products are showing resilience across the board.
