A Hacky Guide to Hive (part 2.2.1: blocks)

in #dev, 3 months ago (edited)

Context

In the previous post, I made a special transaction.
I broadcast a custom_json transaction of the type: YO.
This information will forever be stored in block 89040473 of Hive's blockchain.
To get to this information again, I could query a Hive node's:

If I don't know those 2 parameters, but want to find my move, I could use:

...you can access blockchain data in many different ways; use the above endpoints with Beem or lighthive...

I demonstrated how anyone can YO; now I want to show a method to get to all YOs.
It could be any custom_json, or a different event; YO is just an example. It could be a move in a blockchain game, or you could go as far as trying to build your own little hive engine.
You might want to observe votes or comments as they come in, and store some, so you don't have to look them up again later, maybe for a notification system...




A Better Stream

In another post I explained how the Hive blockchain is really just a very long list.

block_api

The block_api gives you access to all blocks.
You can access the block_api on all public nodes.
If you want to use your own node, having only the block_api should be one of the cheapest options.

stream()

Basically you could build most things around just looking at all blocks as they are written.
That will not include all information for everything (virtual values and such), but a lot.
This might not be the best approach to build everything, but once you've got a stable block stream going, you can build good stuff around it...

Beem

Beem's stream() method still works and you could use it as is.

The main logic behind Beem's stream is hidden in the blocks() method. That part alone is 278 lines long and does a lot of things.
In the background, Beem can handle:

  • node switching
  • threading
  • syncing
  • private keys

... and more.
I could not build it better. I don't have to.

Procedure

The main procedure to get to a block is still just a query.
The speed and reliability of that query depend mostly on the source (the node), not on the Python code.

Python isn't particularly fast to begin with.
But all we need it to do during this procedure:

  • Query next block
  • Filter the block for YO
  • Store YO

That's a job done.

At the moment, querying the latest block from api.hive.blog takes about 1 second.
Maximum block size is a witness parameter:

The value must not be more than 2MB (2097152).

...so with one block every 3 seconds, there are 2 seconds left to handle 2MB at most (current max: 65536 bytes).
Just filtering and storing a block takes only milliseconds, even in Python...
Which means this thing can idle for almost 2 seconds and then repeat the procedure.

Beem actually does that too 😅:

# Sleep for one block
time.sleep(self.block_interval)
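
The timing budget above can be sketched as a small helper. This is only an illustration, not Beem's code: the name remaining_sleep and its parameters are made up here, and the 3-second block interval is the one used throughout this post.

```python
import time

BLOCK_INTERVAL = 3.0  # seconds between blocks, as used in this post

def remaining_sleep(started, finished, block_interval=BLOCK_INTERVAL):
    """Return how long to idle after one query/filter/store pass.

    started and finished are timestamps in seconds (e.g. from time.monotonic()).
    The result is never negative, even if the pass overran the interval.
    """
    elapsed = finished - started
    return max(0.0, block_interval - elapsed)

started = time.monotonic()
# ... query, filter and store one block here ...
finished = time.monotonic()
pause = remaining_sleep(started, finished)  # close to 3 s here, since nothing was done
```

Sleeping for the remainder instead of a flat 3 seconds keeps a slow query from pushing the loop further and further behind the chain.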

Storage

It doesn't really matter how I build the stream; without storage, I'll lose all progress when the stream ends or crashes.

I'll use SQL. I could use Redis, or Mongo...

There are many different storage solutions and I could never build anything better.
This stuff handles sessions and serialization. It comes with built-in backup solutions.
It's fast. It's scalable: I'll use SQLite, but you could plug in a giant cluster of whatever.
I am trying to move the responsibility of storage handling where it belongs: the database level.

Threading and node switching

Beem can switch through nodes from a list and even manage worker threads.
But why manage that inside Python in the first place?

I will just build one single procedure and can run it as a background service.
If I need another thread, I can just run another instance of the same procedure.
I could run one thread for every node, or even use separate machines.
Anyhow, the procedure does not need to know which thread it's in.
As long as I funnel the data to the same database in the end, all synchronization and serialization and whatnot is taken care of automatically.

I am trying to move the responsibility of concurrency where it belongs: the operating system and database layers.
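
A minimal sketch of what "let the database sort it out" could look like with SQLite. The table layout, the helper names open_db and store_block, and the placeholder ids are all assumptions made for this example. Two connections stand in for two independent instances of the procedure writing to the same file.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per block, keyed by block number.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blocks (
    block_num INTEGER PRIMARY KEY,
    block_id  TEXT NOT NULL
)
"""

def open_db(path):
    """Open a connection and make sure the table exists."""
    con = sqlite3.connect(path, timeout=30)  # wait up to 30 s if another writer holds the lock
    con.execute(SCHEMA)
    con.commit()
    return con

def store_block(con, block_num, block_id):
    """Insert a block; a block_num that is already stored is silently skipped."""
    with con:  # commits on success, rolls back on error
        con.execute(
            "INSERT OR IGNORE INTO blocks (block_num, block_id) VALUES (?, ?)",
            (block_num, block_id),
        )

# Two connections play the role of two separately started instances.
db_path = os.path.join(tempfile.mkdtemp(), "yo_demo.db")
worker_a = open_db(db_path)
worker_b = open_db(db_path)

store_block(worker_a, 89040473, "placeholder-id")
store_block(worker_b, 89040473, "placeholder-id")    # same block again, ignored
store_block(worker_b, 89040474, "placeholder-id-2")

row_count = worker_a.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
print(row_count)
```

The primary key does the de-duplication and SQLite's file locking does the serialization, so neither instance needs to know the other exists.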


Live Stream

block_api.get_block_range

import requests

def get_block_range(start, count, url):
    # JSON-RPC 2.0 request for `count` blocks, beginning at block `start`
    payload = {
        "jsonrpc": "2.0",
        "method": "block_api.get_block_range",
        "params": {"starting_block_num": start, "count": count},
        "id": 1,
    }
    response = requests.post(url, json=payload, timeout=10)
    return response.json()['result']['blocks']

The only function you really need.
I am not even joking.

  • Usage:
url = 'https://api.hive.blog'

for block in get_block_range(89040473, 1, url):
    print(block)

Loop

For a stream you only need to loop this; you need a start block and then increment.
Repeat every 3 seconds and it's basically Beem's stream(), without all the fluff.

But that's an infinite loop.
For the final service, that's what I'd want; for a code snippet, I feel like avoiding it.
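
A bounded version of that loop might look like the sketch below. The names stream_blocks and fetch_range are made up for illustration; fetch_range(start, count) stands for any callable that returns a list of blocks, for example a thin wrapper around get_block_range with a fixed url. The sketch also assumes that a node answers with an empty list when the requested block does not exist yet.

```python
import time

def stream_blocks(fetch_range, start, max_blocks, interval=3.0, sleep=time.sleep):
    """Yield (block_num, block) pairs, one block per interval, then stop.

    fetch_range(start, count) is a hypothetical callable returning a list of blocks.
    The loop ends after max_blocks blocks instead of running forever.
    """
    block_num = start
    end = start + max_blocks
    while block_num < end:
        blocks = fetch_range(block_num, 1)
        if not blocks:
            # Assumed behaviour: empty list means the block is not produced yet.
            sleep(interval)
            continue
        yield block_num, blocks[0]
        block_num += 1
        sleep(interval)

# Usage against a public node (commented out, because it makes network calls):
# url = 'https://api.hive.blog'
# fetch = lambda start, count: get_block_range(start, count, url)
# for block_num, block in stream_blocks(fetch, 89040473, 5):
#     print(block_num, len(block['transactions']))
```

Passing the fetch function in as a parameter keeps the loop itself free of any node or HTTP details, so it can be tried out with a fake source first.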

In the early days, nodes accepted websockets. I don't know why that got turned off. Maybe it was too expensive. Maybe you can still do something like that on your own node.
Anyway, if you test this on the public nodes, you are stuck with this 3-second query loop. It seems crude, but it seems that's how it's done.

The documentation recommends Beem's stream.

@jesta's chainsync does it:

time.sleep(self.get_approx_sleep_until_block(throttle, config, status['time']))

So yeah... I also wait 3 seconds.

Interrupt

In the best case, I start the loop once and it runs indefinitely (fire & forget).
In reality I have to prepare for what happens should it stop.
Maybe I need to resync the whole service...

The above is all it takes to rebuild Beem's stream or any other.
Wrap some try/except blocks around it and it can't really break down.
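
One way to do that wrapping, as a rough sketch: a generic retry helper. The name fetch_with_retry and its parameters are invented for this example. It catches every exception on purpose, because get_block_range can fail in several ways: the HTTP request itself can raise, and a node that answers with an error instead of a result leads to a KeyError.

```python
import time

def fetch_with_retry(fetch, *args, retries=5, delay=3.0, sleep=time.sleep):
    """Call fetch(*args); on any exception, wait and try again.

    Re-raises the last exception once `retries` attempts have failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(*args)
        except Exception as error:
            if attempt == retries:
                raise
            print(f"attempt {attempt} failed: {error!r}; retrying in {delay} s")
            sleep(delay)

# Usage (commented out, because it makes a network call):
# blocks = fetch_with_retry(get_block_range, 89040473, 1, 'https://api.hive.blog')
```

Giving up after a fixed number of attempts lets whatever runs the service (systemd, cron, a shell loop) decide what happens next.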

But for something useful, storage is necessary.
So that I at least know where the last stream stopped, and where to begin...
For YO, I could ignore all 89040473 blocks before the first YO.
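
A sketch of how the restart point could be read back from storage, assuming a hypothetical blocks table with one row per processed block. The function name next_block_to_fetch is made up; the fallback block number is the one from this post.

```python
import sqlite3

FIRST_YO_BLOCK = 89040473  # block of the first YO, taken from this post

def next_block_to_fetch(con, first_block=FIRST_YO_BLOCK):
    """Return the block after the highest stored one, or first_block if nothing is stored."""
    con.execute("CREATE TABLE IF NOT EXISTS blocks (block_num INTEGER PRIMARY KEY)")
    highest = con.execute("SELECT MAX(block_num) FROM blocks").fetchone()[0]
    return first_block if highest is None else highest + 1

con = sqlite3.connect(":memory:")
fresh_start = next_block_to_fetch(con)   # empty table: start at the first YO block
with con:
    con.execute("INSERT INTO blocks (block_num) VALUES (?)", (FIRST_YO_BLOCK,))
resume_at = next_block_to_fetch(con)     # one block after the last stored one
print(fresh_start, resume_at)
```

With several instances writing at once, the highest stored block says nothing about gaps below it, so a real service would probably also need a check for missing block numbers.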

Traffic

That 3-second query thing may seem like a lot of traffic.
But if it's planned well, and stored well, it only has to be done once for any block.
From that point on, it can feed a whole network of other things, which don't have to make any queries outside of my own database.

Again: For things like posts and author balance, the standard apis can be enough.
Also: posts, votes and account balances can change; blocks can't.

Sending one request every 3 seconds, receiving 60KB max data...
I don't know how annoying this is for node providers.
I guess it's ok...

Syncing might be different. In the docs, there's a get_block_range example with count=1000.
The response could be 60MB. But that could also sync 50 minutes in a single call...
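
The arithmetic behind those numbers, plus a sketch of how a catch-up sync could be cut into calls. plan_sync is a made-up helper, the chunk size of 1000 is simply the count from the docs example, and the head block in the usage line is an arbitrary number chosen for illustration.

```python
BLOCK_INTERVAL_SECONDS = 3   # one block every 3 seconds
MAX_BLOCK_BYTES = 65536      # current maximum block size quoted in this post

def plan_sync(start, head, chunk=1000):
    """Split the blocks start..head (inclusive) into (starting_block_num, count) calls."""
    calls = []
    block_num = start
    while block_num <= head:
        count = min(chunk, head - block_num + 1)
        calls.append((block_num, count))
        block_num += count
    return calls

minutes_per_call = 1000 * BLOCK_INTERVAL_SECONDS / 60   # chain time covered by 1000 blocks
worst_case_mb = 1000 * MAX_BLOCK_BYTES / 1_000_000      # if every block were completely full
calls = plan_sync(89040473, 89042999)                   # 2527 blocks to catch up on
print(minutes_per_call, worst_case_mb, calls)
```

Each tuple maps directly onto the start and count arguments of get_block_range, so syncing becomes a plain loop over the plan.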

Filter YO

def get_yos(block):
    yos = []
    for transaction in block['transactions']:
        for operation in transaction['operations']:            
            if operation['type'] == 'custom_json_operation':
                if operation['value']['id'] == 'YO':
                    yos.append(operation)
    return yos

This returns all YOs in a block, but loses the information about which transaction and which block each YO was in.

I'll try to avoid data manipulation in this part of the service.
This part is the stream and shouldn't be involved in anything else.
However, I do want to store the block num, which already got lost along the way.
I also want block id and previous. This just demonstrates how to filter data.
It's best to start by building the tables first, though.
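
For illustration, a variant of the filter that keeps that context. This is a sketch, not the final version: the function name and the layout of the returned rows are invented, and it assumes that each block in the response carries 'block_id' and 'previous' keys. The block number is derived by counting up from the requested start block.

```python
def get_yos_with_context(blocks, start):
    """Collect YO operations together with the block and transaction they came from.

    blocks is the list returned by get_block_range, start is the
    starting_block_num that was requested for it.
    """
    rows = []
    for block_num, block in enumerate(blocks, start=start):
        for trx_index, transaction in enumerate(block['transactions']):
            for operation in transaction['operations']:
                if operation['type'] != 'custom_json_operation':
                    continue
                if operation['value']['id'] != 'YO':
                    continue
                rows.append({
                    'block_num': block_num,
                    'block_id': block.get('block_id'),   # assumed key name
                    'previous': block.get('previous'),   # assumed key name
                    'trx_index': trx_index,              # position inside the block
                    'operation': operation,              # stored untouched
                })
    return rows
```

Each row is a flat dictionary, which should map onto a table row without further manipulation.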

Conclusion

It might not look like much, but the part that needs to connect to a Hive node is done.
This is the absolute minimum necessary and can only fail at very few points so far.
Most possible problems can be caught outside of this core logic.
All that's missing is persistent storage, which I will wrap up in the next post.

Anyway, threading, concurrency, data manipulation, whatever... everything else can and should happen later, downstream.
What I keep trying to point out: All extra logic should be avoided.
I am looking at a Hive query as a single step, a procedure. It should be a single function.
Next post, storage will be wrapped in as few procedures as possible, and that will result in a YO crawler/watcher that feeds a db that you could plug anything into. It will probably be short and include only minimal logic. That's a feature.

Naming

I think the hardest question in programming is naming.
'YO crawler' isn't good. I should give this thing a name, before it's finished.
custom_jacksn, or custom_YOson maybe? Or YOmind...


custom_YOson

I vote for this one 🤣

Btw, these posts are very interesting - also because they help me realize how my scripts are even worse than what I thought !LOL - and, at the same time, quite hard for me to understand.

There's plenty of information and I have to find some free time to start doing some tests and exercises, as I've found that this is what helps me the most in understanding the most difficult stuff.


hard to understand

I think it may be a bit hard to understand, because it doesn't do anything, yet.
I need 1 more post to finish custom_YOson.
Then one more post to show what it can be used for.

I hope it all makes more sense then.
It's very few lines of code...

There are also a lot of concepts that I'm not familiar with, but by reading about them at least I'm starting to get a wider idea of how they are all interconnected.

Key takeaway, the bottom line of this post:

To connect to Hive you only have a few methods.
No matter how much code you throw at the problem, it won't really improve that.

If you want to connect to a public node: use http.
If you need to write data: use a db.
If you want threading and pooling and things: use the os and db.

Whether you fully understand how these things work or you just heard of them, there's no way you can improve them. And you shouldn't even try. Or find a different guide. 😂

The summary I didn't deserve but that I needed 🤣🤣🤣

Since you are the only reader anyways 😅:

Let me cook for 1-2 more posts, then I can build whatever.
What would you like to see or build yourself?
A vote bot? A discord bot? ticket payment system? YO game?

Whatever you like! Everything for me is new, so whatever you build, I'm in :)

Is @holger80 still around? I think he left, right? Is anyone still maintaining the Beem library?

afaik @holger80 is still doing stuff, but not around Hive.
I finished the next part of this guide today.
What I am trying to demonstrate is that in many cases you don't need a library and are better off without one.