78,000 Tags in 5 days: Analyzing Post Tag Trends

The HIVE Blockchain is used heavily for posts, and most of those posts include at least one tag/topic. I sometimes struggle to figure out which tags to use when creating a post, so I wanted some data to help me decide.

The code for the Blockchain Tug of War and Blockchain Vampires games was already pulling post data from the Blockchain, so it was pretty simple to add a few lines to begin tracking tags. On the code side, I simply extract each tag from each post and append it in CSV style to a log file. Keeping the work that small ensures the blockstream code itself can still run very quickly and does not fall out of sync with the HIVE Blockchain.
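
A minimal sketch of what that tag-logging step might look like in Python. It assumes each post arrives as a "comment" operation whose json_metadata field is a JSON string with a "tags" list, and that replies are recognized by a non-empty parent_author. The function names and the log path are hypothetical and are not taken from the game code.

```python
import csv
import json


def extract_tags(op_type, op_body):
    """Return the tag list for a top-level post, or [] for anything else."""
    # Assumption: posts and replies are both "comment" operations, and a
    # reply is identified by a non-empty parent_author.
    if op_type != "comment" or op_body.get("parent_author"):
        return []
    try:
        metadata = json.loads(op_body.get("json_metadata") or "{}")
    except json.JSONDecodeError:
        return []  # malformed metadata: skip rather than stall the stream
    if not isinstance(metadata, dict):
        return []
    tags = metadata.get("tags", [])
    if not isinstance(tags, list):
        return []
    return [tag for tag in tags if isinstance(tag, str)]


def append_tags(log_path, tags):
    """Append one CSV row per post (hypothetical log layout)."""
    if not tags:
        return
    with open(log_path, "a", newline="", encoding="utf-8") as log_file:
        csv.writer(log_file).writerow(tags)
```

Doing nothing more than a JSON parse and a file append per post keeps the per-block work small, which is the point of deferring the analysis to a later step.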

If you don't know, one block is created every 3 seconds on HIVE. To get data from the Blockchain you stream the blocks, which means your application receives each block's data as soon as a new one is generated. Because blocks arrive on a fixed schedule, your application has to keep up with that pace; if it processes blocks too slowly, it will become backed up or miss blocks.
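
To put numbers on that pace, here is a small sketch of the arithmetic. The 3 second interval comes from the post; the helper names and the idea of comparing the chain's head block number with the last block processed are illustrative assumptions.

```python
BLOCK_INTERVAL_SECONDS = 3  # one block every 3 seconds, as stated above


def blocks_in_window(days):
    """Number of blocks produced in a window of whole days."""
    return days * 24 * 60 * 60 // BLOCK_INTERVAL_SECONDS


def seconds_behind(head_block_num, last_processed_num):
    """How far a consumer lags the chain head, expressed in seconds."""
    return max(0, head_block_num - last_processed_num) * BLOCK_INTERVAL_SECONDS
```

For a 5 day window, blocks_in_window(5) gives 144,000 blocks, so any processing that regularly takes longer than 3 seconds per block will keep falling further behind.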

For now, the tag analysis only looks at how often each tag occurred within the 5 day window. To do this I simply imported the CSV log file into a spreadsheet. From the spreadsheet I used =unique() to list each distinct tag, and then used =countif() to count the occurrences of each tag. Finally I used =sort() to order the tags by number of occurrences (highest first). I ended up with 78,000 tag entries in total in the 5 day window!
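
The same unique / countif / sort steps can also be sketched outside a spreadsheet. This is an alternative in Python using collections.Counter, not the method used for this post, and it assumes the log holds one CSV row of tags per post (the actual log layout may differ).

```python
import csv
from collections import Counter


def load_rows(log_path):
    """Read the CSV log into a list of rows, one list of tags per row."""
    with open(log_path, newline="", encoding="utf-8") as log_file:
        return list(csv.reader(log_file))


def count_tags(rows):
    """Count occurrences of each tag across all rows, skipping empty cells."""
    counts = Counter()
    for row in rows:
        counts.update(tag for tag in row if tag)
    return counts
```

With counts = count_tags(load_rows("tags.csv")), where "tags.csv" is a placeholder file name, len(counts) is the number of distinct tags, sum(counts.values()) is the total number of tag entries, and counts.most_common(99) lists the top 99 tags with the highest counts first.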

An additional note about the tag data: it includes community tags, so many of the tags relate to communities or front-ends and, depending on the posting method, may have been added to the post automatically. Because I want data on what to include in posts, I found the community-related tags to still be useful.

I was a little bit surprised by the tag trending data, and it also made me realize I've been using tags that are not very popular! Below is an image of the top 99 tags by number of occurrences within the 5 day window. It is also important to note that there were almost 7,000 distinct tags, many of which were used only once within the 5 days!

HIVE Posts Tag Trending Data.jpg

One note if you ever parse tags with your own application: HIVE currently allows newline characters in tags, so be sure to account for that when parsing the tag data!
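
One way to guard against that, sketched in Python: normalize each tag before it is written to the log, so a stray newline cannot split a CSV row. The clean_tag helper is hypothetical, and collapsing whitespace to a single space is one possible policy rather than a HIVE rule.

```python
def clean_tag(raw_tag):
    """Trim the tag and collapse any run of whitespace, newlines included."""
    # str.split() with no arguments splits on all whitespace, including
    # "\n" and "\r", and drops leading and trailing whitespace.
    return " ".join(raw_tag.split())
```

A tag that is empty after cleaning can then be skipped before logging.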

My final test is to see what difference tags make, so for this post I'm going to try to include popular tags that loosely relate, as opposed to unpopular tags that specifically relate. A final note: the first tag is special and is used to organize the post into different communities/locations. In this case, because I manually entered proofofbrain as my first tag, the post is going to be placed in proofofbrain, even though I did not select it from the PeakD interface.

I am going to continue to analyze tags and topics from posts on HIVE and will keep you updated with interesting findings. If there are specific tags you would like to track please let me know and I will add them to the Blockchain Tug of War game!
-Ace


Sucks that our most popular tags are from people just trying to farm a tiny bit of extra crypto. I look forward to when every topic imaginable has a tag full of juicy content.

I agree, it is difficult right now because there is a lot of farming, and the future will bring better organization and content. Thankfully the back-end structure itself is set up very well for the brighter future!

Though there are crypto farmers trying to jump on every popular tag, like the #neoxian tag, I really need some explanation of it.

Regardless, this project is an interesting one and shows the limitless capabilities of the Blockchain. And honestly, reporting on tags this way is good for SEO, though it may impede creativity to some degree as people will naturally tend to go with the crowd.