I have been a proud member of the Steemit community for almost a week now. As I have become acclimated and started to explore this ecosystem, I have wondered about the characteristics of successful posts.
In my day job I wrangle data for a tech startup called Stabilitas, where I engineer ways to pursue empirical understanding. This evening I spent some time tinkering with the Steemit API and started slinging some Python to analyze Steem posts. My efforts brought me to building word clouds.
A word cloud is a visual representation of word usage in a body of text. In the visualizations I built for this project, font size depicts how often each word appears in the bodies of the top 20 trending Steem posts (the more frequently a word is used, the larger it appears in the word cloud).
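To make the frequency-to-size idea concrete, here is a minimal sketch that assumes a simple linear mapping from word count to font size. The function name `font_size` and its default sizes are illustrative assumptions, not the code behind these images, and a word cloud library may well scale sizes differently.

```python
def font_size(count, min_count, max_count, min_size=4, max_size=100):
    """Linearly map a word count onto the [min_size, max_size] font range.

    Hypothetical helper: the sizes and the linear scaling are assumptions
    chosen for illustration only.
    """
    if max_count == min_count:
        # Every word is equally frequent, so there is nothing to scale.
        return max_size
    # Position of this count between the rarest and the most common word (0..1).
    share = (count - min_count) / (max_count - min_count)
    return round(min_size + share * (max_size - min_size))
```

With this mapping, the rarest word is drawn at `min_size`, the most common word at `max_size`, and everything else falls in between.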
Below are a few samples I generated, a brief exploration of my methodology, and some questions for you.
Samples
Sample 1
This sample draws from the top 1,000 words, but shows a word only if its relative frequency can be represented with a font size between 18 and 100. This sample uses pure white text and the Cabin Semibold font.
Sample 2
This sample depicts the top 1,000 words, and word frequency is represented with a minimum font size of 4 and a maximum font size that will neatly fit the size constraints of the image. This sample uses pure white text and the Cabin Sketch Bold font.
Sample 3
This sample uses the same configuration parameters as sample 2 above, but employs a grayscale color ramp (white is more frequent and gray is less frequent).
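A grayscale ramp like this one can be sketched as a small function that turns a word's relative frequency (0 for the rarest word, 1 for the most common) into a gray level. The name `gray_ramp` and the darkest gray value of 110 are assumptions for illustration; the exact shades in the image above may differ.

```python
def gray_ramp(share, darkest=110):
    """Return a hex color running from gray (share=0) to white (share=1).

    Hypothetical helper: `darkest` is an assumed gray level (0-255).
    """
    # Clamp so out-of-range inputs still produce a valid color.
    share = min(max(share, 0.0), 1.0)
    level = round(darkest + share * (255 - darkest))
    # Equal red, green, and blue channels give a neutral gray.
    return f"#{level:02x}{level:02x}{level:02x}"
```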
Sample 4
This sample uses the same configuration parameters as sample 2 (and 3) above, but applies a color ramp derived from the official Steem logo.
Sample 5
This sample uses the same configuration parameters and color ramp as sample 4 above, but with a white background.
Methodology
- Query the Steemit API for the top 20 trending posts
- Parse the body of each post; remove all HTML and markdown
- Convert text to lowercase and remove all punctuation
- Analyze the text; evaluate word frequency and assign values
- Create a word cloud using a mask of the Steem logo
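The text-cleaning and counting steps above can be sketched with the Python standard library alone. This is a rough approximation under stated assumptions, not my exact script: the helper names `clean_body` and `word_frequencies` are hypothetical, the regular expressions are a crude stand-in for a real HTML or markdown parser, and fetching posts from the Steemit API and rendering the masked cloud are left out.

```python
import re
import string
from collections import Counter

# Crude patterns (assumptions for illustration); a real parser would be more robust.
HTML_TAG = re.compile(r"<[^>]+>")
MD_IMAGE_OR_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")
URL = re.compile(r"https?://\S+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_body(body):
    """Strip HTML tags, markdown links/images, and bare URLs, then
    lowercase the text and replace punctuation with spaces."""
    text = HTML_TAG.sub(" ", body)
    text = MD_IMAGE_OR_LINK.sub(r"\1", text)  # keep the link text, drop the target
    text = URL.sub(" ", text)
    # Note: replacing punctuation with spaces splits contractions ("it's" -> "it s").
    return text.lower().translate(PUNCT_TABLE)


def word_frequencies(bodies, top_n=1000):
    """Count words across all post bodies and return the top_n (word, count) pairs."""
    counts = Counter()
    for body in bodies:
        counts.update(clean_body(body).split())
    return counts.most_common(top_n)
```

The resulting list of `(word, count)` pairs is the kind of input a word cloud renderer needs; common filler words ("the", "and") would likely need to be filtered out before plotting.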
Questions For You
- What do you think of this concept? Is it useful, or hollow gloss?
- Would these graphics be more helpful if they only used words from the titles of each post?
- Which style do you like best?
- How can I improve these visualizations?
Note of Appreciation
I really appreciate @blakemiles84 encouraging me to join this community. Thank you, Blake!
Nice work, you've clearly put a lot of effort into this. You have a new follower :)
I really like the concept. I think it's useful and a good fit for Steem data. Sample 5 is my favorite here.
I've been using a customized fork of timdream's wordcloud2.js to make "steemclouds" for @steemleak donation reports, using a method similar to the one you described but sizing by vote loyalty and weight instead.
@sandwich put a lot of effort into steem.cloud too :)
Wow! I had no idea about the "steemclouds" post. Thanks for bringing that project to my attention!
I like it. Nice use of the Steem logo.
Thanks! I thought masking the word cloud to the logo would be a fun visualization. It worked out even better than I had hoped.
I think this is great! #5
Thanks for the feedback! I now have a script that can generate these "live" in about 12 seconds.