Visualization - Word Clouds from Steem Trending Posts

in #steemit, 8 years ago

I have been a proud member of the Steemit community for almost a week now. As I have become acclimated and started to explore this ecosystem, I have wondered about the characteristics of successful posts.

In my day job I wrangle data for a tech startup called Stabilitas, where I engineer ways to pursue empirical understanding. This evening I spent some time tinkering with the Steemit API and started slinging some Python to analyze Steem posts. My efforts brought me to building word clouds.

A word cloud is a visual representation of word usage in a body of text. In the visualizations I built for this project, font size depicts how often each word appears in the bodies of the trending Steem posts I sampled (words used more frequently are larger in each word cloud).

Below are a few samples I generated, a brief exploration of my methodology, and some questions for you.

Samples

Sample 1

Steem Word Cloud - Sample 001
This sample draws from the top 1,000 words, but includes a word only if its relative frequency could be represented with a font size from 18 to 100. This sample uses pure white text and the Cabin Semibold font.
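One way to read the 18-to-100 rule is as a linear scale: the most frequent word gets the maximum size, every other word is sized in proportion, and anything that would fall below the minimum is dropped. The sketch below assumes that interpretation. The function name `font_sizes`, the linear scaling, and the example counts are all hypothetical, and a real word cloud tool may size words differently.

```python
def font_sizes(frequencies, min_size=18, max_size=100):
    """Scale counts linearly so the top word gets max_size.

    Words whose scaled size falls below min_size are left out,
    which is one possible reading of the rule described above.
    """
    if not frequencies:
        return {}
    top = max(frequencies.values())
    sizes = {}
    for word, count in frequencies.items():
        size = round(max_size * count / top)
        if size >= min_size:
            sizes[word] = size
    return sizes


# Made-up counts, for illustration only.
example_counts = {"steem": 50, "post": 25, "vote": 10, "rare": 2}
print(font_sizes(example_counts))
```

With these made-up counts, "rare" scales to a size of 4 and is dropped, while the other three words are kept.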

Sample 2

Steem Word Cloud - Sample 002
This sample depicts the top 1,000 words, and word frequency is represented with a minimum font size of 4 and a maximum font size that will neatly fit the size constraints of the image. This sample uses pure white text and the Cabin Sketch Bold font.

Sample 3

Steem Word Cloud - Sample 003
This sample uses the same configuration parameters as sample 2 above, but employs a grayscale color ramp (white is more frequent and gray is less frequent).
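A grayscale ramp like this can be thought of as a function from a word's font size to a gray level. The following is a rough sketch under that assumption. The name `gray_ramp`, the darkest gray level of 110, and the default size bounds are illustrative choices, not values taken from the samples.

```python
def gray_ramp(font_size, min_size=4, max_size=100, darkest=110):
    """Map a font size to an RGB gray: larger (more frequent) words are whiter."""
    span = max(max_size - min_size, 1)
    # Position of this size within the range, clamped to [0, 1].
    t = min(max((font_size - min_size) / span, 0.0), 1.0)
    level = round(darkest + t * (255 - darkest))
    return f"rgb({level}, {level}, {level})"


print(gray_ramp(100))  # largest words come out pure white
print(gray_ramp(4))    # smallest words come out the darkest gray
```

A ramp based on other colors would follow the same pattern, interpolating each channel between two chosen endpoints instead of a single gray level.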

Sample 4

Steem Word Cloud - Sample 004
This sample uses the same configuration parameters as sample 2 (and 3) above, but applies a color ramp derived from the official Steem logo.

Sample 5

Steem Word Cloud - Sample 005
This sample uses the same configuration parameters and color ramp as sample 4 above, but with a white background.

Methodology

  • Query the Steemit API for the top 20 trending posts
  • Parse the body of each post; remove all HTML and markdown
  • Convert text to lowercase and remove all punctuation
  • Analyze the text; evaluate word frequency and assign values
  • Create a word cloud using a mask of the Steem logo
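The parsing, cleaning, and counting steps above can be sketched roughly as follows. This is an illustrative sketch using only the Python standard library, not the code behind the samples. The sample bodies stand in for posts already fetched from the API, the function names are hypothetical, and the regular expressions cover only common HTML and markdown patterns.

```python
import html
import re
import string
from collections import Counter

# Hypothetical post bodies standing in for the fetched trending posts.
SAMPLE_BODIES = [
    "<p>Steem is <b>great</b>! Read [my post](https://example.com/post).</p>",
    "# Steem news\nSteem **rewards** great posts.",
]


def strip_markup(body):
    """Remove HTML tags and common markdown link/image syntax."""
    text = re.sub(r"<[^>]+>", " ", body)                  # HTML tags
    text = html.unescape(text)                            # entities such as &amp;
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)     # markdown images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # links: keep the label
    text = re.sub(r"https?://\S+", " ", text)             # bare URLs
    return text


def normalize(text):
    """Lowercase the text and replace every punctuation character with a space."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.lower().translate(table)


def word_frequencies(bodies):
    """Count how often each word appears across all post bodies."""
    counts = Counter()
    for body in bodies:
        counts.update(normalize(strip_markup(body)).split())
    return counts


frequencies = word_frequencies(SAMPLE_BODIES)
print(frequencies.most_common(2))
```

Replacing punctuation with spaces also removes leftover markdown symbols such as `#` and `**`, at the cost of splitting contractions like "don't" into two tokens. The resulting counts could then be handed to a word cloud tool that accepts a word-to-frequency mapping and an image mask.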

Questions For You

  • What do you think of this concept? Is it useful, or hollow gloss?
  • Would these graphics be more helpful if they only used words from the titles of each post?
  • Which style do you like best?
  • How can I improve these visualizations?

Note of Appreciation

I really appreciate @blakemiles84 encouraging me to join this community. Thank you, Blake!
