2 Million Password Analysis and Visualization

in #utopian-io, 7 years ago (edited)

Details

I analyzed about 2 million passwords (1,936,835 to be exact) and ran a general character analysis on the dataset. I found the dataset on the internet and will share it with you. It contains passwords from different languages; I checked this myself. The letter frequency of the passwords did not come out like the English letter frequency. I used the R programming language to prepare the data and to visualize it.

Outline

  1. Scope of Analysis
  2. Tools
  3. Scripts

Scope of Analysis

Charts
Letters Analysis

First, I looked at the distribution of letters across the whole dataset, which produced the bar chart below. The most commonly used letter is "a", which appears 1,027,141 times, counting both single and repeated occurrences. Since the majority of data on the internet is in English, we would expect the letter frequencies to be close to those of English.
In an English letter frequency analysis the most used letter is "e", but in my analysis the most commonly used letter is "a".

600px-English_letter_frequency_%28alphabetic%29.svg.jpg

a-z barchart.jpeg
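The letter count described above can be sketched in a few lines of base R. This is a minimal illustration, not the code from the project: the `passwords` vector is a made-up stand-in for the real dataset, and I use base R rather than stringr so the snippet runs without extra packages.

```r
# Toy stand-in for the real password list (hypothetical data).
passwords <- c("password", "banana1", "Qwerty", "abc123")

# Split every password into single characters and pool them.
all_chars <- unlist(strsplit(passwords, ""))

# Count lower-case a-z only; factor() keeps letters with zero hits
# and silently drops digits, upper-case letters and symbols.
letter_counts <- table(factor(all_chars, levels = letters))

# Most frequent letter in the toy data.
top_letter <- names(letter_counts)[which.max(letter_counts)]

print(letter_counts[letter_counts > 0])
print(top_letter)
```

Every occurrence is counted, so a password such as "banana1" contributes three hits for "a". Upper-case letters are left out here on the assumption that they are handled separately, as in the upper-lower analysis further down.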

Numbers Analysis

In the digit analysis, the most used digit is 1 (1,272,673 times) and the least used digit is 4 (565,775 times). The counts decline across the first four counting numbers (1 to 4).
After that they start to fluctuate.

0-9 barchart.jpeg
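One possible way to get per-digit totals is to strip everything except a given digit and measure what is left. The sketch below assumes a hypothetical `passwords` vector and is only meant to show the idea; the project itself may count digits differently.

```r
# Toy stand-in for the real password list (hypothetical data).
passwords <- c("pass1234", "2011love", "a1b1c1", "nodigits")

digits <- as.character(0:9)

# For each digit, delete every other character and add up the
# lengths of what remains across all passwords.
digit_counts <- vapply(
  digits,
  function(d) sum(nchar(gsub(paste0("[^", d, "]"), "", passwords))),
  integer(1)
)

# Digit with the highest total in the toy data.
most_used <- names(which.max(digit_counts))

print(digit_counts)
print(most_used)
```

The result is a named vector in 0 to 9 order, which can be passed directly to `barplot()`.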

Uniques Analysis

Another thing I was curious about was the distribution of unique (special) characters. The most used one is @ (12,159 times), followed by the exclamation point (10,708 times).
PS: The dot was not included in this analysis.

unique.jpeg
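A rough sketch of how the special characters could be tallied is shown below. It assumes that "unique" means anything that is not an ASCII letter or digit, and it drops the dot as mentioned above. Both the `passwords` vector and that definition are my assumptions, not taken from the project code.

```r
# Toy stand-in for the real password list (hypothetical data).
passwords <- c("p@ssword!", "hello.world", "a@b@c", "x_y!")

all_chars <- unlist(strsplit(passwords, ""))

# Keep characters that are neither ASCII letters nor digits,
# and leave the dot out of the analysis.
is_alnum <- grepl("^[A-Za-z0-9]$", all_chars)
special  <- all_chars[!is_alnum & all_chars != "."]

# Tally and sort so the most frequent character comes first.
special_counts <- sort(table(special), decreasing = TRUE)

print(special_counts)
```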

Upper-Lower Characters Analysis

We expected lower-case letters to outnumber upper-case ones, but not by such an overwhelming margin. In the letters-only analysis, upper-case letters make up only about 6% of the total (594,064 of 9,940,582 letters).

1.jpeg
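The share can be checked directly from the two totals quoted in the Scripts section (594,064 upper-case and 9,346,518 lower-case letters). The variable names below are my own English stand-ins, not the ones used in the project.

```r
# Totals reported in the post.
upper_total <- 594064
lower_total <- 9346518

# All letters, upper and lower case together.
letter_total <- upper_total + lower_total

# Percentage share of each group.
upper_pct <- 100 * upper_total / letter_total
lower_pct <- 100 * lower_total / letter_total

print(round(upper_pct, 2))
print(round(lower_pct, 2))
```

Rounded to whole numbers, this gives roughly 6% upper-case against 94% lower-case.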

Screenshot of the data:
image.png

Tools

I used the R programming language from the R Project and a few libraries.
Libraries:
ggplot2, plotly -> used to visualize the data
stringr -> string operations

Scripts

I will only share short sections here because the full code is quite long. You can find the complete project and the dataset on my GitHub.

Bar chart plotting code.

# barchart {a-----z}  (harf_degerleri = letter counts)
barplot(harf_degerleri, xlab = "Characters", ylab = "Values", main = "[a-z] barchart", names.arg = chars, col = "orange", ylim = c(1, 1200000))
# barchart {0-----9}  (rakam_degerleri = digit counts)
barplot(rakam_degerleri, xlab = "Numbers", ylab = "Values", main = "[0-9] barchart", names.arg = numbers, col = "pink", ylim = c(1, 1500000))
# barchart {unique}
barplot(uniq, xlab = "Unique chars.", ylab = "Values", main = "[unique] barchart", names.arg = uniq_names, col = "grey", ylim = c(1, 15000))
# upper-case letter total -> 594064, lower-case letter total -> 9346518
buyuk_harf_topam_sayı <- 594064
kucuk_harf_toplam_sayı <- 9346518
# total number of letters (upper + lower)
toplam_char_sayı <- buyuk_harf_topam_sayı + kucuk_harf_toplam_sayı

Pie chart plotting code.

# Percentages of lower-case and upper-case letters (totals defined above)
kucuk_harf_yuzde <- round((100 * kucuk_harf_toplam_sayı) / toplam_char_sayı)
buyuk_harf_yuzde <- round((100 * buyuk_harf_topam_sayı) / toplam_char_sayı)
# Slice labels have to exist before pie() is called
label1 <- paste(kucuk_harf_yuzde, "% lower chars.")
label2 <- paste(buyuk_harf_yuzde, "% upper chars.")
labelss <- c(label1, label2)
# Slice sizes: lower-case total first, then upper-case total
harf_degerleri_total <- sum(harf_degerleri)
upper_chars_total <- sum(upper_chars)
pie_chart_values <- c(harf_degerleri_total, upper_chars_total)
colors <- c("#009E73", "#E69F00")
pie(pie_chart_values, main = "upper-lower chars.", col = colors, labels = labelss)



Posted on Utopian.io - Rewarding Open Source Contributors
