Details
I used about 2 million passwords (1,936,835 to be exact) and ran a general letter analysis on them. I found the dataset on the internet, and I will share it with you. The dataset contains passwords from several languages; I checked this myself. However, the letter frequency of the data did not match English letter frequency. I used the R programming language to prepare and visualize the data.
Outline
- Scope of Analysis
- Tools
- Scripts
Scope of Analysis
Charts
Letters Analysis
First, I computed the distribution of letters over the whole dataset and plotted it as a bar chart. The most common letter is "a", which appears 1,027,141 times in total. Since most text on the Internet is in English, we expected the letter frequencies to be close to English letter frequencies.
In English text the most frequent letter is "e", but in this dataset the most frequent letter is "a".
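To make the counting step concrete, here is a minimal base-R sketch of how per-letter counts could be computed. The function name `count_letters` and the toy passwords are hypothetical and not taken from the author's project; the sketch also assumes that only lowercase a-z are counted, with uppercase letters handled separately in the case analysis.

```r
# Hypothetical helper (not from the original project): count each lowercase
# letter a-z in a character vector of passwords. Assumption: uppercase
# letters are NOT folded into lowercase here.
count_letters <- function(passwords) {
  # Split every password into single characters and pool them
  chars <- unlist(strsplit(passwords, split = ""), use.names = FALSE)
  # factor() with levels = letters keeps all 26 letters (including zero
  # counts); any other character becomes NA and is dropped by table()
  counts <- table(factor(chars, levels = letters))
  # Plain named integer vector, ready for barplot()
  setNames(as.integer(counts), names(counts))
}

# Toy input for illustration only, not from the real dataset
sample_passwords <- c("password", "banana12", "Qwerty!")
letter_counts <- count_letters(sample_passwords)
most_common <- names(which.max(letter_counts))
```

Running the same function on the full password vector would give the 26 values behind the [a-z] bar chart under these assumptions.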
Numbers Analysis
In the digit analysis, the most frequently used digit is 1 (1,272,673 times) and the least used digit is 4 (565,775 times). The counts decline from 1 to 4 and then start to fluctuate.
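A similar sketch works for the digits. The names `count_digits` and `declines_1_to_4` are illustrative, not from the original code: the first counts the characters 0-9, the second checks whether the counts strictly decrease from 1 to 4, which is the pattern described above.

```r
# Hypothetical helper: count the digits 0-9 across all passwords
count_digits <- function(passwords) {
  chars <- unlist(strsplit(passwords, split = ""), use.names = FALSE)
  digits <- as.character(0:9)
  # Non-digit characters become NA and are dropped by table()
  counts <- table(factor(chars, levels = digits))
  setNames(as.integer(counts), names(counts))
}

# TRUE if the counts for "1", "2", "3", "4" strictly decrease
declines_1_to_4 <- function(digit_counts) {
  all(diff(digit_counts[c("1", "2", "3", "4")]) < 0)
}

# Toy input for illustration only
sample_passwords <- c("111223", "pass1234", "qwe1")
digit_counts <- count_digits(sample_passwords)
declines_1_to_4(digit_counts)
```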
Uniques Analysis
Another curious result was the distribution of unique (special) characters. The most frequent one is @ (12,159 times), followed by the exclamation mark (10,708 times).
PS: The dot (.) was excluded from this analysis.
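The special-character counts could be obtained the same way. In this sketch (the name `count_special` is hypothetical), a character counts as special if it is neither a letter nor a digit according to the POSIX class `[:alnum:]`, and the dot is dropped to match the note above. This is an assumed definition; the original project may classify characters differently.

```r
# Hypothetical helper: count non-alphanumeric characters, most frequent first.
# Assumption: "special" means not matched by [:alnum:]; `exclude` lists
# characters to leave out (the dot, as in the analysis above).
count_special <- function(passwords, exclude = ".") {
  chars <- unlist(strsplit(passwords, split = ""), use.names = FALSE)
  # Keep only characters that are neither letters nor digits
  special <- chars[!grepl("[[:alnum:]]", chars)]
  # Drop excluded characters such as the dot
  special <- special[!special %in% exclude]
  counts <- sort(table(special), decreasing = TRUE)
  setNames(as.integer(counts), names(counts))
}

# Toy input for illustration only
special_counts <- count_special(c("a@b!", "x@y.z", "p@ss!", "k.k"))
```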
Upper-Lower Characters Analysis
We expected lowercase letters to dominate, but not by such an overwhelming margin. Counting letters only, uppercase letters make up just under 6% (594,064 of 9,940,582 letters).
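The share can be checked from the uppercase and lowercase totals that the pie chart code below hard-codes. The sketch uses two hypothetical helpers: `count_case`, which shows one way such totals might be counted (assuming the POSIX classes `[:upper:]` and `[:lower:]`, which also match non-ASCII letters in a UTF-8 locale), and `case_share`, which does the percentage arithmetic.

```r
# Hypothetical helper: total uppercase and lowercase letters over all passwords
count_case <- function(passwords) {
  c(upper = sum(nchar(gsub("[^[:upper:]]", "", passwords))),
    lower = sum(nchar(gsub("[^[:lower:]]", "", passwords))))
}

# Hypothetical helper: percentage of each case among all letters
case_share <- function(upper_total, lower_total) {
  total <- upper_total + lower_total
  c(upper = 100 * upper_total / total, lower = 100 * lower_total / total)
}

toy_counts <- count_case(c("Password1", "ABCdef"))  # toy input only

# Totals taken from the Scripts section of this post
shares <- case_share(594064, 9346518)
round(shares)  # upper 6, lower 94
```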
Tools
I used the R programming language (R Project) and a few libraries.
Libraries:
ggplot2, plotly -> used to visualize the data
stringr -> string operations
Scripts
Because the full code is quite long, I only share short sections of it here. You can find the complete project and the dataset on my GitHub.
Bar chart plotting code.
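# bar chart [a-z]; harf_degerleri holds the count of each letter, chars the letter labels
barplot(harf_degerleri, xlab = "Characters", ylab = "Values", main = "[a-z] barchart", names.arg = chars, col = "orange", ylim = c(0, 1200000))
# bar chart [0-9]; rakam_degerleri holds the count of each digit, numbers the digit labels
barplot(rakam_degerleri, xlab = "Numbers", ylab = "Values", main = "[0-9] barchart", names.arg = numbers, col = "pink", ylim = c(0, 1500000))
# bar chart of unique (special) characters; uniq holds the counts, uniq_names the labels
barplot(uniq, xlab = "Unique chars.", ylab = "Values", main = "[unique] barchart", names.arg = uniq_names, col = "grey", ylim = c(0, 15000))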
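The `barplot()` calls above hard-code an upper `ylim` for each chart. A hypothetical helper such as `plot_counts` could derive the limit from the data instead, so the same call works for letters, digits and special characters. The 10% headroom above the tallest bar is an arbitrary choice for this sketch.

```r
# Hypothetical helper: bar chart whose y-axis limit is derived from the data
plot_counts <- function(counts, xlab, main, col = "orange") {
  # Leave 10% headroom above the tallest bar (arbitrary choice)
  upper_limit <- max(counts) * 1.1
  barplot(counts, xlab = xlab, ylab = "Values", main = main,
          names.arg = names(counts), col = col, ylim = c(0, upper_limit))
  # Return the limit invisibly so callers can inspect it
  invisible(upper_limit)
}

# Made-up counts for illustration only
demo_counts <- c(a = 12, b = 5, c = 8)
limit <- plot_counts(demo_counts, xlab = "Characters", main = "[a-z] barchart")
```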
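# uppercase total -> 594064, lowercase total -> 9346518
buyuk_harf_topam_sayı <- 594064    # total number of uppercase letters
kucuk_harf_toplam_sayı <- 9346518  # total number of lowercase letters
toplam_char_sayı <- buyuk_harf_topam_sayı + kucuk_harf_toplam_sayı  # total number of letters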
Pie chart plotting code.
# pie chart values: all lowercase letters vs. all uppercase letters
harf_degerleri_total <- sum(harf_degerleri)
upper_chars_total <- sum(upper_chars)
pie_chart_values <- c(harf_degerleri_total, upper_chars_total)
colors <- c("#009E73", "#E69F00")
# percentages must be computed before the labels that use them
kucuk_harf_yuzde <- round((100 * kucuk_harf_toplam_sayı) / toplam_char_sayı)
buyuk_harf_yuzde <- round((100 * buyuk_harf_topam_sayı) / toplam_char_sayı)
# labels must exist before pie() is called
label1 <- paste0(kucuk_harf_yuzde, "% lower chars.")
label2 <- paste0(buyuk_harf_yuzde, "% upper chars.")
labelss <- c(label1, label2)
pie(pie_chart_values, main = "upper-lower chars.", col = colors, labels = labelss)
Posted on Utopian.io - Rewarding Open Source Contributors