Steem Sincerity - Update and Community Involvement

andybets (62) in #steemdev • 8 years ago (edited)

Thanks to everyone for the support and feedback on the Sincerity (anti-spam) API.

Community Response

The community has broadly welcomed this project, but some feel that the early integration of an imperfect account classifier was handled insensitively. The API provides estimated probability scores rather than absolute classifications, but many people still felt they were unfairly classified, because the SteemPlus Chrome browser extension initially used the highest-probability class as the classification. As a result of this feedback, the extension now shows 'Tell us' in cases of significant doubt.
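To illustrate the difference between always taking the highest probability and falling back to 'Tell us' when the scores are close, here is a minimal sketch. It is not the SteemPlus implementation: the class names, the shape of the score dictionary, and the 0.25 margin are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical threshold: how far the top score must lead the runner-up
# before a label is shown. This value does not come from the Sincerity API.
MIN_MARGIN = 0.25


def display_label(scores: Dict[str, float], min_margin: float = MIN_MARGIN) -> str:
    """Return the top-scoring class, or 'Tell us' when the result is in doubt."""
    if not scores:
        return "Tell us"
    # Rank classes from highest to lowest estimated probability.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_label, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    # A small lead over the runner-up is treated as significant doubt.
    if top_score - runner_up_score < min_margin:
        return "Tell us"
    return top_label


if __name__ == "__main__":
    print(display_label({"human": 0.48, "spammer": 0.42, "bot": 0.10}))
    print(display_label({"human": 0.05, "spammer": 0.15, "bot": 0.80}))
```

With these assumed numbers, the first account would be shown as 'Tell us' because the two leading scores are close, while the second would be labelled as a bot.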

This has also been a learning experience for me. I had, perhaps naively, assumed that the community would be more united about what constitutes spam than is actually the case. The SteemPlus crowdsourced data contains many conflicting reports, where, for example, the same account is reported as both a 'Human Content Creator' and a 'Spammer'. With this in mind, it seems inevitable that no spam classifier will please everyone, and that none will come close to perfectly classifying community-sourced training data, because perceptions of what spam is differ.
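Conflicting reports of this kind are easy to surface once the feedback is reduced to (account, label) pairs. The sketch below assumes that simplified shape; the real SteemPlus feedback format is not described in this post, and the label strings are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_conflicts(reports: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each account reported under more than one label to those labels."""
    labels_by_account: Dict[str, Set[str]] = defaultdict(set)
    for account, label in reports:
        labels_by_account[account].add(label)
    # Keep only accounts on which reporters disagree.
    return {
        account: labels
        for account, labels in labels_by_account.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    # Made-up reports, purely for illustration.
    sample = [
        ("alice", "human"),
        ("alice", "spammer"),
        ("bob", "bot"),
        ("bob", "bot"),
    ]
    print(find_conflicts(sample))
```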

Progress

Despite this complication, I still believe this is a worthwhile project, and I am currently working on improving the account classification algorithm to increase accuracy and reduce the 'false positives', where people are mistakenly identified as spammers. Thanks to the community providing more than 3000 feedback reports through SteemPlus (so far), I have, after some algorithmic validation, expanded the training set from 90 to 480 accounts. These will typically be the more challenging accounts, the ones 'wrongly' reported by SteemPlus, so they should serve as good training material.

Given the significant differences of opinion about which accounts are spamming, however, I have decided to make the training data public and to ask for more direct community involvement in deciding which accounts should be classified as spammers. The data I'm currently using to train the classifier is at the bottom of this post.

Community Crowdsourcing

It would be great if anyone would like to contribute data to the training set!

If so, please paste your account lists into a comment below in the following form:

Human Content Creators
account1, account2, account3

Spammers
account3, account4

Bots
account5

It is fine to include accounts I'm already using.

If you disagree with any accounts in my current lists below, please comment in this form:

Incorrect Classification
account1, account2, account3, account4
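As a rough illustration, comments in either of the two forms above could be read with a small parser like the one below. This is only a sketch: the function name is hypothetical, and stripping a leading '@' and lowercasing names are normalisation choices made for the example, not part of the requested format.

```python
from typing import Dict, List, Optional

# The four headings used in the comment forms above.
HEADINGS = (
    "Human Content Creators",
    "Spammers",
    "Bots",
    "Incorrect Classification",
)


def parse_submission(text: str) -> Dict[str, List[str]]:
    """Group comma-separated account names under the heading above them."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line in HEADINGS:
            current = line
            result.setdefault(current, [])
        elif current is not None:
            # Lines before the first heading are ignored.
            names = [name.strip().lstrip("@").lower() for name in line.split(",")]
            result[current].extend(name for name in names if name)
    return result


if __name__ == "__main__":
    comment = """Human Content Creators
account1, account2, account3

Spammers
account3, account4

Bots
account5
"""
    print(parse_submission(comment))
```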

Note that, as with the data from SteemPlus, all crowdsourced data will be filtered through an algorithm to decide which accounts are added to the training data, so adding an account here does not guarantee its inclusion.

Discussion and ideas are also welcome in the comments of course :)

Current Training Data

This is mainly from the SteemPlus community feedback data, and I have not had time to evaluate it manually, so it does not represent my opinions.

The 274 'human content creator' accounts: aaronleang, abunagaya, ackhoo, ackza, adsactly, alejandromata, alex-fitness, altobot, amigoponc, andrarchy, andybets, angelggomz, argalf, arorapuneet, arvindkumar, atomcollector, atukh09, austrobot, babysteps, baejaka, bafi, balte, bcuda69, bembelmaniac, berniesanders, binkyprod, brandonp, brimax, bscrypto, buzzard, cardboard, carenina, carlgnash, cattledog, catweasel, chbartist, clarkgold, clevershovel, coin.info, coinbandit2000, contentjunkie, crazybgadventure, crisdevilgamer, cryptogecko, cryptosharon, cyclamen, d00k13, dailypick, dailypro, dailytop10open, dan, dana-edwards, dbooster, deathwing, digitalis, digitokash, djdarkstorm, dnews, doctorrevelator, drakos, drmake, dswigle, eddiespino, ehiboss, elear, eleidap, elgeko, elite.skeete, eliterry12, emergehealthier, espoem, ethandsmith, evildido, exyle, fabinhocrypto, feronio, flauwy, flugschwein, folken, followforupvotes, fr4mer, free999enigma, fxsajol, gbenga, gmichelbkk, grammarnazi, gravitcaper, gray00, greencross, grimmyx, grizzle, grumpycat, guidom, haejin, hakeemshah96, happydaddyfr, hatuvera, heimindanger, hellroute, hethur240, hilladigahackles, hungryhustle, iamankit, indigoocean, infoslink, inventor16, invisusmundi, isi3, itharagaian, j-alhomestudio, jackjohanneshemp, jeffmcmullen, jehovahwitness, jlordc, joannereid, johannfrare, joseph, jpphotography, jrvacation, kadna, kaliju, kamile, kanrat, kastiuz, keter, kingscrown, kryptoe, lanhange, lenasveganliving, lexiconical, lonelywolf, lost108, louis88, lundsten, luzcypher, magicalmoonlight, majharul, marcelgoo, mark-dahl, marsella-2017, masjuan, maverickfoo, meanmommy33, mecurator, megaela, melinda010100, menerva, meno, mfederi, midlet, mohammedpolash, molometer, mountainjewel, mrbean1, mxx, myndnow, nakedverse, neopch, new-york, newenx, nonameslefttouse, obvious, old-guy-photos, oldtimer, oliverschmid, omar-hesham, omersurer, omitaylor, onealfa, oscarps, osmerj, oups, paradise, pars11, patrice, personz, pfunk, photosblog, piaristmonk, pipiczech, pipurilla, polm, pritam20, pyro0816, r-k-m, ragepeanut, rahulsaini, raorac, ravenruis, rebeccaontheroof, reesebrehio, reggaemuffin, reko, rem3600, richardcrill, rival, rivalzzz, roelandp, roxane, rudenc, rusni1122, sahda, sahra-bot, sametceylan, samueldouglas, sargoon, schrosct, scolari-ire, seablue, sebbbl, serylt, seveaux, shadowspub, shahabshah, sharehows, sheorath, sherlockholmes, simondiamond, sircork, sisygoboom, smacommunity, son-of-satire, spawnband, spongechris, ssimkins9, steem-plus, steem.chat, steem.dollar, steemplayroom, steemreports, steevc, stehaller, stellabelle, stoodkev, sukro, superoo7, taphophilia, tarazkp, taskmanager, tattoodjay, teammalaysia, teutonium, thebugiq, themonetaryfew, therealwolf, theturtleproject, timcliff, tm50, tmholdings, tolgahanuzun, toptrendingnews, transisto, travelwithus, trufflepig, tsaaditia30, tts, twoitguys, txatxy, vaansteam, vasil-danev, veerall, veganroma, verhp11, vicrivasr, vladimir-simovic, wissyofenmu, wrath-of-grapes, yann85, yanosh01, yidneth, youarehope, zekans84, zenkly, zonguin

The 119 'spammer' accounts: aabisteemvoter, agx, ahlawat, aiqabrago, alexis2, all-aceh, alomgir0101, alomgirhoseain, ambriya11nov, andreacrangel, arielb12, baninduana, benswann, bilal218, bitgeek, bitius, bongje, bradfordtennyson, capari, chadgarber, chanchalroy, cinelonga, coldproject, crispycoinboys, cryptoconfiance, cryptoinside, cryptomario, cryptoriddler, cryptotenx, darryljonesjr, deboas, dogimage, dontryme2, dotwin1981, dreimaldad, dreykan, energypa, fawadsolangi, filo6322, finkployd, forexflo, francosteemvotes, gauravtak, gemce, hagoodman, happydolphin, hiranur, hodgetwins, ili0braz, introbot, irrer-ivan, jahnubis, jeabsywanvisa, jensvoigt, jiyaur, julia6, kate1, kimthewriter, kiporen212, leecamp, lefactuoscope, lefthouse, lianaakobian, liebeilio, love777, lucas3, mattl, mcitron, mianfahad2, michelle2, minecrew, muhammadroni, murattatar, murhadi9, nachon, nicecooking, nicole3, nicole5, oliverstoney, oz27, philou, pippo84, rachel4, rasel49, resteembot, riccardo47, rinis, rkaitra, rollthedice, romyjaykar, rrnayak, sacredwriter, siiiiichfried, sirgatodaniel, skeaa14, soundwavesphoton, speedvoter, steemlota, steemstem-bot, stef77, stmit, streembot, tarikhakan55, thejimmydoreshow, tonimontana, tonkatonka, underpants, unixfriend, vivekkanade, wafrica, wefund, wolf92, yasmin3, yetxuni, yoshiko, yougotresteemed, youneedverse, yundong21, zulacut

The 87 'bot' accounts: adriatik, aksdwi, allaz, alphaprime, appreciator, arcange, bearwards, bluebot, bodzila, boomerang, boostbot, booster, bottymcbotface, brupvoter, buildawhale, cheetah, childfund, chronocrypto, cryptoempire, cuddlekitten, dailyupvotes, deutschbot, dlivepromoter, dolphinbot, ebargains, edensgarden, emperorofnaps, estabond, estream.studios, fishbaitbot, gaman, honestbot, isotonic, lightningbolt, lost-ninja, lovejuice, lrd, luckyvotes, megabot, minnowbooster, minnowfairy, minnowhelper, minnowsupport, minnowvotes, moneymatchgaming, msp-bidbot, nado.bot, noicebot, oceanwhale, onlyprofitbot, peace-bot, photocontests3, postdoctor, postpromoter, proffit, promobot, pushbot, pushup, pwrup, redlambo, redwhale, resteemable, rocky1, seakraken, singing.beauty, sleeplesswhale, slimwhale, smartsteem, sneaky-ninja, spydo, steembloggers, steemersbot, steemitboard, steeply, sunrawhale, thebot, therising, tomole444, treeplanter, twitterbot, upme, upmyvote, upyou, voterunner, whalebuilder, youtake, zapzap

#steem #spam #api #steem-sincerity


kus-knee (74) 8 years ago

I love the work that you do. I came across you after you commented on my article about producing reports for tax purposes.

I was just about to write an article about steemreports and in particular the section referred to here: http://steemreports.com/accounts-reporting-tool/

I noticed that, through no fault of your own, it is not working right now. Could you please contact me via Steemit.chat when it is ready to go again?

Thanks for your great work!


andybets (62) 8 years ago

Hi, thanks for the support!

I will let you know when the tools are working again.
