Wow, nice development for Steemit :) Personally, I do ML using random forests for classification problems. The key issue is having enough features without overfitting my predictions. Luckily, RF outputs a probability score for each class, which makes it easier to set different thresholds.
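To make the thresholding idea concrete, here is a rough sketch. It assumes scikit-learn and uses synthetic data in place of real Steemit posts; the label 1 standing for "spam", the `flag_spam` helper, and the 0.8 / 0.5 cut-offs are all made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): label 1 plays the role of "spam".
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Limiting tree depth is one common way to rein in overfitting.
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
clf.fit(X_train, y_train)

# predict_proba returns one column per class, ordered as in clf.classes_.
spam_col = list(clf.classes_).index(1)
spam_proba = clf.predict_proba(X_test)[:, spam_col]


def flag_spam(proba, threshold):
    """Return a boolean array: True where the spam probability reaches the threshold."""
    return proba >= threshold


# Hypothetical cut-offs: a strict one for automatic action,
# a lenient one for "worth a manual look".
strict = flag_spam(spam_proba, 0.8)
lenient = flag_spam(spam_proba, 0.5)
print(f"strict flags: {strict.sum()}, lenient flags: {lenient.sum()}")
```

Raising the threshold trades missed spam for fewer false alarms, so the right value depends on how costly it is to flag an honest user.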
Depending on how much data you have in your training set, I guess I could take a look at your APIs. I'm not sure how much I could contribute, but I'm also tired of those pesky spammers and have been doing nothing about it. I'm sure your work can help create whitelists and blacklists, or give every user a “spam” rating that they would need to keep below a certain threshold. It could probably mirror the reputation score, but focused on catching spam.
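One possible way to turn per-post scores into a per-user “spam” rating is sketched below. Everything here is hypothetical: the user names, the scores, averaging as the aggregation rule, and the 0.2 / 0.8 cut-offs are assumptions for illustration, not anything taken from the actual implementation.

```python
from collections import defaultdict
from statistics import mean


def user_spam_ratings(post_scores):
    """Average per-post spam probabilities into one rating per user."""
    by_user = defaultdict(list)
    for user, score in post_scores:
        by_user[user].append(score)
    return {user: mean(scores) for user, scores in by_user.items()}


def split_lists(ratings, whitelist_below=0.2, blacklist_above=0.8):
    """Bucket users by rating; anyone in between lands on neither list."""
    whitelist = sorted(u for u, r in ratings.items() if r < whitelist_below)
    blacklist = sorted(u for u, r in ratings.items() if r > blacklist_above)
    return whitelist, blacklist


# Hypothetical (user, spam probability) pairs, one per post.
post_scores = [
    ("alice", 0.05), ("alice", 0.10),
    ("bob", 0.95), ("bob", 0.90),
    ("carol", 0.50),
]

ratings = user_spam_ratings(post_scores)
whitelist, blacklist = split_lists(ratings)
print(whitelist, blacklist)
```

Users who fall between the two cut-offs stay on neither list, which leaves room for manual review instead of forcing a decision.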
Have a good day and hope we can chat a bit more about this implementation.
Thanks. I'm not familiar with random forests, but I see they relate to nearest neighbours, which my implementation uses. My training set isn't really big enough yet for the classifier to be highly robust/unbiased, but I hope to fix this soon.