Unfortunately, the SteemPlus feedback is proving problematic for various reasons. For example, many people see a foreign-language post and report it as spam, or see bots and report them as spam. Basically, many people are using the classification to apply their own values, rather than actually considering whether the content is bad for the platform. So one reason for the delay in improvement is that any classifier, however well designed, is only as good as the data it is trained on.
I appreciate your honesty anyway, and am working hard on this difficult problem.
Yeah, that seems to be a pitfall for "human-trained AIs" every so often... and then they turn into trolls themselves :(
Maybe you could ask for some "targeted" training evaluation of specific accounts via Discord or similar?! I guess some sort of human vetting of the resulting training set (and the resulting AI behavior) can't be forgone... Then again... maybe the "flat" account data just doesn't contain enough pointers for a reliable AI decision?!?
If I may make a suggestion: for the top 500, wouldn't it make sense to preselect a hierarchy based on comment/post count, possibly filtered by incoming vote diversity (to push voting-farm spam to the top), and THEN classify the spammer probability with the AI?!
Just an idea... right now the leading criterion for the top 500 is the AI probability, followed by comment count... and that awkwardly leaves out those real spam heroes.
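To make the idea a bit more concrete, here is a rough Python sketch comparing the current ordering with the proposed one. Everything in it is hypothetical and my own assumption, not how SteemPlus actually works: the `Account` fields, measuring "vote diversity" as distinct voters per incoming vote, the pool size, and the 0.5 probability cut-off.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    # Hypothetical per-account record; field names are illustrative only.
    name: str
    comment_count: int
    post_count: int
    spam_probability: float  # assumed classifier output in [0, 1]
    incoming_voters: list = field(default_factory=list)  # one voter name per incoming vote


def activity(acc):
    return acc.comment_count + acc.post_count


def vote_diversity(acc):
    """Assumed measure: distinct voters / total incoming votes (1.0 if no votes).
    Values near 0 mean a few accounts cast most votes, as in a voting farm."""
    if not acc.incoming_voters:
        return 1.0
    return len(set(acc.incoming_voters)) / len(acc.incoming_voters)


def current_ranking(accounts, n=500):
    # Ordering as described above: AI probability first, comment count second.
    return sorted(accounts, key=lambda a: (-a.spam_probability, -a.comment_count))[:n]


def proposed_ranking(accounts, n=500, pool_size=5000, min_probability=0.5):
    # 1. Preselect the most active accounts (comments + posts).
    pool = sorted(accounts, key=activity, reverse=True)[:pool_size]
    # 2. Push low vote diversity (likely voting farms) to the top, then higher activity.
    pool.sort(key=lambda a: (vote_diversity(a), -activity(a)))
    # 3. Only then apply the classifier, as a cut-off on its probability.
    return [a for a in pool if a.spam_probability >= min_probability][:n]


# Made-up example accounts, purely for illustration.
accounts = [
    Account("heavy-spammer", 25_000, 10, 0.6, ["farm1", "farm2"] * 50),
    Account("quiet-account", 12, 3, 0.95, ["a", "b", "c", "d"]),
    Account("regular-user", 800, 40, 0.2, [f"user{i}" for i in range(60)]),
]
print([a.name for a in current_ranking(accounts)])
# ['quiet-account', 'heavy-spammer', 'regular-user']
print([a.name for a in proposed_ranking(accounts)])
# ['heavy-spammer', 'quiet-account']
```

With the current ordering, a barely active account with a high AI score outranks the 25,000-comment account fed by two voters; with the preselection, the voting-farm account comes first and the AI only decides who makes the cut.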
P.S.: if that's not already the case, maybe adding some of the really severe cases, like a-0-0 with 25,000 comments, to the training data as top matches (spammer score 1.0) could help?!