It could be similar to the Penguin thing on Google, although I don't know the technical details.
But how would it work for people like Stellabelle who host "guest posts", and for others who openly post similar content on their blogs?
Maybe we could somehow do more of a plagiarism search? Maybe Cheetah bot is too generous? What is the percentage on that? That is something we have to think about, but I completely agree. What @msgivings did was really messed up.
I plugged @msgivings' most recent article into Grammarly and turned the plagiarism check on. It came up 23% unoriginal, and the Reddit material that was copied from was cited in the results. Now I'm ashamed I didn't figure it out! Although copying and pasting every article I read every day would be extremely time-consuming.
A decision to 'nuke' someone should never be algorithm-based. Despite the fancy name 'machine learning', these algorithms are very primitive.
These tools can be used as an alerting system. For instance, if the bot finds a suspicious post, it informs the #steemitabuse chat. Humans can then validate the claim. If the author in question hosts guest posts, he or she may be put on a 'whitelist'.
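To make the idea concrete, here is a rough sketch of a bot that only alerts and never acts on its own. Everything in it is an assumption for illustration: the `Post` fields, the `triage` function, the 20% threshold, and the `notify` callback standing in for a message to #steemitabuse. It is not how Cheetah or any existing bot works.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Post:
    author: str
    url: str
    unoriginal_fraction: float  # 0.0 to 1.0, from whatever checker is used


def triage(post: Post,
           whitelist: Set[str],
           threshold: float = 0.2,  # arbitrary cutoff, chosen only for the example
           notify: Callable[[str], None] = print) -> bool:
    """Send an alert for human review; return True if an alert was sent.

    The bot never penalizes anyone itself. It only flags the post.
    """
    if post.author in whitelist:
        # Known guest-post hosts are skipped entirely.
        return False
    if post.unoriginal_fraction < threshold:
        return False
    notify(f"Possible plagiarism: {post.url} by @{post.author} "
           f"({post.unoriginal_fraction:.0%} unoriginal). Please verify.")
    return True
```

With `notify` left at its default, a flagged post just prints the alert; a real bot would pass in a function that posts to the chat, and a human would decide what happens next.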
What about news articles? There is no point in posting just a link. Obviously it's not good to copy and paste an entire news article; post only a few excerpts that you find most interesting. Always state the source clearly and use only a few selected quotes from the article.
The bot can ignore quoted parts of the post.
source
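As a rough illustration of ignoring quoted parts, the sketch below drops Markdown blockquote lines (those starting with `>`) before measuring overlap with a known source. The function names and the 3-word shingle comparison are my own assumptions, picked for simplicity; a real checker would likely be more sophisticated.

```python
import re


def strip_quotes(markdown_text: str) -> str:
    """Drop Markdown blockquote lines (those starting with '>')."""
    kept = [line for line in markdown_text.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept)


def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences in the text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(post: str, source: str, n: int = 3) -> float:
    """Fraction of the post's unquoted shingles that also appear in source."""
    post_shingles = shingles(strip_quotes(post), n)
    if not post_shingles:
        return 0.0
    return len(post_shingles & shingles(source, n)) / len(post_shingles)
```

A post that puts the borrowed passage in a blockquote and adds its own commentary scores low, while the same passage pasted without quote marks scores high.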
There is nothing to stop you from summarizing the highlights of the article and then linking to it. That way, you are posting original material without copying. The exception would be if you wanted to pick up on someone quoted in the article and write some commentary. You'd need the original quote to make sense of the commentary.