I've been thinking about this lately, and @bacchist recently wrote a related post about spun content (using a computer to "spin" an article so that it's hard to detect as plagiarised).
I understand that you're proposing a technique where each post by a user is parsed to check their "style". If the style of a post differs too much from that user's other posts, it could be flagged as plagiarised.
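To make that concrete, here's a rough sketch in Python of what such a style check might look like. Everything in it is my own assumption rather than part of the proposal: the three features (average sentence length, average word length, vocabulary richness), the 0.5 threshold, and the names `style_features` and `style_outliers` are just for illustration.

```python
import re
from statistics import mean


def style_features(text):
    """Crude style fingerprint (assumed features, purely illustrative):
    words per sentence, average word length, unique-word ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return (0.0, 0.0, 0.0)
    return (
        len(words) / len(sentences),
        mean(len(w) for w in words),
        len(set(words)) / len(words),
    )


def style_outliers(posts, threshold=0.5):
    """Return indices of posts whose style deviates from the author's
    average by more than `threshold` (mean relative deviation)."""
    feats = [style_features(p) for p in posts]
    centre = [mean(col) for col in zip(*feats)]
    flagged = []
    for i, f in enumerate(feats):
        # Relative distance of each feature from the author's average.
        dev = mean(abs(x - c) / c if c else 0.0 for x, c in zip(f, centre))
        if dev > threshold:
            flagged.append(i)
    return flagged


posts = [
    "I like cats. Cats are fun. I like dogs too.",
    "I like tea. Tea is hot. I like it a lot.",
    "I ran home. It was late. I was very tired.",
    "Notwithstanding considerable methodological heterogeneity, contemporary "
    "stylometric investigations consistently demonstrate remarkable "
    "discriminatory capability across extraordinarily diverse authorship "
    "attribution scenarios.",
]
suspects = style_outliers(posts)
```

A real system would need far better features and a lot more posts per author, and an outlier only means "worth a closer look", since people do change style between topics.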
It's an interesting approach. What's nice is that it only concerns itself with what is on steemit. This is in contrast to cheetah, which, as far as I can tell, searches the net for similar content.
How about a system that checks the top 25 results from a Google search and flags the post if the same content shows up across different domains? For many searches, it seems, 24 of the 25 results are exactly the same text, each on a different domain. A single original blog post won't get indexed that quickly, and it won't propagate to fill up the search results in that time either.
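Here's a minimal sketch of that cross-domain check, again in Python and again under my own assumptions. It does not talk to Google at all: it assumes you already have the top results as `(url, page_text)` pairs from somewhere. The word-shingle Jaccard similarity, the 0.8 cutoff, the "two or more domains" rule, and the function names are all hypothetical choices for illustration.

```python
from urllib.parse import urlparse


def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    if not words:
        return set()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Overlap of two shingle sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def domains_with_copies(post_text, results, min_similarity=0.8):
    """Domains whose page text is nearly identical to the post.
    `results` is an iterable of (url, page_text) pairs."""
    target = shingles(post_text)
    return {
        urlparse(url).netloc
        for url, page_text in results
        if jaccard(target, shingles(page_text)) >= min_similarity
    }


def looks_copied(post_text, results, min_domains=2):
    """Flag the post if near-copies appear on several distinct domains."""
    return len(domains_with_copies(post_text, results)) >= min_domains


post = "the quick brown fox jumps over the lazy dog near the river bank"
results = [
    ("https://a.example/x", post),
    ("https://b.example/y", post.upper()),
    ("https://c.example/z", "completely unrelated text about something else"),
]
copies = domains_with_copies(post, results)
flag = looks_copied(post, results)
```

Counting distinct domains rather than raw hits is the point: one match could be the author's own blog, but the same text on many unrelated domains is much more suspicious.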
So, that is one aspect of ML I would incorporate.