The most effective way to copy-paste, and why it needs urgent attention.
The wonderful @bacchist recently made a great post about spotting spun content, highlighting how unscrupulous Steemians can use tools like WORDAI to spin other people's articles in an attempt to avoid plagiarism detection.
I wanted to add to this by showing a rather cynical spinning technique emerging on Steemit that lets the plagiarist keep the original wording but still trick the @cheetah bot, and sometimes even @steemcleaners, who check for plagiarism manually.
How do they do it? By replacing letters with other letters and special characters that look almost identical to the untrained eye. For example:
Whаt if l toId you somе of thеsе words arеn't actuaIIy words?
Whаt would be your аѕѕеѕѕmеnt of this article at a gIance? Cаn you sее it? This wholе pаragraph is pаcked with bastardizеd words thаt аlmost look pеrfect. Did you spot it in thе hеader of this paragraph too?
I is replaced by l, and l by I
e is replaced by е
a is replaced by а
...and there are various other, less obvious character replacements that spammers are using, sometimes in combination with word spinning.
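If you want to confirm a suspicion, a few lines of Python will name every non-ASCII letter in a passage. This is a minimal sketch, assuming the lookalikes are Cyrillic letters such as U+0430 (а); that is the usual culprit for this trick, but I haven't checked which code points every spammer uses. The helper name `suspicious_chars` is made up for illustration. Note that the I/l swap uses plain ASCII letters, so this particular check won't catch it.

```python
import unicodedata

def suspicious_chars(text):
    """Hypothetical helper: list every non-ASCII letter with its code point and Unicode name."""
    found = []
    for ch in text:
        if ch.isalpha() and not ch.isascii():
            found.append((ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return found

# "What" typed with a Cyrillic small a (escaped here so it's visible)
sample = "Wh\u0430t if I told you"
for ch, code_point, name in suspicious_chars(sample):
    print(code_point, name)  # prints: U+0430 CYRILLIC SMALL LETTER A
```

Legitimate accented words (café, naïve) will also show up, so treat a hit as a reason to look closer rather than as proof of plagiarism.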
The result?
@cheetah passes by without batting an eyelid, and someone manually checking for plagiarism might be fooled. Searching Google for strings of the altered text gives limited results, and unless you paste the text into a good spell checker, the altered words may not even be highlighted. What does this mean for @steemcleaners and other human spam fighters? Quite simply, more work.
Honestly, this type of spinning shows such insidious, fraudulent intent that I have very little sympathy for accounts whose reputations are damaged when it is discovered.
Photos via pixabay.com
Possible Solutions.
I'm not a developer, but the first thing that comes to mind is a bot like @cheetah that flags suspicious posts containing a lot of spelling errors or special characters. Alternatively, perhaps a different font could be used: one with far more distinctive special characters, so it would be easy to spot the difference between an e and an е. I do think this is urgent, because I'm seeing more and more of it popping up in Steemit Abuse Classic. Please comment below if you have any suggestions for tackling this problem, whether via changes to Steemit, a bot, or something else.
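To make the bot idea a little more concrete, here is a rough Python sketch of a heuristic such a bot might use. It's my own assumption, not how @cheetah actually works: a word counts as suspicious if its letters come from more than one script (Latin mixed with Cyrillic, say), or if a capital I sits between lowercase letters, as in "actuaIIy". The function names and the 5% threshold are hypothetical and untuned.

```python
import re
import unicodedata

def letter_script(ch):
    """Rough script of a letter: first word of its Unicode name, e.g. LATIN or CYRILLIC."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def is_suspicious_word(word):
    letters = [c for c in word if c.isalpha()]
    scripts = {letter_script(c) for c in letters}
    # Mixed scripts inside one word, e.g. Latin "Wh" + Cyrillic "а" + Latin "t"
    if len(scripts) > 1:
        return True
    # Capital I sandwiched between lowercase letters, e.g. "actuaIIy", "gIance"
    return re.search(r"[a-z]I+[a-z]", word) is not None

def suspicion_score(text):
    """Fraction of whitespace-separated words that look disguised (0.0 to 1.0)."""
    words = text.split()
    if not words:
        return 0.0
    flagged = sum(1 for w in words if is_suspicious_word(w))
    return flagged / len(words)

def should_flag(text, threshold=0.05):
    # Hypothetical threshold chosen for illustration only
    return suspicion_score(text) >= threshold
```

It will misfire on some names (McIntosh) and on posts that legitimately mix scripts, so it's probably better suited to putting posts in front of a human than to flagging them automatically.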
Finally, the ultimate weapon against spam and plagiarism...
It's you, the community. I know, that's super cheesy, but it's true. There is a type of content that spammers and plagiarists will never be able to churn out: personal, distinctive and entertaining content from people like @fyrstikken, @reneenouveau, @sirwinchester, @queenmountain, @heiditravels and even @dollarvigilante. Notice how many selfies these folks include in their posts? This is the "personal touch" turned up to 11. Whatever your opinions of these various personalities, they're crucial to the credibility of Steemit. Your vote might be better spent on one of them than on that "Top 5 health benefits of garlic" post.
Thanks for those tips. I think many of us can sense when a piece has been spun, but these tips aid in "tipping us off", if you don't mind the pun! Oh, and I just made a rhyme. Weird.
LOL thanks for the comment :)
Upvoted! Thank you for bringing this up!
Thanks for the Up
Very true. In mainstream SEO, a misspelled word can result in misinterpretation and bad indexing by the search engines. Here, we're not as concerned with attracting outside traffic. The goal is attracting internal eyeballs, so tricks like the one you mentioned can work to a certain extent. Great points.
Thanks for your comment. Are you up to speed on SEO? I'd be really interested to hear any ideas for a solution to this problem.
I'm not sure what Steemit uses to check for duplicate content. I believe Copyscape has the means to add misspelled words and replace them with the usual spelling to help detect trickery. Someone would have to update a data file whenever instances were spotted and reported.
When you publish web content and want to rank for a word like "list", you include that word in your content. Misspelling it as "1ist" would be counterproductive, as the Google bot won't index you for "list". But that's in the real world, I guess.
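The "data file" idea could look something like this: fold known lookalikes back to plain letters before comparing two texts. This is a minimal Python sketch; the `CONFUSABLES` table and the `fold_lookalikes` and `similarity` helpers are hypothetical and deliberately tiny, and I can't say whether Copyscape works this way. Capital I and lowercase l are treated as the same letter and everything is lowercased, so the disguised sentence from the post compares as identical to the original.

```python
import difflib
import unicodedata

# Hypothetical, deliberately tiny lookalike table (Cyrillic -> Latin).
# A real checker would load a much longer list from a maintained data file.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}

def fold_lookalikes(text):
    """Hypothetical helper: reduce text to a form where common lookalikes compare equal."""
    text = unicodedata.normalize("NFKC", text)  # e.g. fullwidth letters -> ASCII
    text = text.replace("I", "l")  # treat capital I and lowercase l as one letter
    text = text.lower()  # also turns Cyrillic capitals into their small forms
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)

def similarity(a, b):
    """0.0 to 1.0 similarity of two texts after folding lookalikes."""
    return difflib.SequenceMatcher(None, fold_lookalikes(a), fold_lookalikes(b)).ratio()

original = "What if I told you some of these words aren't actually words?"
disguised = "Wh\u0430t if l toId you som\u0435 of th\u0435s\u0435 words ar\u0435n't actuaIIy words?"
print(similarity(original, disguised))  # prints 1.0
```

Folding I and l together merges a few genuinely different words, but since the same folding is applied to both texts, that only makes the comparison slightly more lenient.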
It's sort of amazing the lengths people will go to in order to sneak other people's content under everyone's noses. I didn't even know to look for odd characters like that.
Sad as always. Now I think this could be why, several times when I thought an article was suspicious, I couldn't find anything with Google or plagiarism tools.
This means more work, yes. Unfortunately, I can't suggest any way to deal with it beyond what you've already said: make a bot like Cheetah.
Perhaps a bounty could be organised for something like that. Hmmm. Let's see what people think.
"Honestly, this type of spinning shows such insidious, fraudulent intent, that I have very little sympathy for those accounts whose reputation is damaged upon its discovery."
I have NO sympathy for such thieves.
1 learned someth1ng new to watch out f0r, th@nk y0u!
I noticed your post was flagged by R4fken, a well-known Steemit hater.
I am for justice, and I am here with all my Steem Power to help you resist his hate downvotes and keep your post visible to Steem users. Upvoted!