Aicu the Curation Bot ~ Top Picks 18/05 - 02/06

in #curation, 6 years ago

Here is another fortnightly summary of the picks made by my curation bot, aicu. The bot only upvotes content similar to the excellent content selected by large curation networks like curie, helpiecake and c-cubed.

I haven't had time to write about the actual decision factors the bot uses, but as I'll be reworking some parts of the bot soon, I'll write a post about them then. The bot uses a lot of sophisticated machine learning to classify each post, but it still makes mistakes. The top 10 and top 15 below are ranked directly by aicu's internal decision machinery, with no further processing on my part.

The General Overview

In those two weeks the bot voted on 277 posts, of which 42.5% were afterwards selected by larger curation networks. You can find a more detailed breakdown below.

The Curation Networks Selection

There's not much left to say: the following ten posts were selected by aicu and later by large curation networks, and all of them are excellent.

# | Author | Post
1 | @watersnake101 | From Where the River Flows "Ambon Ambon Falls" - Beautiful Sunday
2 | @leeart | Baguio Night Market - Market Friday
3 | @watersnake101 | Ending a Fantastic Adventure - Beautiful Sunday
4 | @ctdots | I happen to live in the borough where “Chernobyl” miniseries was shot
5 | @timezonejunkies | Livestock Market, Street Food and One of The World's Most Remote Border Crossing // Kashgar // China!
6 | @anaerwu | When Asgard was still young about whether the first Thor is really so bad?
7 | @blueeyes8960 | Stop the Delphians (A 1,000 word sci-fi story)
8 | @neumannsalva | Beware of the A-Line skirt: making the Delphine skirt from Tilly Walnes
9 | @r00sj3 | Walkabout Straya #7: On my way to Byron Bay
10 | @foxyspirit | Tattoo's Tattoo's Tattoo's!!!! And a contest did you say?

Aicu Gems ~ The Unselected

Now let's look at some posts which aicu picked but which were not selected by large curation networks. It's an excellent selection of articles which you should definitely check out.
# | Author | Post
1 | @futurefood | LOT.FF.Y1.M10 (Shift in Perception)
2 | @amirtheawesome1 | World War II Prequel - The rise of Hitler
3 | @zoidsoft | Can Copyright Law Defeat the Technocratic Surveillance State? Drive a Hard Bargain because your Personal Data is your Prosperity
4 | @lanzjoseg | Fiction Vs Reality a summary and comparison with Venezuela
5 | @blueeyes8960 | Ill Fated Picnic - A 3 Prompt Freewrite
6 | @damm-steemit | A hike along the "Forbidden Paths". Uzbekistan
7 | @greddyforce | Discovering Virtsu Area And walking On A Peninsula
8 | @anaerwu | Books about the American West.
9 | @penderis | Don't vote ERROR ~PennedBullshit
10 | @puravidaville | Ayote Bread, Costa Rica Style: Fruits and Veggies Monday
11 | @cosimo | The End of Game of Thrones's Ending
12 | @elindos | About crypto-based games and how to avoid Ponzi, and make them last
13 | @livecam | From entrepreneur to bankrupt in Thailand (Real Life Storys in Thailand #1)
14 | @simplymike | Ulog #?: Gardening Adventures... Some Positive News, At Last
15 | @deerjay | Wednesday Walk ~ The Panther Trail Area

Improving Aicu


I've been looking at a couple of ways to improve aicu's ability to detect high-quality content. One way would be to fit a topic model to the textual data using Latent Dirichlet Allocation (LDA), a Bayesian approach to topic modelling which places Dirichlet priors on the topic mixture of each document and on the word distribution of each topic. Another approach would be to use a more complex neural network for learning a doc2vec representation. Both approaches should lead to improvements. Right now, I favour the LDA approach.
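To give a rough idea of what LDA does with a set of posts, here is a minimal collapsed Gibbs sampler in plain Python. It is only a sketch and not aicu's actual code: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words).

    alpha and beta are the symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic mixtures; each row sums to 1.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, vocab, topic_word

# Toy corpus (made up): two travel-like and two fiction-like "posts".
docs = [
    "hike waterfall river trail hike".split(),
    "river trail market hike waterfall".split(),
    "story alien planet story ship".split(),
    "ship planet alien story alien".split(),
]
theta, vocab, topic_word = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

The rows of `theta` could then serve as extra features for whatever classifier makes the voting decision, next to or instead of a doc2vec vector.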

Another feature, which has nothing to do with content selection, is the Natural Language Generation (NLG) aspect of aicu. I've been working on it so that the bot can comment on the selected content and tell users why their post was selected. On top of that, the bot could write these summaries itself. This part consists of two modules.

Text Summary

The bot needs to be able to summarise the user's post down to its key points. Otherwise, it won't be able to create meaningful comments beyond "Great post" or "Great post about [insert main category]". This part is doable, as there are very good datasets for general document summarisation. One example is the Gigaword dataset, which consists of news articles, each with a summary on top. With such data and some generative models, one should be able to train a concise model for this.

Natural Language Generation

My biggest problem is the NLG module. I don't want to use a bullshit generator for this, so RNNs and HMMs are not an option for the core system. My current approach is a linguistically deep NLG module. It uses a logical representation of facts and realises sentences from them. I've covered the rough syntactic structure of the module using a Probabilistic Context-Free Grammar (PCFG) model trained on the Penn Treebank. My current issue is that I need to ensure grammatical congruence on the word level. You can't ignore things like number, gender and case when generating sentences. I might be able to learn that relation, and which parts of the syntax tree govern these features. Or, if I'm lucky, I'll discover a database for an LFG (Lexical Functional Grammar) somewhere. Anyway, lots of work to be done.
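The congruence problem can be illustrated with a toy example. The sketch below samples a syntax tree from a tiny PCFG and then passes a number feature down the tree, so that determiner, noun and verb agree. Everything in it is an assumption for illustration: the grammar, the probabilities and the lexicon are made up rather than learned from the Penn Treebank, and only number is handled, not gender or case.

```python
import random

# Toy PCFG: each nonterminal maps to (right-hand side, probability) pairs.
# Rules and probabilities are invented for illustration.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 1.0)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
}

# Toy lexicon: word forms per preterminal, indexed by the number feature.
LEXICON = {
    "Det": {"sg": ["the", "this"], "pl": ["the", "these"]},
    "N": {"sg": ["bot", "post"], "pl": ["bots", "posts"]},
    "V": {"sg": ["selects", "upvotes"], "pl": ["select", "upvote"]},
}

def sample_tree(symbol, rng):
    """Expand a symbol top-down, choosing rules according to their probabilities."""
    if symbol in LEXICON:
        return symbol  # preterminal leaf, filled with a word later
    rules = GRAMMAR[symbol]
    rhs = rng.choices([r for r, _ in rules], weights=[p for _, p in rules])[0]
    return (symbol, [sample_tree(child, rng) for child in rhs])

def realise(tree, rng, number):
    """Turn a tree into words; `number` is the agreement feature passed down."""
    if isinstance(tree, str):
        # Pick a word form that matches the governing number.
        return [rng.choice(LEXICON[tree][number])]
    label, children = tree
    words = []
    for child in children:
        child_label = child if isinstance(child, str) else child[0]
        if label == "VP" and child_label == "NP":
            # The object NP is not governed by the subject: it gets its own number.
            child_number = rng.choice(["sg", "pl"])
        else:
            child_number = number
        words += realise(child, rng, child_number)
    return words

def generate(seed=None):
    """Sample one sentence; returns the words and the subject's number."""
    rng = random.Random(seed)
    tree = sample_tree("S", rng)
    subject_number = rng.choice(["sg", "pl"])
    return realise(tree, rng, subject_number), subject_number

for seed in range(3):
    words, number = generate(seed)
    print(number, " ".join(words))
```

Here the governing relations are hard-coded: the subject's number governs the verb, and each noun phrase shares one number internally. The open question from the paragraph above is how to learn such relations, or take them from an LFG resource, instead of writing them by hand.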

Aside from those two modules, I need to start doing these updates weekly.

Good one @watersnake101. A true idol 😁😁
