Large Language Models such as ChatGPT have to date relied on free, mainly text-based, data to train and deliver useful results.
Their usefulness depends on the amount of data they have access to: the more data they can feed off, the more accurate their results, and the more useful they are to consumers.
But we have a problem... According to a recent article in Nature, easily accessible (read 'free') data could run out in as little as four years' time. This is mainly because the sheer pace of feeding has increased as more and more people have started relying on Large Language Models.
And to add to the problem, publishers are increasingly looking for ways to stop AI feeding on their material, by putting it behind paywalls.
Incoming postmodern regression loop...?
One potential outcome of this is that AI starts to generate its own new data, and then teaches itself from it... I love this idea, a postmodern regression loop in which the tiniest inaccuracies end up becoming amplified.
This is a possibility... I mean, it's perfectly possible that AI can generate fake knowledge that is entirely believable to MOST humans. Actually this is ALREADY happening, although ATM fake news seems to require actual humans to amplify it into significance, but I can see that process going fully auto.
The consequences of this could be truly horrible, as AI tweaks and twists its output based on what it knows people want to hear, bending the left left and the right right and pulling everyone's views apart, in more ways than one.
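To make the "model feeding on its own output" loop a bit more concrete, here is a toy sketch in Python. It is nothing like a real LLM: it assumes the "model" is just a bell curve (a mean and a spread), and each generation is refitted only to a small sample drawn from the previous generation. The function name `simulate_self_training` and the numbers (200 generations, 10 samples each) are made up purely for illustration.

```python
import random
import statistics


def simulate_self_training(generations=200, sample_size=10, seed=0):
    """Toy model of a system that keeps retraining on its own output.

    Each generation draws sample_size values from the current Gaussian,
    then refits the Gaussian to those values only. Returns the list of
    standard deviations, one per generation (index 0 is the starting value).
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: stand-in for the "real" data
    history = [sigma]
    for _ in range(generations):
        # The model produces synthetic data from what it currently believes.
        synthetic = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # The next model is fitted to that synthetic data and nothing else.
        mu = statistics.fmean(synthetic)
        sigma = statistics.pstdev(synthetic)
        history.append(sigma)
    return history


if __name__ == "__main__":
    spread = simulate_self_training()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: spread = {spread[gen]:.3g}")
```

In this toy setup the spread tends to shrink generation after generation, because each refit sees only a small sample and loses a little of the tails, and nothing ever puts that variety back. Small sampling errors compound rather than cancel out, which is the flavour of amplification described above, under these very simplified assumptions.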
I have a feeling this will be for the plebs... just let them feed on AI-generated bullshit, in the same way education babysits the majority and keeps them stupid.
And for the elite...
However, while the above may just be the fate of lesser models (and amusing to boot), it's more likely that successful future AI models are going to have to pay to access quality data, or private data which isn't yet stored online. There are huge volumes of that.
These will get more accurate, more efficient, more expert at co-creating useful knowledge with the human experts who generate it.
But that ain't gonna be cheap. These useful, actual factual AI models are probably gonna cost a fortune, and only be accessible to the relatively well off: if we're lucky the top 20%, if not a much smaller percentage of the population.
It's probably just gonna be part of that trend towards polarisation!
Posted Using InLeo Alpha
LLMs tend to rapidly degenerate when trained on synthetic data.
I'm looking forward to that if it ever happens! In a way, it sort of amuses and terrifies me in equal measure!
I read a few articles about tests done with synthetic data a while back. The best way to describe the results is that they are similar to what happens when humans inbreed. Rapid degeneration!
It is the ability to make decisions based on the information one has that makes something useful. LLMs so far do not have the ability to do that, and that is where humans come in. More data just means more information, which does not equate to an increase in decision-making ability or reasoning.
Yes, fair point. I've found ChatGPT to be great at improving the grammar and tone of content I've written, but not at actually writing the content!
@mightpossibly has a project @ai-summaries that is generating transcripts of YouTube videos and storing that data on chain.
Synthetic data is already a reality.
I have the intuition that over-reliance on AI could potentially stunt or atrophy reasoning and writing skills. I don't consider this a strongly held opinion since I have only performed cursory diligence.
It is not unimaginable that much of humanity could become passive consumers of individualized entertainment.
I think dark web and crypto powered models will have a part to play in the future too.
It's probably already happening if people are posting articles written by bots or generating books. These things are not really creating new knowledge. Their benefits ought to be in extracting information and drawing new conclusions. I'm sure the philosophers are debating how much human creativity is actually original.
I'm glad I don't work in education where students are generating essays. I don't know how that war of generators vs detectors is going.
It's a crazy world.
It's basically a rabbit hole I'm a little scared to explore. It wouldn't surprise me if unis went back to more exam-type assessment, to AI-proof it.
I wonder how AI would generate its own data, like Skynet in the Terminator series.
This is already a problem I'm seeing with AI art generators (and probably other AI tools, I just haven't played with them so much).
As long as AI art generators were trained on real-world artworks and images, the images they output tended to be vaguely reasonable. But they've always had an issue with giving people the right number of fingers and animals the right number of legs (usually 4). Now, however, AI art is scraping the internet for images to steal and learn from, and more and more of those images are AI-generated themselves.
The consequence is that they no longer have any idea how many limbs, legs and fingers people and animals have, and the quality of output is degenerating rapidly.
You raise some good points. Also to consider is all the encrypted data on privacy networks, e.g. in SimpleX & matrix.protocol groups, to name just a couple. Not sure if an AI agent could enter these groups to harvest data.
Decentralized groups using encryption are more likely to contain critical data imo, as revealing critical data contrary to the narrative can be freedom- and life-threatening, as people like #JulianAssange & Roger Ver can testify.
#FreeRoger
#PrivacyMatters
#AnonymityMatters
#CryptographyMatters
The company I work for in the Regtech space is making use of AI to build our own LLMs, as a tool to help human analysts sift through the regulatory landscape. The trouble is we are up against it, because small one-man-band businesses are setting up shop and building similar AIs.
The race is on to garner as much data collateral as possible.
It’s certainly turning into a very challenging landscape.
This is amazingly possible, based on what I currently understand.
AI models have created historically inaccurate depictions of famous people when fed faulty data, and if they self-checked this data they would place it in their database as correct and factual, even though it is fake news.
This fallacy is also demonstrated by AI adding non-edible and potentially harmful ingredients to foods because they change the texture to a desirable consistency, without regard for the harmful effects of adding concrete or other inorganic substances to food. An AI that assembles its own database would add these errors to its database, to be buried inside complex instruction sets, which would probably be adopted and not sufficiently checked by lazy humans.