AI and the future of software engineering

Social media is filled with predictions that software developers will soon be replaced by AI and that it makes no sense for young people today to learn to code. These predictions mostly come from people who either never worked in tech, or who have a tech background but were promoted so far away from the work floor that they have lost touch with actual software development.

In this post I want to offer a different perspective.

As a software engineer with a background in information security, computational statistics, and a field in which control-feedback theory plays a big part, I understand a few things that together lead me to conclude that now is actually the best time ever for young people to learn coding skills, and that coding skills of a certain type are going to be more in demand because of AI adoption.

The first thing to realize is that for AI, or most ML technology for that matter, to work, the training set must be big enough, specific enough, and of high enough quality. Otherwise the results are going to suck, even if the best state-of-the-art AI technology is used and enough processing power is spent on training to warm the planet an extra degree. When we look at the current generation of AI coding assistants, it becomes pretty clear in use that at least one of these prerequisites isn't actually met.

AI coding assistants have some hype around them today, and they undeniably boost productivity, but in the hands of less experienced developers that increased productivity may not actually be a good thing from a long-term business perspective. The point is that AI is fast, but it messes up sometimes, and you really should keep your eyes open for that. Mileage may vary between programming languages and application areas, but I have yet to see a project where the overall code quality didn't take a hit after a number of months of using AI coding assistants. A senior dev will usually have the experience to dismiss or fix AI-generated code before attaching their name (and reputation) to a commit. Junior devs, in my experience, will gladly commit and push a piece of AI-generated code they don't understand, because, hey, it passes the unit test, so it's OK, right? In my experience, senior devs usually back away from AI coding assistants after running into a few huge cockups, realizing that in the end the productivity gain is not worth it, and decide it isn't for them. Juniors tend to start trusting the AI prematurely, to the point that IMHO junior devs shouldn't be trusted with AI coding assistants. The sweet spot for AI coding assistants IMHO seems to lie with disciplined medior developers: skilled enough to avoid most of the bugs and security risks, and not working in the critical parts of the code, where the extra care needed would eat up all the productivity gains.

But that's not all. Many people believe AI will continue to get better if we just make things faster and bigger. However, the data problem is real. Training an AI requires data to learn from. If the data set is too small, no advances in AI technology are going to help. If there is too much low-quality data in the data set, nothing can fix that after the AI has been trained on the polluted data. If the data is too unspecific, help with more specific work will often be off. However, if we were to take the trouble of vetting the data and splitting it into more specific sub-data-sets, the resulting sets will be small, quite often too small.

Take C++. C++ is just a single programming language, but it has many sub-niches, each with its own coding quirks. It's an old language that has changed a lot over the years. Anyone who has worked on a 25-year-old codebase and who has also worked on new projects in C++ can tell you they are completely different beasts. What is good practice in one codebase will be grounds for termination in the other. C++ devs sometimes refer to this divide as pre-Alexandrescu vs post-Alexandrescu C++.

But even for just that one language, there is more than this one divide. C++ in game programming, C++ in embedded programming, C++ in backend and system development, and C++ in trading are all very different beasts, so much so that senior devs usually aren't interchangeable between these fields. It's as if they are completely different languages. And this is just one language.

Have a look at any language on GitHub, any language you have experience with, and look at a few dozen relatively new projects. A trained eye will see which projects were made by devs with decades of experience, and which projects were made by a small group of motivated but relatively unschooled kids or by some carpenter or nurse who is learning how to code in their spare time.

There are statistical tools available that can help the training process learn to recognize some of these attributes. The tools aren't perfect, but they can be tuned to reduce false positives for all of the training buckets. But again, the higher the quality of the filtering, the smaller the resulting focused training set. At some point, and that point is pretty much a reality already, attempts to improve the specificity of the training data will result in training sets that are simply too small to be useful for training serious AI models.
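To make the shrinking effect concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption made up for illustration: the function name, the starting size, and the percentage that each filter keeps. None of these numbers come from a real data set; the only point is how fast stacked filters multiply down.

```python
def remaining_after_filters(total_samples, keep_percentages):
    """Return the data set size left after each successive filter.

    keep_percentages holds, per filter, the whole-number percentage of
    samples that survives that filter. Integer math keeps results exact.
    """
    sizes = []
    remaining = total_samples
    for percent in keep_percentages:
        if not 0 <= percent <= 100:
            raise ValueError("each keep percentage must be between 0 and 100")
        remaining = remaining * percent // 100
        sizes.append(remaining)
    return sizes


# Hypothetical numbers, purely illustrative.
total = 10_000_000      # all code samples for one language
filters = [
    20,                 # keep only the modern dialect of the language
    25,                 # keep only one application niche
    10,                 # keep only code that looks senior-written
    30,                 # keep only code that is probably not AI generated
]

for percent, size in zip(filters, remaining_after_filters(total, filters)):
    print(f"keep {percent:3d}% -> {size:>10,d} samples left")
```

With these made-up percentages, four filters that each sound reasonable leave 15,000 of the original 10,000,000 samples.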

And finally there is the feedback loop. The more AI is used to assist in creating code for real codebases, the harder it becomes to train new AI models on clean data. When every new generation of AI assistant tools is trained on the output of the previous generation, we move into the field of control-feedback theory.
Negative feedback is a great tool in engineering, with massive applications and usefulness. I won't get into the subject too much, because it doesn't seem viable for AI tool feedback. Positive feedback is quite a different beast though. In short, unless some marking of AI-generated code gets globally accepted, both by AI tool builders and by their users, the feedback loop will be one of positive feedback. Positive feedback tends to make the total system unstable. So for those who believe that AI will continue to get better, think again. Developments in AI technology will be struggling to even compensate for the almost unavoidable positive feedback that will be pushing new generations of AI assistants toward instability. Without such advances, data-set filtering will at least need to be tuned to attenuate the feedback. That is, the filtering will need to try to exclude old AI-generated code, likely accepting high false positive numbers in order to keep the positive feedback in check.
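The feedback argument can be sketched as a toy simulation. This is not a model of any real AI system. It assumes, purely for illustration, that each generation is trained on a mix of human code and the previous generation's output, and that a model reproduces the defects in its training data with some amplification factor. The names simulate_generations, loop_gain, ai_share and gain, and all the numbers, are hypothetical.

```python
def loop_gain(ai_share, gain):
    """Gain around the feedback loop in this toy model.

    At 1.0 or above the loop is unstable: defects grow each generation.
    Below 1.0 the defect rate settles at a finite level.
    """
    return ai_share * gain


def simulate_generations(generations, ai_share, gain, human_defect_rate=0.05):
    """Return the output defect rate of each successive model generation.

    ai_share: fraction of the training data that is AI generated.
    gain: how strongly a model amplifies the defects it was trained on.
    """
    rates = []
    previous = human_defect_rate  # the first model learns from human code only
    for _ in range(generations):
        # Defect rate of the mixed training set for this generation.
        training_rate = (1.0 - ai_share) * human_defect_rate + ai_share * previous
        # The new model amplifies what it learned; a rate cannot exceed 100%.
        previous = min(1.0, gain * training_rate)
        rates.append(previous)
    return rates


# Hypothetical scenarios: little filtering versus aggressive filtering
# of AI-generated code out of the training data.
scenarios = {"unfiltered": 0.8, "filtered": 0.2}
for label, ai_share in scenarios.items():
    rates = simulate_generations(30, ai_share, gain=1.5)
    print(f"{label}: loop gain {loop_gain(ai_share, 1.5):.2f}, "
          f"defect rate after 30 generations {rates[-1]:.3f}")
```

In this sketch, filtering does nothing to the model itself. It only lowers ai_share, which pulls the loop gain below 1 and is what attenuating the feedback means here. The price is the one described above: the training set shrinks every time the filter is tightened.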

There is a data problem now, and AI itself will continue making that very data problem worse.

So what does this mean for new or aspiring developers? I would suggest that AI will have a major impact on the field, and on the type of work and the type of skills new developers should have.

For one, stop listening to the hype. Stop listening to the gurus telling you people will be programming in English without any coding skills. It's not going to happen. AI is going to increase:

Medior dev productivity
Senior dev headaches
Bugs
Amount of insecure code
The maintenance/new-code ratio for developers

So, yes, learn to program, but know that you will spend much more time fixing code than previous generations of developers did. Learn how to look for insecure code. Learn how to identify the more and less critical parts of a codebase. Take code quality and code maintenance classes, and realize that AI will make work as a developer both more tedious and more productive than it is today. AI isn't going to destroy the job market for developers, it's going to grow it, but it's going to change the focus of our work as developers. Less work and more productivity on the new code creation and refactoring side of things. More work on security reviewing, debugging and maintaining the larger mixed-quality codebase. In niches, AI is going to die a quick death because the clean data sets needed for it to flourish are simply too small.