Genetic Programming has gone Backwards

When Genetic Programming (GP) first arose in the late 80s and early 90s, its application had one defining characteristic, so widely accepted as to be left unsaid:

GP always starts from scratch

Sure, you might implement low-level functions in the function set, and even abstract some more complex functions in certain cases, but GP almost always (with some notable exceptions) started with a randomly generated initial population.

Later on, people tried to encourage some reuse in GP through the automatic discovery of reusable subtrees, but again the initial population was generated in a way that was not directly related to the problem at hand.

Perhaps from an AI point of view, this didn’t seem unusual. It’s a similar approach to the use of a neural net, which starts with a random set of weighted connections, and is then trained for a particular task. In fact, to have incorporated existing solutions or source code could have been perceived as “cheating” at the time! Surely any AI system should not need any hint from a human.

However, when you consider this situation from an automatic programming point of view, it quickly becomes apparent that Genetic Programming was trying to run before it could walk.

Rather than trying to create programs from scratch, wouldn't it make more sense to try to reuse, tweak, or augment existing code? Then, once we'd worked out the finer details of this less ambitious approach, we could go on to generate larger and larger code fragments, and — hey! — maybe an entire program or algorithm.

Recently, some GP practitioners have pivoted towards this more incremental approach, which has become one of the most successful lines of attack in GP research, in the form of a subfield now referred to as Genetic Improvement. Led by researchers from the University of Virginia, CMU, and UCL, this field has changed the rules of the game: GP is now used to complement the efforts of human programmers, rather than completely replace them.

Such an approach now appears blindingly obvious! Why first try to generate code that has already been written by humans? Why not see whether GP can do better than humans, if given handwritten code as a starting point? Why not take advantage of the vast database of code that the internet has provided to us? It’s as if we have suddenly realised that it is much easier to evolve something in the context of an existing ecosystem, rather than starting entirely from scratch.

Excitingly, Genetic Improvement has additional benefits: the resulting programs are usually far more relevant to industry, and can excite and inspire researchers in other fields. Rather than racing other AI techniques to solve an overly simplistic and artificial toy problem, GI offers the opportunity to fix bugs in open-source projects; to produce faster, more efficient software than can be achieved by manual means alone; and to explore power efficiency and other properties of software that human programmers often overlook.

I recently spoke to one of the leaders of GI research, and I asked him if he felt any rivalry with others racing to push the boundaries of what GI can achieve. His reply?

“Not at all! There is simply so much work to be done. There is just so much to do!”

Time to roll up your sleeves and get involved! If we work hard enough on GI, perhaps one day we’ll get back to where we started — the dream of true automatic programming.


All opinions are my own!
Thanks to John Clark, John Woodward, and William B. Langdon for proof-reading a draft of this post.