And this is where the new o1 model starts to split away from human thinking, bringing in that insanely effective AlphaGo approach of pure trial and error in pursuit of the right result.
o1's baby steps into reinforcement learning
In many ways, o1 is pretty much the same as its predecessors – except that OpenAI has built in some 'thinking time' before it starts to answer a prompt. During this thinking time, o1 generates a 'chain of thought' in which it considers and reasons its way through a problem.
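To make that a little more concrete, here's a minimal Python sketch of what inference-time 'chain of thought' looks like when you do it by hand with explicit prompting. The `generate` function below is a hypothetical stand-in for any language model call, not OpenAI's actual API – o1 does all of this internally, in a single request, with the reasoning hidden from the user.

```python
# A minimal sketch of inference-time 'chain of thought' via explicit
# prompting. generate() is a hypothetical placeholder for any LLM call;
# o1 performs the equivalent reasoning internally before answering.

def generate(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    return f"<model output for: {prompt[:40]}...>"

def answer_with_thinking(question: str) -> str:
    # 'Thinking time': first elicit intermediate reasoning steps.
    chain_of_thought = generate(
        "Reason step by step about the problem below before answering.\n"
        f"Problem: {question}"
    )
    # Then condition the final answer on that chain of thought.
    return generate(
        f"Problem: {question}\n"
        f"Reasoning:\n{chain_of_thought}\n"
        "Final answer only:"
    )

print(answer_with_thinking("What is 17 * 24?"))
```

The key design point is the two-step structure: the model first spends tokens working through the problem, and only then commits to an answer conditioned on that working.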