The Apple team found "catastrophic performance drops" in those models when they tried to parse simple mathematical problems written in essay form. In this example, the systems tasked with the question often didn't understand that the size of the kiwis has nothing to do with the number of kiwis Oliver has. Some, consequently, subtracted the five undersized kiwis from the total and answered "185."
Human schoolchildren, the researchers posited, are much better at detecting the difference between relevant information and inconsequential curveballs.