Scientists from MIT, Google Research, and Stanford University, led by Ekin Akyürek, have found that massive neural network models similar to large language models can contain smaller linear models inside their hidden layers, which the large models can train to complete a new task using simple learning algorithms.
The study demonstrates how powerful language models like GPT-3 can learn a new task from just a few examples, without needing fresh training data. Large language models like OpenAI’s GPT-3 are neural networks that can compose poetry and computer code in a manner that resembles human writing.
Trained on enormous amounts of internet data, these machine-learning models take a small piece of input text and then predict the text that is likely to come next.
To perform a new task, a machine-learning model like GPT-3 would generally need to be retrained on new data; during this training phase, the model updates its parameters as it processes the new data. With in-context learning, however, the model’s parameters are not updated at all, so the model appears to pick up a new task from the prompt alone.
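To make the contrast concrete, here is a minimal sketch of in-context learning, with GPT-2 from Hugging Face standing in for a large model like GPT-3 (an assumption made so the example can run). The task is specified entirely in the prompt, and no weights are updated:

```python
# A minimal sketch of in-context (few-shot) learning: the task is shown
# entirely in the prompt, and no model parameters are updated.
# GPT-2 stands in here for a large model like GPT-3 (an assumption for
# the sake of a runnable example).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "cheese -> fromage\n"
    "peppermint -> "
)
# The model continues the pattern from the examples alone; its weights
# are exactly the same before and after this call.
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```

GPT-2 is far smaller than GPT-3 and will follow the pattern only loosely; the point here is the mechanism, not the quality of the output.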
Solving a Machine-Learning Mystery
The researchers’ theoretical findings show that these enormous neural network models can contain smaller, simpler linear models within them. The large model could then use a simple learning algorithm to train this smaller linear model to complete a new task, using only information already present in the larger model.
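As a concrete illustration of the kind of simple learning algorithm involved, here is a minimal sketch of plain gradient descent training a small linear model; the data is synthetic and purely illustrative:

```python
# A minimal sketch of a simple learning algorithm: plain gradient descent
# training a small linear model on a few examples of a linear task.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                  # input dimension
w_true = rng.normal(size=d)            # the task: an unknown linear function
X = rng.normal(size=(20, d))           # a handful of in-context examples
y = X @ w_true

w = np.zeros(d)                        # the small linear model's weights
lr = 0.05
for _ in range(1000):                  # gradient descent on mean squared error
    grad = 2 * X.T @ (X @ w - y) / len(X)
    w -= lr * grad

print(np.allclose(w, w_true, atol=1e-3))  # the task's weights are recovered
```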
According to lead author Ekin Akyürek, this research is an essential step toward understanding the mechanisms behind in-context learning, and it opens the door to further investigation of the learning algorithms these huge models can employ. With a better understanding of in-context learning, researchers could enable models to complete new tasks without the need for costly retraining.
Testing the Theory with a Transformer
Akyürek put forth the theory that in-context learners are genuinely learning to perform new tasks rather than simply matching patterns seen during training. He and others had experimented by prompting these models with synthetic data they could not have encountered anywhere else, and they discovered that the models could still learn from a limited number of examples.
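Here is a minimal sketch of how such synthetic prompts might be constructed (the text format is a hypothetical choice for illustration, not the paper’s): every call samples a brand-new linear task, serialized as a short list of input-output pairs.

```python
# A minimal sketch of the synthetic-data idea: each call samples a fresh
# random linear task the model could not have seen in training, and
# serializes it as a short prompt of input-output pairs.
import numpy as np

rng = np.random.default_rng(1)

def make_in_context_prompt(n_examples=4, d=3):
    w = rng.normal(size=d)              # a fresh, never-before-seen task
    lines = []
    for _ in range(n_examples):
        x = rng.normal(size=d)
        lines.append(f"x: {np.round(x, 2).tolist()} -> y: {float(x @ w):.2f}")
    x_query = rng.normal(size=d)        # the model must predict y for this x
    lines.append(f"x: {np.round(x_query, 2).tolist()} -> y:")
    return "\n".join(lines)

print(make_in_context_prompt())
```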
Akyürek and his collaborators hypothesized that these neural network models contain smaller machine-learning models inside them, which the large models could train to carry out a new task.
To test this idea, the researchers used a transformer neural network model that has the same architecture as GPT-3 but was trained specifically for in-context learning. By examining its architectural design, they theoretically demonstrated that this transformer can write a linear model within its hidden states. A neural network is made up of many layers of interconnected nodes that process data; the hidden states are the layers between the input and output layers.
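For readers curious what hidden states look like in practice, here is a minimal sketch of inspecting them, again using GPT-2 from Hugging Face as a stand-in (an assumption; the paper’s transformer was trained specifically for in-context learning):

```python
# A minimal sketch of reading a transformer's per-layer hidden states.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tokenizer("x: 1.0 -> y: 2.0\nx: 3.0 -> y:", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One activation tensor per layer (embeddings plus 12 transformer blocks);
# the paper's analysis looks for a linear model encoded in such states.
for i, h in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")
```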
Their mathematical analyses show that this linear model is encoded somewhere in the transformer’s earliest layers. The transformer can then update the linear model by applying simple learning algorithms. In effect, the model trains and simulates a smaller version of itself.
Akyürek’s Future Plans
In the future, Akyürek plans to continue investigating in-context learning with functions more complex than the linear models examined in this work. These experiments could also be applied to large language models, to determine whether their behavior is likewise described by simple learning algorithms. He also wants to explore the kinds of pre-training data that can support in-context learning.
“With this work, people can now visualize how these models can learn from examples. So, my hope is that it changes some people’s views about in-context learning,” Akyürek says. “These models are not as dumb as people think. They don’t just memorize these tasks. They can learn new tasks, and we have shown how that can be done.”
The research will be presented at the International Conference on Learning Representations. Joining Akyürek on the work are Dale Schuurmans, a research scientist at Google Brain and professor of computing science at the University of Alberta; as well as senior authors Jacob Andreas, the X Consortium Assistant Professor in the MIT Department of Electrical Engineering and Computer Science; Tengyu Ma, an assistant professor of computer science and statistics at Stanford; and Danny Zhou, principal scientist and research director at Google Brain.