In a bid to address the memory and computational bottlenecks of running machine-learning models on tiny edge devices, researchers at MIT and the MIT-IBM Watson AI Lab have developed a new technique that enables on-device training using less than a quarter of a megabyte of memory. The research will be presented at the Conference on Neural Information Processing Systems.
The work was led by Song Han, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the MIT-IBM Watson AI Lab. He is joined on the paper by EECS Ph.D. students Ji Lin and Ligeng Zhu, MIT postdocs Wei-Ming Chen and Wei-Chen Wang, and Chuang Gan, a principal research staff member at the MIT-IBM Watson AI Lab.
Han and his colleagues used two algorithmic techniques to make the training process more efficient and less memory-intensive. The first, called sparse update, uses an algorithm to identify which weights should be updated during each round of training.
The algorithm freezes weights one at a time until the accuracy dips to a set threshold, then stops. The remaining weights are updated, while the activations corresponding to the frozen weights do not need to be stored in memory.
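The freezing loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' implementation: it freezes whole layers rather than individual weights, the `evaluate` callback and the earliest-layers-first freezing order are hypothetical, and the layer names, accuracy gains, and activation sizes are toy numbers. It also follows the article's simplification that only layers being updated need their activations stored.

```python
from typing import Dict, List, Set


def select_sparse_update(layers: List[str], evaluate, threshold: int) -> Set[str]:
    """Freeze layers one at a time until accuracy would dip below `threshold`.

    `evaluate` is a hypothetical callback returning validation accuracy (in
    percent) when only the layers in the given set are updated.
    """
    trainable = set(layers)
    for name in layers:  # assumed order: freeze the earliest layers first
        candidate = trainable - {name}
        if evaluate(candidate) < threshold:
            break  # accuracy dipped past the threshold, so stop freezing
        trainable = candidate
    return trainable


def activation_bytes(trainable: Set[str], act_size: Dict[str, int]) -> int:
    # Simplification: only activations for updated layers are kept in memory
    return sum(act_size[name] for name in trainable)


# Toy model: accuracy is 80% plus a fixed gain for each layer that is updated
gains = {"conv1": 1, "conv2": 1, "conv3": 3, "fc": 5}
layers = list(gains)


def toy_evaluate(trainable: Set[str]) -> int:
    return 80 + sum(gains[n] for n in trainable)


trainable = select_sparse_update(layers, toy_evaluate, threshold=87)
act_size = {"conv1": 4000, "conv2": 2000, "conv3": 1000, "fc": 100}
print(sorted(trainable))                      # ['conv3', 'fc']
print(activation_bytes(trainable, act_size))  # 1100, versus 7100 for a full update
```

In this toy run, freezing `conv1` and `conv2` keeps accuracy at or above 87%, but freezing `conv3` would drop it to 85%, so the loop stops with only `conv3` and `fc` trainable.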
“Updating the whole model is very expensive because there are a lot of activations, so people tend to update only the last layer, but as you can imagine, this hurts the accuracy. For our method, we selectively update those important weights and make sure the accuracy is fully preserved,” Han says.
Their second method uses quantized training. Weights are normally stored as 32-bit values; quantization rounds them to just eight bits, which reduces the memory required for both training and inference.
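The rounding step can be illustrated with a minimal sketch of symmetric, per-tensor int8 quantization. The scheme here (one scale factor per tensor, integers clipped to [-127, 127]) is an assumption chosen for simplicity; the article does not specify the researchers' exact quantization scheme. The standard-library `array` module is used only to count bytes per weight.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    # Clamp guards against rounding pushing a value just outside the int8 range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 0, 127]

fp32 = array("f", weights)  # 4 bytes per weight
int8 = array("b", q)        # 1 byte per weight
print(fp32.itemsize * len(fp32), int8.itemsize * len(int8))  # 16 4
```

Each weight shrinks from four bytes to one, and the value recovered by `dequantize` differs from the original by at most half a quantization step.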
Inference is the process of applying a trained model to data to produce a prediction. The researchers also apply a technique called quantization-aware scaling (QAS), which acts like a multiplier that adjusts the ratio between the weights and their gradients, avoiding the drop in accuracy that quantized training can otherwise cause.
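One way to see why a multiplier helps: assume a single scale factor `s` per tensor, so the quantized weights are `w / s`. By the chain rule, the gradient with respect to those weights is `s` times the float gradient, so the weight-to-gradient ratio is inflated by a factor of `s**-2` and updates become effectively too small. Multiplying the gradient by `s**-2` restores the original ratio. The sketch below checks this numerically; the per-tensor scale, toy values, and function names are illustrative assumptions, rounding is omitted, and the derivation shows the idea rather than necessarily the exact formulation in the paper.

```python
import math


def l2norm(v):
    return math.sqrt(sum(x * x for x in v))


def qas(grad_q, scale):
    """Quantization-aware scaling: multiply quantized-weight gradients by scale**-2."""
    return [g / scale**2 for g in grad_q]


# Toy float weights and gradients, plus an assumed per-tensor scale
w = [0.02, -0.01, 0.03]
g = [0.001, 0.002, -0.0005]
s = 0.0005

w_q = [x / s for x in w]  # weights in the quantized domain (rounding omitted)
g_q = [x * s for x in g]  # chain rule: dL/dw_q = (dL/dw) * s

ratio_fp = l2norm(w) / l2norm(g)
ratio_q = l2norm(w_q) / l2norm(g_q)            # inflated by a factor of s**-2
ratio_qas = l2norm(w_q) / l2norm(qas(g_q, s))  # restored to the float ratio
print(math.isclose(ratio_qas, ratio_fp))  # True
```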
The researchers also developed a system, called a tiny training engine, that runs these algorithmic innovations on a basic microcontroller without an operating system. The system changes the order of steps in the training process so that more work is done at compile time, before the model is deployed on the edge device.
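A rough way to picture shifting work to compile time: the decision about which layers are trained is made offline, and a fixed list of operations, with backward steps for frozen layers already pruned away, is generated before deployment. The microcontroller then only has to walk that list. The function names, operation labels, and per-layer granularity below are hypothetical; this is a sketch of the idea, not the tiny training engine's actual interface.

```python
from typing import Callable, Dict, List, Set, Tuple

Op = Tuple[str, str]  # (operation, layer name)


def compile_schedule(layers: List[str], trainable: Set[str]) -> List[Op]:
    """Runs offline, before deployment: emit a fixed list of training ops."""
    schedule: List[Op] = [("forward", name) for name in layers]
    # Backpropagation can stop at the earliest trainable layer
    first = min((i for i, n in enumerate(layers) if n in trainable), default=len(layers))
    for i in range(len(layers) - 1, first - 1, -1):
        name = layers[i]
        if name in trainable:
            schedule.append(("grad_weight", name))
        if i > first:
            schedule.append(("grad_input", name))  # pass the gradient to the layer below
        if name in trainable:
            schedule.append(("update", name))  # after gradients used the old weights
    return schedule


def run_step(schedule: List[Op], kernels: Dict[str, Callable[[str], None]]) -> None:
    """Runs on the device: no graph analysis, just execute ops in order."""
    for op, layer in schedule:
        kernels[op](layer)


layers = ["conv1", "conv2", "fc"]
schedule = compile_schedule(layers, trainable={"conv2", "fc"})

# Stand-in kernels that just record what was executed
executed: List[Op] = []
kernels = {op: (lambda layer, op=op: executed.append((op, layer)))
           for op in ("forward", "grad_weight", "grad_input", "update")}
run_step(schedule, kernels)
print(executed[3:])
# [('grad_weight', 'fc'), ('grad_input', 'fc'), ('update', 'fc'),
#  ('grad_weight', 'conv2'), ('update', 'conv2')]
```

Because the frozen `conv1` receives no backward operations at all, the device never computes or stores anything for its gradients.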
Han sums up what the work makes possible:
“Our study enables IoT devices to not only perform inference but also continuously update the AI models to newly collected data, paving the way for lifelong on-device learning. The low resource utilization makes deep learning more accessible and can have a broader reach, especially for low-power edge devices.”