Researchers at the University of California, Santa Cruz have developed a method for running large language models (LLMs) with billions of parameters while consuming significantly less energy than current systems.

Image source: Stefan Steinbauer/Unsplash

The key to their success was eliminating matrix multiplication (MatMul) from the model's computations. The researchers combined two techniques. The first converts the network's weights to a ternary system using the values -1, 0, and 1, which allows multiplication to be replaced by simple addition and subtraction. The second introduces time-based computation, giving the network an effective "memory" that lets it run faster while performing fewer operations. The work was carried out on specialized FPGA hardware, but the researchers emphasize that most of their efficiency gains can be achieved with open-source software and tweaks to existing systems.
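To illustrate the ternary idea in general terms (this is a minimal sketch, not the researchers' actual implementation), the snippet below shows how a matrix-vector product collapses into additions and subtractions once every weight is restricted to -1, 0, or +1: multiplying by +1 means adding the activation, by -1 means subtracting it, and by 0 means skipping it entirely. The function name and data layout here are illustrative assumptions.

```python
# Sketch: a matrix-vector product with ternary weights {-1, 0, +1}
# needs no multiplication at all -- only addition and subtraction.

from typing import List

def ternary_matvec(W: List[List[int]], x: List[float]) -> List[float]:
    """Compute y = W @ x where every entry of W is -1, 0, or +1."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:       # weight +1: add the activation
                acc += xi
            elif w == -1:    # weight -1: subtract the activation
                acc -= xi
            # weight 0: contributes nothing, no operation performed
        y.append(acc)
    return y

# Example: a 2x3 ternary weight matrix applied to a 3-vector.
W = [[1, -1, 0],
     [0,  1, 1]]
x = [0.5, 2.0, -1.0]
print(ternary_matvec(W, x))  # [-1.5, 1.0]
```

Because adders are far cheaper in silicon than multipliers, this substitution is what makes the approach attractive for custom hardware such as the FPGA system the team used.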

The research was inspired by Microsoft's work on ternary numbers in neural networks, and the researchers used Meta's LLaMA as their reference large model. Rui-Jie Zhu, one of the PhD students who worked on the project, summed up the idea as replacing expensive operations with cheaper ones. While it is unclear whether this approach can be applied universally to all AI systems and language models, it has the potential to radically change the AI landscape.

Importantly, the scientists have released their work as open source, which will allow major AI players such as Meta, OpenAI, Google, Nvidia and others to freely adopt it to process workloads and build faster, more energy-efficient artificial intelligence systems. Ultimately, this could let AI run entirely on personal computers and mobile devices, approaching the functionality of the human brain.
