While the potential of generative artificial intelligence (AI) may seem limitless, the computing power it requires could present limitations. One estimate places the cost of a ChatGPT query at 1,000 times that of the same question asked of a normal Google search. In the initial development stages, as companies such as OpenAI seek to generate public interest, that may be an acceptable cost, even with 100 million active users added in a single month.
However, that kind of expense could easily become unsustainable for a more general-use product. Even the White House has weighed in on the question, noting the potential environmental impact of the increased energy consumption and data center space required to run generative AI applications at scale.
Before dealing with the cost of running large language models (LLMs), most companies interested in developing their own generative AI solutions will come up against the cost of training them. Training generative AI requires owning hardware or renting time on it, along with significant data storage and intensive energy consumption. Simply training OpenAI's GPT-3, the version before the one employed in ChatGPT, cost more than $5 million.
However, there has been some progress in lowering the bar for entry into generative AI. One approach developed at the Massachusetts Institute of Technology (MIT) claims to reduce the cost of training an LLM by 50%, and the more efficient method also cuts training time in half.
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have also raised the potential of smaller, specialized LLMs as a way to reduce cost and improve efficiency. Limiting a model to a smaller, task-specific data set not only allows it to outperform models with 500 times as many parameters but also promises to address some privacy and accuracy concerns.
The researchers built their LLM around a technique that simplifies how it evaluates potential responses: it compares a hypothesis, or candidate generated statement, against a premise, a statement known to be true, and judges whether the premise supports the hypothesis. In addition to relying on a smaller and more specialized data set, the model also requires less training to achieve accurate results, the researchers found.
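This premise-and-hypothesis comparison resembles textual entailment, where a small model scores whether one statement follows from another instead of generating free-form text. The sketch below illustrates that general idea only; it uses the Hugging Face transformers library and the off-the-shelf facebook/bart-large-mnli entailment model as illustrative stand-ins, not the MIT team's actual code or model.

```python
# Illustrative sketch of premise/hypothesis entailment (not the MIT researchers' code).
# The zero-shot pipeline turns each candidate statement into a hypothesis sentence
# and asks a comparatively small entailment model whether the premise supports it.
from transformers import pipeline

# Model choice is an assumption for illustration; any NLI/entailment checkpoint would do.
entailment = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

premise = "The clinic confirmed the patient's appointment for Tuesday morning."
candidate_statements = [
    "an appointment was scheduled",
    "the appointment was cancelled",
    "the patient missed the visit",
]

# Each candidate is wrapped into a hypothesis and scored for entailment against
# the premise, rather than having a large model generate an answer from scratch.
result = entailment(
    premise,
    candidate_labels=candidate_statements,
    hypothesis_template="This text implies that {}.",
)

for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```

Because the model only has to score a small set of hypotheses against a known premise, it can be far smaller, and cheaper to train and run, than a general-purpose generative model.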