Bite-Sized AI: Why Smaller Models Like Microsoft’s Phi-3 Are Big for Business

Smaller artificial intelligence (AI) models, like Microsoft’s recently unveiled Phi-3-mini, are proving that bigger isn’t always better for business applications.

These lightweight, efficient models can tackle content creation and data analysis without the hefty computational requirements and costs associated with their larger counterparts, experts say, making AI more accessible and cost-effective for businesses. 

“Small language models have a lower probability of hallucinations, require less data (and less preprocessing), and are easier to integrate into enterprise legacy workflows,” Narayana Pappu, CEO at Zendata, a provider of data security and privacy compliance solutions, told PYMNTS. “Most companies keep 90% of their data private and don’t have enough resources to train large language models.”

Microsoft Bets on Small AI

In a paper published on the open-access publishing platform arXiv, Microsoft also announced two larger models in the Phi-3 family: the phi-3-small and phi-3-medium variants. The company did not reveal when any versions of Phi-3 would be released to the broader public.

“We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT 3.5,” the Microsoft researchers wrote in their paper. “The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format.”

Microsoft isn’t the only company pursuing smaller models. As PYMNTS previously reported, Inflection’s recent update to its Pi chatbot represents a shift toward developing smaller, more efficient AI models that make advanced technology more accessible and affordable for businesses.

The chatbot now features the Inflection 2.5 model, which nearly matches the effectiveness of OpenAI’s GPT-4 but requires only 40% of the computational resources for training. This model supports more natural and empathetic conversations and includes enhanced coding and mathematical skills, broadening the topics Pi users can explore.

Small language models (SLMs), which range from a few hundred million to 10 billion parameters, use less energy and fewer computational resources than larger models. This makes advanced AI and high-performance natural language processing (NLP) more accessible and affordable for a broad spectrum of organizations. The reduced costs of SLMs stem from their compatibility with more affordable graphics processing units (GPUs) and machine-learning operations (MLOps).

Small AI Advantage

Smaller AI models are popular among financial and eCommerce companies. They help personalize customer experiences, measure intent, and compare products. Arthur Delerue, founder and CEO of KWatch.io, which uses generative AI to analyze social media content automatically, said his company only uses SLMs.

“Smaller LLM [large language model] models have several advantages,” he said. “Firstly, they require less computational power and memory, making them more efficient to train and deploy. Secondly, they are faster and consume less power, which is essential for real-time applications. Lastly, smaller models tend to be more interpretable and easier to understand, which can be beneficial for certain tasks and industries.”

Unlike massive LLMs with unspecified parameters, smaller specialized LLMs are trained on industry-specific knowledge and can understand specialized language and concepts, leading to improved accuracy, Raghu Ravinutala, the CEO of Yellow.ai, told PYMNTS.

“This approach results in a more efficient and personalized user experience, making the smaller AI models more effective and accessible,” he added. 

Generalized AI models, like today’s large-scale GPTs, are often built on vast datasets and can mimic human-like conversation. Still, Ravinutala said they typically lack the specificity and nuance needed to unleash their full potential for business growth.

“The current one-size-fits-all model of generative AI has led to generic outputs, poor integrations, hallucinations and vulnerabilities,” he added. “Companies seeking to integrate generative AI require technology tailored to their distinct needs, industry vocabulary, and unique character.”