The release of several new artificial intelligence models signals a fresh surge in industry rivalry, one that could set the stage for an array of innovative commercial applications.
OpenAI, Google and French startup Mistral AI unveiled the latest iterations of their AI models this week. The series of announcements kicked off shortly after Meta confirmed that its new AI model, Llama 3, would be available within weeks.
“With the diversity of models, it means that there will ultimately be many applications,” Muddu Sudhakar, co-founder and CEO of the generative AI company Aisera, told PYMNTS. “There are also advances with innovations like agents, which will help automate tasks on our behalf. It could do research for us, say, to plan a trip or buy a car or buy a house. Or it can do this for more advanced use cases, like drug discovery.”
Meta President of Global Affairs Nick Clegg told TechCrunch Tuesday (April 9) that the company will begin rolling out some of its Llama 3 models sometime this month.
“Within the next month, actually less, hopefully in a very short period of time, we hope to start rolling out our new suite of next-generation foundation models, Llama 3,” he said, per the report. “There will be a number of different models with different capabilities, different versatilities [released] during the course of this year, starting really very soon.”
OpenAI announced via a Tuesday post on X, formerly known as Twitter, that GPT-4 Turbo with Vision, the latest GPT-4 Turbo model, is now generally available to developers via the OpenAI API. The model retains GPT-4 Turbo's 128,000-token context window and December 2023 knowledge cutoff. The notable enhancement in this iteration is the addition of vision capabilities, enabling the model to process and interpret images and other visual input.
GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling. https://t.co/cbvJjij3uL
Below are some great ways developers are building with vision. Drop yours in a reply.
— OpenAI Developers (@OpenAIDevs) April 9, 2024
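For developers, a vision request looks much like a standard Chat Completions call, with image content added alongside the text in a message. The sketch below only constructs a plausible request body; the model name, image URL, and prompt are illustrative, and actually sending the request would require an OpenAI API key and client library.

```python
import json

# Sketch of a Chat Completions request body for GPT-4 Turbo with Vision.
# Illustrative values only; this does not contact the API.
payload = {
    "model": "gpt-4-turbo",
    # JSON mode, which OpenAI says vision requests can now use.
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image as JSON with keys 'objects' and 'scene'.",
                },
                # Image input is passed as an image_url content part.
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.png"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

print(json.dumps(payload, indent=2))
```

The same message shape also works with function calling, the other capability the announcement highlights for vision requests.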
Also on Tuesday, Google introduced its advanced large language model, Gemini Pro 1.5, to the public, offering a free usage tier of up to 50 requests daily.
Mistral AI, co-founded by former members of Meta's AI team, also jumped into action Tuesday, embracing an open-source philosophy similar to Meta's. It launched its latest sparse mixture of experts (SMoE) model, Mixtral 8x22B. Unlike its competitors, however, the Paris-based startup, which raised Europe's largest seed round in June and has become a rising star in the AI domain, didn't announce the release with a demo video or blog post. Instead, it posted a torrent link to a 281GB file on X so anyone could download and test the new model.
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%https://t.co/2UepcMGLGd%3A1337%2Fannounce&tr=http%3A%2F%https://t.co/OdtBUsbeV5%3A1337%2Fannounce
— Mistral AI (@MistralAI) April 10, 2024
The real advantage of OpenAI’s GPT-4 Turbo with Vision lies in its capacity to process significantly larger prompts, Bohdan Khomych, head of R&D commercialization at SoftServe, told PYMNTS. The new version brings a 128K context window, enabling a deep dive into topics with an understanding of more than 300 pages of text. Mixtral 8x22B introduces 176 billion parameters with a 65,000-token context length, using a mixture of experts (MoE) architecture for efficient computation and broad improvements in task performance.
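The MoE idea behind that efficiency can be sketched in a few lines: a gating network scores a pool of experts and routes each input to only the top-k of them, so most of the model's parameters sit idle on any given input. The toy below uses made-up sizes and scalar "experts," nothing like Mixtral's actual architecture.

```python
import math
import random

random.seed(0)

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy sparse mixture of experts: 8 experts, each input routed to the top 2.
NUM_EXPERTS, TOP_K = 8, 2

# Each "expert" is just a scalar function here (real experts are neural sublayers).
experts = [lambda x, w=random.uniform(0.5, 1.5): w * x for _ in range(NUM_EXPERTS)]
gate_weights = [random.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

def moe_forward(x):
    # Gating network assigns a probability to every expert...
    scores = softmax([g * x for g in gate_weights])
    # ...but only the TOP_K highest-scoring experts actually run,
    # which is why compute scales with TOP_K rather than NUM_EXPERTS.
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    norm = sum(scores[i] for i in top)
    return sum(scores[i] / norm * experts[i](x) for i in top)

print(moe_forward(2.0))
```

The "sparse" in SMoE refers to this routing: total parameter count can grow with the number of experts while per-input compute stays roughly fixed.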
AI parameters are the learned numerical settings inside an AI model that determine how it makes decisions. Tokens are the building blocks of text, whole words or pieces of words, that an AI system reads and generates.
“Its open-source nature and the Apache 2.0 license democratize access to cutting-edge AI, encouraging widespread innovation,” Khomych said.
On the other hand, Google’s Gemini 1.5 Pro stands out with its ability to process up to 1 million tokens, equivalent to approximately 700,000 words or 30,000 lines of code, Khomych said. It brings new features like native audio understanding and system instructions to the Gemini API, giving developers more control and users more intuitive functionalities.
“Think of this model as a sidekick who can ‘hear’ and ‘listen’ to audio, then package up the useful information without having to read a transcript,” he said. “It’s great to see how these improvements make AI interactions easier and more user-friendly.”
However, Sunil Srivatsa, CEO of the software development firm Storm Labs, told PYMNTS the new models are incremental improvements over existing offerings.
“The main benefits of these models are supporting multiple modalities and being able to handle increasingly more complex logic and reasoning,” he said. “We will probably have to wait for GPT-5 to see the next big leap in performance.”
Steve Brotman, founder and managing partner of Alpha Partners, which invests in AI, told PYMNTS he expects the number of large language models (LLMs) to continue to multiply rapidly.
“Given the current enthusiasm, we predict this AI layer will become a commodity, similar to the early 2000s scenario when an overbuild of networking fiber drastically reduced the cost of connectivity to nearly zero,” he said. “That shift gave rise to Google, Facebook, YouTube, Netflix, etc. Similarly, we expect that this proliferation will give rise to similar types of applications that will run on very low-cost LLMs. It will be interesting to see which applications emerge.”