
Meta Releases AI Models That Generate Both Text and Images

Meta has released five new artificial intelligence (AI) research models, including one that can generate both text and images and another that can detect AI-generated speech within larger audio snippets.

The models were publicly released Tuesday (June 18) by Meta’s Fundamental AI Research (FAIR) team, the company said in a press release.

“By publicly sharing this research, we hope to inspire iterations and ultimately help advance AI in a responsible way,” Meta said in the release.

One of the new models, Chameleon, is a family of mixed-modal models that can understand and generate both images and text, according to the release. These models can take any combination of text and images as input and produce any combination of text and images as output. Meta suggested in the release that this capability could be used to generate captions for images or to combine text prompts and images to create a new scene.
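To make the mixed-modal idea concrete, the sketch below shows one common way such a model can be structured: images are encoded as discrete tokens that share a single vocabulary with text, and one causal transformer predicts the next token regardless of modality. This is an illustrative toy under those assumptions, not Chameleon’s actual architecture or code.

```python
# Toy mixed-modal language model: text and image tokens in one vocabulary.
# Illustrative sketch only; not Meta's Chameleon implementation.
import torch.nn as nn

TEXT_VOCAB = 32_000
IMAGE_VOCAB = 8_192                   # hypothetical image-token codebook size
VOCAB = TEXT_VOCAB + IMAGE_VOCAB      # one shared vocabulary across modalities

class ToyMixedModalLM(nn.Module):
    """One causal decoder over interleaved text and image tokens."""
    def __init__(self, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):        # tokens: (batch, seq) of mixed-modal ids
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.trunk(x, mask=mask)  # causal attention over the whole mix
        return self.head(x)           # the next token may be text or image
```

Because every position can emit either a text token or an image token, a single sampling loop can caption an image or begin producing image tokens mid-sentence.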

Also released Tuesday were pretrained models for code completion. These models were trained using Meta’s new multi-token prediction approach, in which large language models (LLMs) are trained to predict multiple future words at once rather than one word at a time, the release said.
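A minimal way to picture multi-token prediction is a shared trunk with several output heads, where head i predicts the token i + 1 positions ahead and all heads are trained jointly on shifted targets. The sketch below reflects that general setup under assumed shapes; it is not Meta’s released code.

```python
# Sketch of multi-token prediction training: one trunk, k output heads,
# with head i predicting the token (i + 1) positions ahead. Hypothetical toy.
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenLM(nn.Module):
    def __init__(self, trunk, d_model, vocab_size, k=4):
        super().__init__()
        self.trunk = trunk            # any causal backbone: (B, T) -> (B, T, d)
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(k)]
        )

    def loss(self, tokens):           # tokens: (B, T) integer ids
        hidden = self.trunk(tokens)
        total = 0.0
        for i, head in enumerate(self.heads):
            offset = i + 1                       # how far ahead this head looks
            logits = head(hidden[:, :-offset])   # (B, T - offset, vocab)
            targets = tokens[:, offset:]         # ground truth, shifted left
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / len(self.heads)           # average loss over all heads
```

At inference time the extra heads can typically be dropped, leaving an ordinary next-word predictor, so the richer training signal comes at little deployment cost.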

A third new model, JASCO, offers more control over AI music generation. Rather than relying mainly on text inputs, this model can also accept conditioning inputs such as chords or beats, per the release, allowing both symbolic and audio inputs to be incorporated in a single text-to-music generation model.
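As a rough illustration of how text and symbolic inputs might be fused into one conditioning stream for a music generator (all module names, shapes and dimensions here are hypothetical assumptions, not JASCO’s actual design):

```python
# Hypothetical fusion of text, chord, and beat conditions for music generation.
# Every name and shape here is an illustrative assumption, not JASCO's code.
import torch
import torch.nn as nn

class MultiConditionEncoder(nn.Module):
    """Merges a text embedding with frame-aligned chord and beat signals."""
    def __init__(self, d_model=512, n_chords=128, d_text=768):
        super().__init__()
        self.text_proj = nn.Linear(d_text, d_model)  # from a text encoder
        self.chord_emb = nn.Embedding(n_chords, d_model)
        self.beat_proj = nn.Linear(1, d_model)       # per-frame beat strength

    def forward(self, text_feats, chord_ids, beat_track):
        text = self.text_proj(text_feats)            # (B, T_text, d)
        chords = self.chord_emb(chord_ids)           # (B, T_frames, d)
        beats = self.beat_proj(beat_track.unsqueeze(-1))
        # The generator attends to text plus the summed frame-level conditions.
        return torch.cat([text, chords + beats], dim=1)
```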

Another new model, AudioSeal, features an audio watermarking technique that enables the localized detection of AI-generated speech, meaning it can pinpoint AI-generated segments within a larger audio snippet, according to the release. It also detects AI-generated speech as much as 485 times faster than previous methods.
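Localized detection means the detector scores individual audio frames instead of issuing a single verdict for the whole clip; flagged frames are then merged into time spans. The snippet below sketches that post-processing step with a made-up per-frame probability array; it is a generic illustration, not the AudioSeal API.

```python
# Turning per-frame watermark probabilities into time-localized segments.
# Generic illustration of "localized detection"; not AudioSeal's actual API.
import numpy as np

def localize_segments(frame_probs, threshold=0.5, sample_rate=16_000, hop=320):
    """Return (start_seconds, end_seconds) spans where probs exceed threshold."""
    flagged = frame_probs > threshold
    segments, start = [], None
    for i, hit in enumerate(flagged):
        if hit and start is None:
            start = i                              # a flagged span begins
        elif not hit and start is not None:
            segments.append((start * hop / sample_rate, i * hop / sample_rate))
            start = None
    if start is not None:                          # span runs to end of clip
        segments.append((start * hop / sample_rate,
                         len(flagged) * hop / sample_rate))
    return segments

# Example: frames 100-199 of a 10-second clip look AI-generated.
probs = np.zeros(500)
probs[100:200] = 0.9
print(localize_segments(probs))                    # [(2.0, 4.0)]
```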

The fifth new AI research model released Tuesday by Meta’s FAIR team is designed to increase geographical and cultural diversity in text-to-image generation systems, the release said. To support this work, the company has released evaluation code and annotations for measuring geographic disparities in text-to-image models.

Meta said in an April earnings report that capital expenditures on AI and the metaverse-development division Reality Labs will range between $35 billion and $40 billion by the end of 2024, about $5 billion higher than the company initially forecast.

“We’re building a number of different AI services, from our AI assistant to augmented reality apps and glasses, to APIs [application programming interfaces] that help creators engage their communities and that fans can interact with, to business AIs that we think every business eventually on our platform will use,” Meta CEO Mark Zuckerberg said April 24 during the company’s quarterly earnings call.
