Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

2:51 AM IST · May 20, 2026

When Google launched Gemini three years ago, the goal was to build a multimodal large language model: a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats. Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.”

Omni will start with video. Users can now combine images, audio, video, and text, and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality videos that reflect an understanding of physics, culture, history, and science. Omni also lets users edit photos with plain text commands rather than complex editing software, similar to Google’s Nano Banana.

Google already has a dedicated video model, Veo, that lets users turn text and images into videos, and even direct and customize avatars. But Google DeepMind director of product management Nicole Brichtova says that today’s release is more than a Veo update: “It’s the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models.”

One example that Koray Kavukcuoglu, DeepMind’s chief technologist, gave reporters during a media briefing on Monday: When Omni was given a simple prompt like “a claymation explainer of protein folding,” it quickly rendered a video of a stop-motion explainer with a voice-over that said, “Proteins start as chains of amino acids. They fold into patterns like the alpha helix and flat sections called beta sheets, forming a perfect three-dimensional shape.”

The long-term vision for Omni is broader: the model could eventually be used to generate images from audio, or audio from video.
“When we first announced Gemini, it was our first AI model to be natively multimodal,” Pichai said during the briefing. “We knew that training it on a combination of text, code, audio, images, and video would give it a deeper understanding of the world. With world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction.”

As part of the release, users will also be able to create videos with their own digital avatars, something OpenAI popularized on its now-defunct Sora app with Cameos. To prevent deepfakes, users will have to go through a dedicated product onboarding, which involves recording themselves and speaking a series of numbers aloud, per Brichtova. The avatar then gets stored for future use. Additionally, all videos created with Omni will include Google’s SynthID digital watermark, which allows users to verify whether videos were generated via the Gemini products.

The first model in the family is Gemini Omni Flash, which will roll out today to the Gemini app, YouTube Shorts, and AI creative studio Flow. Flash will be capable of rendering 10 seconds of video, which Brichtova says isn’t a model limitation, but rather a decision based both on a desire to get it into more hands and an anticipation that most users won’t want to make much longer videos yet. Longer video durations are in the pipeline for the near future, though.

Google seems to be pitching Omni Flash as more of a consumer tool. The examples Brichtova and Gabe Barth-Maron, a research engineer at DeepMind, gave on a call with TechCrunch of uses for digital avatars were all personal: making a video of yourself winning an award or going to the moon, or removing a passerby from the background of a video you took on vacation. Barth-Maron put it more simply: “They’re like personalized memes.”

“We definitely did focus on making this easy to use for consumers,” Brichtova said. “Not many video models have breached that chasm with consumers, so this is our play to do that.”

The ease of use comes with a caveat: Brichtova and Barth-Maron noted that editing prompts will need to be highly specific, otherwise Omni risks over-editing or unintentionally altering elements the user wanted to keep, a problem Nano Banana users would have run into.

Despite the near-term consumer focus, Omni’s enterprise and creative implications are obvious, and Google will make Omni available via API in the coming weeks. The avatar-generating tool, a capability that is available today on Shorts, is something Google expects content creators to pick up. But more broadly, an end-to-end multimodal workflow could be transformative for advertisers and filmmakers. Startup Luma AI is building something similar, an agentic tool that can generate an entire ad campaign based on a short brief and a product image, powered by its own “unified” model.

“We’re actually pretty proud of the model’s text-rendering capabilities, which is really useful for things like advertising,” Brichtova said. “If you want a product somewhere, or even just a slogan, it needs to be accurate … We definitely anticipate filmmakers and other kinds of creators are going to be using this model as well.”

The more professional use cases might be better served by the Omni Pro model, which should perform better across all Omni tasks. Google hasn’t said when it will release Pro yet, but Brichtova said that will happen when “we feel like we’re at a point where we have a step change above Flash.”

Latest AI News

In the Weights is your new AI-centric vanity search

Anyone who’s Googled themselves recently knows that it doesn’t quite hit the way it used to. Sure, there’s everything going on with Google search itself, but there’s also an inescapable feeling that web search isn’t the canonical source of information that it used to be, with just as many people learning about who you and I might be from chatbots.

Thomas Dimson and Joey Flynn had a similar feeling, leading them to create In the Weights. The “weights” in question are the numerical parameters that an AI model learns during training and that determine its output, so the website purports to measure how well “a model is able to recall someone without using tools like web search.” “Being in the weights means your existence was deemed important in the process of creating superhuman artificial intelligence,” the website says.

To achieve this, In the Weights supposedly queries different models (including Grok, Gemini, multiple versions of GPT, Claude, and Llama, plus lesser-known models) with a question similar to, “Who is <name>? Give up to 10 results, each with a short description and confidence.” It then “cluster[s] similar descriptions together and assign[s] a strength score.”

For example, this humble tech blogger received a strength score of 641, placing me in the top 6% of names. I was feeling pretty good until I saw that multiple TechCrunch colleagues scored even higher. And the leaderboard has been shifting as I write this post, with “Home Alone” star Macaulay Culkin currently in the top slot with a strength score of 988, neck-and-neck with opera singer Luciano Pavarotti.
The results also show which models returned which answers for a given name, and they highlight potential hallucinations: apparently GPT-5.4 Mini says that Anthony Ha is an “ambiguous name form that could refer to multiple people with the initials A.H.A.”

Asked why he built In the Weights, Dimson told TechCrunch via email that he and Flynn were looking to “get the creative juices flowing again” after leaving OpenAI (which they both joined through the acquisition of their design startup Global Illumination). Dimson said he was thinking about how “Google vanity searches are the wrong objective in 2026 as more traffic moves to LLMs” and about the fact that “so many lives are encoded somehow in a bunch of floating point numbers inside the AI brain.” He also said the direction of the site was “sealed” by a tongue-in-cheek blog post riffing on AI weights and Terry Bisson’s classic short story “They’re Made Out of Meat.”

“Reception has been insane so far, we thought this would be a mild curiosity but it seems like it has struck a nerve of wanting to see if you live forever in the super intelligence (the comparison factor doesn’t hurt either!)” Dimson added.

While I’m not as convinced that being “remembered” by a chatbot is a guaranteed ticket to immortality, I can’t deny that I find the results both intriguing and jealousy-inducing, especially since they’re codified in an easy-to-compare score. (AI critic Anthony Moser scoffed that this is “literally the same as asking 13 chatbots to tell you about yourself.”) Also helping: The fact that the site features a cute, Nintendo-inspired retro design.

Dimson said he plans to dig further into why different models in the same series return different results, which models are biased towards different types of people, and which people “should have a Wikipedia article but don’t.”
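The query, cluster, and score steps described for In the Weights can be sketched roughly as follows. This is only an illustration and not the site's actual method, which the article does not detail: the `Answer` record, the word-overlap (Jaccard) similarity, the greedy clustering with a 0.3 threshold, and the 0 to 1000 scoring formula are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    model: str          # which model produced this description (hypothetical names)
    description: str    # short description returned for the queried name
    confidence: float   # self-reported confidence, assumed to lie in [0, 1]


def tokens(text: str) -> set[str]:
    """Lowercased word set, used as a crude similarity signal."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets (0 = disjoint, 1 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_answers(answers: list[Answer], threshold: float = 0.3) -> list[list[Answer]]:
    """Greedy clustering: an answer joins the first cluster whose first member is similar enough."""
    clusters: list[list[Answer]] = []
    for ans in answers:
        for cluster in clusters:
            if jaccard(tokens(ans.description), tokens(cluster[0].description)) >= threshold:
                cluster.append(ans)
                break
        else:
            # No existing cluster matched, so this answer starts a new one.
            clusters.append([ans])
    return clusters


def strength(cluster: list[Answer], total_models: int) -> int:
    """Assumed score: share of models that agree, times mean confidence, scaled to 0-1000."""
    models = {a.model for a in cluster}
    mean_conf = sum(a.confidence for a in cluster) / len(cluster)
    return round(1000 * (len(models) / total_models) * mean_conf)


if __name__ == "__main__":
    answers = [
        Answer("model-a", "Technology journalist and editor at TechCrunch", 0.8),
        Answer("model-b", "Journalist and editor at TechCrunch covering technology", 0.6),
        Answer("model-c", "Ambiguous name that could refer to multiple people", 0.2),
    ]
    for cluster in cluster_answers(answers):
        print(cluster[0].description, strength(cluster, total_models=3))
```

Under these assumptions, a description that several models agree on with high confidence scores well, while a one-off, low-confidence answer (a possible hallucination) scores poorly. A real system would more likely compare descriptions with text embeddings than with raw word overlap.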

2 hours ago

Signal’s Meredith Whittaker wants you to remember that AI chatbots ‘are not your friends’

Asked about the privacy implications of chatbots like ChatGPT and Claude, Signal President Meredith Whittaker answered, “These are not your friends. These are not conscious beings. These are not sentient interlocutors.”

Whittaker made those comments in a broader interview with Bloomberg about policy, privacy, and Signal. She acknowledged that she uses AI tools “to format a document here and there,” but insisted, “I don’t ask them questions. I’m very serious about my thinking and writing, and I don’t want the process of working through an idea […] to be foreclosed or eclipsed by the response of a system that’s averaging what’s already out there.”

As for Microsoft AI CEO Mustafa Suleyman’s prediction that users could let Microsoft Copilot handle all their Christmas shopping this year, Whittaker argued that this scenario, in which Copilot eavesdrops on the family group chat to determine who wants what, means giving it “access to my credit card, my browser, my Signal, the ability to message my siblings on my behalf, my home address [and] my calendar.”

“What you’ve just described is a system with very pervasive access across multiple applications and services,” Whittaker said. “In the context of Signal, it would constitute a kind of a backdoor.”

2 hours ago

Nobel laureate John Jumper is leaving DeepMind for rival Anthropic

John Jumper, who shared a recent Nobel Prize in chemistry, announced Friday that he’s making the leap to Anthropic after “nearly 9 years” at Google DeepMind. In a post on X, Jumper wrote that DeepMind CEO Demis Hassabis “took a real chance letting me lead the AlphaFold team just six months after finishing my PhD, and the entire GDM team taught me so much about how to do great science.” Jumper added, “GDM is a special place, and I’ll still be excited to hear about what amazing things they discover next.”

Bloomberg reports that Jumper was a key member of Google’s team developing coding tools, which the company has struggled to sell to businesses. Character AI co-founder Noam Shazeer also announced this week that he’s leaving DeepMind, though in Shazeer’s case, he’s joining OpenAI.

Jumper and Hassabis won the Nobel Prize in 2024 for their work on AlphaFold, an AI model that can predict the 3D structure of proteins based on their amino acid sequences.

6 hours ago

Nobel-Winning AlphaFold Scientist John Jumper Leaves Google DeepMind for Anthropic

For his work on AlphaFold, Jumper shared the 2024 Nobel Prize in Chemistry with Demis Hassabis and scientist David Baker.

14 hours ago
