Digital Product Studio

Google DeepMind Goes Big in Video with Release of Veo, Setting Their Sights on Surpassing OpenAI’s Sora

At today’s Google I/O announcements, Google unveiled Veo, a new text-to-video AI model. Veo generates realistic 1080p video in a wide range of visual styles. With Veo, Google DeepMind takes aim at surpassing OpenAI’s widely acclaimed Sora, which currently leads text-to-video generation in quality, scale, and control.

Google DeepMind Introduces Veo

Veo is Google DeepMind’s most advanced generative text-to-video model. It can generate high-quality 1080p videos that exceed a minute in length, offering a wide range of cinematic and visual styles, and it interprets text prompts accurately. Veo gives filmmakers and content creators unprecedented control over video generation while maintaining coherence across frames.

Example Videos Generated by DeepMind Veo

Key Features of Google DeepMind Veo

1. Understanding Prompts and Visual References

Veo uses advanced natural language understanding and visual semantics to accurately interpret text prompts and generate coherent video scenes following the description. It captures nuances and tones from the prompt while rendering intricate visual details. This allows Veo to generate videos that closely follow provided inputs.

2. Creative Controls for Storytelling

Veo supports various parameters that give users control over how videos are generated. It can take an input video together with editing commands, such as adding objects to a scene, and apply the requested changes. Masked editing is also possible, where changes are confined to a specified masked region. Additionally, Veo can generate videos conditioned on both a text prompt and a reference image.
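Veo has no public API at the time of writing, so its actual interface is unknown. Purely as an illustration, the controls described above (a text prompt, an optional reference image, and an optional edit mask) might be bundled into a request along these lines; every field name here is a hypothetical assumption, not DeepMind’s interface:

```python
# Hypothetical sketch only: Veo has no public API at the time of writing.
# All field names below are assumptions used to illustrate the controls
# described in the article, not DeepMind's actual interface.

def build_video_request(prompt, reference_image=None, mask=None,
                        duration_seconds=60, resolution="1080p"):
    """Assemble an illustrative text-to-video generation request."""
    request = {
        "prompt": prompt,                    # natural-language scene description
        "duration_seconds": duration_seconds,
        "resolution": resolution,            # Veo currently targets 1080p
    }
    if reference_image is not None:
        # Condition generation on a reference image as well as the prompt
        request["reference_image"] = reference_image
    if mask is not None:
        # Masked editing: restrict changes to the masked region only
        request["edit_mask"] = mask
    return request

req = build_video_request(
    "A lone lighthouse at dusk, cinematic wide shot",
    reference_image="lighthouse.png",
)
print(req["resolution"])  # → 1080p
```

The optional fields mirror the article’s point that conditioning inputs are layered on top of the base text prompt rather than replacing it.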

3. Consistency Across Frames

Maintaining consistency of characters, objects, and styles between frames is challenging but critical for video generation. Veo leverages powerful latent diffusion models to reduce inconsistencies, keeping elements as stable as they would be in real footage. This enhances the viewing experience of generated clips.
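The article does not say how DeepMind measures cross-frame consistency. As a rough illustration only, one simple proxy is the mean absolute pixel difference between consecutive frames, which spikes when elements flicker or jump between shots:

```python
# Illustrative only: a crude temporal-consistency proxy, not DeepMind's metric.
# Lower mean frame-to-frame difference suggests more stable video elements.
import numpy as np

def temporal_instability(frames):
    """Mean absolute pixel difference between consecutive frames.

    frames: array of shape (num_frames, height, width[, channels])
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))  # per-pixel change between frames
    return diffs.mean()

# A perfectly static clip scores 0; a flickering clip scores higher.
static = np.ones((8, 4, 4))
flicker = np.stack([np.full((4, 4), i % 2) for i in range(8)])
print(temporal_instability(static))   # → 0.0
print(temporal_instability(flicker))  # → 1.0
```

Real evaluations would use perceptual or identity-aware measures, but even this toy score captures the failure mode the paragraph describes.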

4. Long-form Video Generation

Unlike other models limited to short clips, Veo is capable of producing videos exceeding a minute from single or sequenced prompts. It can tell longer stories seamlessly when provided with multiple prompts describing different scenes in order.

5. Watermarked Videos for Safety

To ensure responsible usage, videos created by Veo are watermarked using SynthID, a state-of-the-art tool for watermarking and identifying AI-generated content. Furthermore, safety filters and memorization checks are applied to mitigate privacy, copyright, and bias risks associated with the generated videos. 

How Was Veo Developed?

Veo is not an isolated creation but builds upon years of generative video model research conducted by Google DeepMind. It incorporates advancements from various models such as Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. Additionally, the model utilizes the Transformer architecture and Gemini. Through continuous improvement and incorporating feedback from leading creators and filmmakers, Veo strives to enhance its capabilities and benefit the wider creative community.

Google Veo vs OpenAI Sora

If we compare both models, it’s clear that OpenAI Sora’s videos exhibit superior resolution, colours and detail retention across frames. Their videos maintain consistency without sudden changes or distortions between shots. On the other hand, some of Veo’s videos showed mildly muddled imagery and broken continuity at times.

Sora supports a wider range of video formats, lengths and aspect ratios directly without resizing. This flexible sampling allows customization for multiple devices and formats. Veo currently focuses only on 1080p resolution and minute-long clips.

Sora impresses with skills like animating still images and extending or seamlessly connecting videos using other media as priors. Veo’s demonstrations, by contrast, lean mostly on text prompts, with more limited image and video conditioning. This broader conditional generation grants Sora more creative applications.

Sora’s high cost may be a limitation, though both teams are likely working to improve efficiency. Google’s massive computing and dataset advantages, along with YouTube, may help Veo eventually surpass Sora’s quality-to-cost ratio. However, OpenAI’s lightweight approach focuses on flexible prompt-style customizations and seems more scalable currently.

Some sceptics question whether Google cherry-picked Veo examples or staged interactions to exaggerate its capabilities. OpenAI more transparently demonstrates Sora’s full range of uses. Both are early in development, so expect issues like temporal inconsistencies. 

Applications and Future of Veo

Veo will power text-to-video creation tools from DeepMind, such as VideoFX. Some capabilities may be integrated with YouTube and other Google products later as well. With continued research, Veo aims to make video generation widely accessible while preserving creative control for all types of users. To demonstrate Veo’s potential, DeepMind partnered with acclaimed filmmaker Donald Glover and his studio Gilga. Their preview recordings showcase how established and up-and-coming storytellers can take generative media in unexplored directions.

Concluding Thoughts

Veo marks a huge leap for DeepMind in generative video, but as ever, Google never misses a chance to copy OpenAI’s successes. Sora set a high bar that Veo is not fully clearing yet. With their deep pockets and compute prowess, though, it’s only a matter of time before Google closes that gap through mimicry rather than innovation, as is their style. Lastly, to try Veo, sign up for VideoFX.



Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!
