Generating video from text

Sora is an AI model that generates realistic and imaginative video scenes from text prompts.

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

Prompt: A drone camera orbits a stunning historic church perched on a rocky outcrop along the Amalfi Coast. The scene highlights intricate architectural details, tiered pathways, and patios. Below, waves crash against the rocks, framing the coastal waters and hilly landscapes of Amalfi, Italy. In the distance, figures stroll along patios, marveling at the dramatic ocean views. The warm afternoon sun casts a magical and romantic glow over the scene, beautifully captured through photography.

Prompt: In Tokyo, a fashionable woman strolls down a bustling street illuminated by vibrant neon signs and animated city lights. She exudes style in a black leather jacket paired with a flowing red dress and sleek black boots, complemented by a black purse. Sporting sunglasses and bold red lipstick, she walks with confidence and ease. The damp pavement reflects the luminous colors, enhancing the lively atmosphere filled with pedestrians.

Designing Captivating Graphics

Sora's advanced language comprehension allows it to interpret prompts accurately and generate compelling characters that express vivid emotions. Sora can also create multiple scenes within a single video while keeping characters and visual style consistent throughout.

Safety

We are collaborating with domain experts in areas such as misinformation, hateful content, and bias to rigorously test the model.

We are developing tools, including a detection classifier, to identify content generated by the model. We plan to incorporate C2PA metadata in the future for transparency and accountability.

We are adapting existing safety methods, initially built for DALL·E 3, for Sora. These include text classifiers that filter out inappropriate prompts and image classifiers that check whether generated content meets usage policies.

We are engaging with policymakers, educators, and artists worldwide to understand their concerns and explore positive applications of the technology.

Despite extensive research and testing, we cannot predict all the beneficial or abusive ways people will use the technology. Learning from real-world use is therefore central to continuously improving AI safety.
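To make the two-stage check described above more concrete, here is a minimal sketch of how a prompt filter and an output filter might wrap a generator. Everything in it is an assumption for illustration: the blocklist, the `unsafe_score` field, the 0.5 threshold, and the function names are hypothetical stand-ins for trained text and image classifiers, not the actual Sora or DALL·E 3 safety system.

```python
from dataclasses import dataclass

# Hypothetical blocklist standing in for a trained text classifier.
BLOCKED_TERMS = {"gore", "hateful"}


@dataclass
class ModerationResult:
    accepted: bool
    reason: str


def prompt_is_allowed(prompt: str) -> bool:
    """Stand-in text check: reject prompts containing a blocked term."""
    words = {w.strip(".,!?").lower() for w in prompt.split()}
    return words.isdisjoint(BLOCKED_TERMS)


def frame_is_allowed(frame: dict) -> bool:
    """Stand-in image check: assumes each frame carries an 'unsafe_score'."""
    return frame.get("unsafe_score", 0.0) < 0.5


def moderate(prompt: str, generate_frames) -> ModerationResult:
    """Check the prompt before generation and every frame after it."""
    if not prompt_is_allowed(prompt):
        return ModerationResult(False, "prompt rejected")
    frames = generate_frames(prompt)
    if not all(frame_is_allowed(f) for f in frames):
        return ModerationResult(False, "output rejected")
    return ModerationResult(True, "ok")
```

The point of the structure is that the prompt check runs before any generation happens, while the frame check runs on the generated output, so a harmless-looking prompt that yields a disallowed video is still caught.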




Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.

Prompt: A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.





Today, Sora is being made available to red teamers to assess critical areas for potential harms or risks. We are also providing access to several visual artists, designers, and filmmakers to gather feedback on how to improve the model for creative professionals.

We are sharing our research progress early to collaborate with and receive feedback from individuals outside of OpenAI, as well as to inform the public about the upcoming AI capabilities.

Our goal is to teach AI to understand and simulate the physical world in motion, ultimately training models that assist people in solving problems requiring real-world interaction.

Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and fidelity to the user's prompt.



Prompt: The camera directly faces colorful buildings in Burano, Italy. An adorable dalmatian looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.

Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.

Exploring Research Methods

Sora is a diffusion model that generates videos by starting with an initial state resembling static noise and gradually refining it through multiple steps to remove the noise.
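The refinement loop described above can be illustrated with a toy sketch. This is not Sora's sampler: the "video" is just a short list of numbers, and `toy_denoiser` is a hypothetical stand-in that returns a known target, where a real diffusion model would use a trained network conditioned on the text prompt. The sketch only shows the shape of the process: start from random noise, then move a little closer to the predicted clean sample at every step.

```python
import random


def toy_denoiser(noisy, target):
    """Hypothetical stand-in for a trained network.

    A real model would estimate the clean sample from the noisy input and
    the prompt; here we return a known target so the loop is runnable.
    """
    return list(target)


def generate(target, num_steps=50, seed=0):
    """Start from Gaussian noise and refine it over num_steps steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # initial "static noise"
    for step in range(num_steps):
        predicted_clean = toy_denoiser(x, target)
        remaining = num_steps - step
        # Move a fraction of the remaining distance toward the prediction.
        x = [xi + (pi - xi) / remaining for xi, pi in zip(x, predicted_clean)]
    return x


def distance(a, b):
    """Euclidean distance between two equally sized samples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
```

Calling `generate([0.2, 0.8, 0.5, 0.1])` returns a sample that is numerically very close to the target, because the final step closes the remaining gap. In a real model the denoiser's prediction changes from step to step, which is why many small steps are used instead of one large one.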

Like GPT models, Sora uses a transformer architecture, enabling exceptional scalability.
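The core operation of a transformer is scaled dot-product self-attention, in which every token's output is a weighted mix of all tokens. The sketch below shows that operation in plain Python under simplifying assumptions: it uses identity projections instead of learned query, key, and value weights, a single head, and tiny hand-written vectors. It says nothing about how Sora's own network is configured.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of equal-length float vectors. Each output vector is a
    weighted average of all input vectors, weighted by similarity.
    """
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(d)]
        )
    return outputs
```

Because the attention weights for each token are non-negative and sum to one, every output vector is a weighted average of the inputs. The same computation applies unchanged to any number of tokens.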

© Copyright 2024 Sora - All Rights Reserved