Superhuman AI
Posts
OpenAI announces a new video model

OpenAI announces a new video model

ALSO: There's a powerful new Gemini in town

Zain Kahn
February 16, 2024

Read time: under 4 minutes

Welcome back, Superhuman

I give up. The pace of groundbreaking product releases in AI is beyond ridiculous, and I’ve simply run out of adjectives to describe what’s going on at this point. But I’ll try anyway.

_{First time reading? Sign up to get this newsletter}_here

TODAY’S MENU

OpenAI announces Sora — AI-generated video is now a reality
Google drops Gemini Pro 1.5, unlocks powerful new capabilities
How to use Pika to create videos from text
Friday Laughs: OpenAI steals Gemini’s spotlight
5 new AI tools to boost your productivity
AI Generated Images: Dogs in hats with Wes Anderson

NEWS

Today in AI & Tech

Meta and Chill: Meta releases V-JEPA, a method for teaching machines to understand and model the physical world by watching videos, potentially unlocking the ability to accomplish tasks in the real world through that understanding.

Helping Hand: Apple is reportedly developing an AI tool that would help developers write code for apps, in direct competition with Microsoft’s Github Copilot.

Nope™: OpenAI cannot trademark “GPT" according to a new ruling by the U.S. Patent and Trademark Office.

Coding Pal: Magic, an AI-powered virtual coworker that can write code, raises $117M in funding from former GitHub CEO Nat Friedman and others.

NEXT IN AI

OpenAI announces Sora — AI-generated video is now a reality

Source: OpenAI

If you look up the definition of “stealing someone’s thunder” in the dictionary, chances are you’d find a photo of Sam Altman and OpenAI. That’s because the company behind ChatGPT, DALL-E, and GPT-4 has a habit of one-upping its competitors…especially Google.

Yesterday, Google released its newest Gemini 1.5 Pro model (more on that later). Just a few hours later, OpenAI said ‘Hold my beer’ and announced Sora – its new, cutting-edge text-to-video model.

Named after the Japanese word for "sky," Sora is OpenAI’s first foray into AI-generated video. In an official announcement, the company said Sora can “create realistic and imaginative scenes from text instructions.” Each video can be about a minute long and the quality is what some have called Hollywood-worthy. That might be due to Sora’s ability to create complex scenes with multiple characters … and even understand emotions.

OpenAI says Sora won’t be released to the general public until a red team has had a chance to analyze and scrutinize its every vulnerability. While they haven’t officially said it, Sora will likely be released as part of ChatGPT, where you’ll be able to generate videos with simple text prompts.

To put the enormity of this achievement in perspective, this time last year we could barely generate realistic pictures with AI. Fast forward to today, and we’re on the cusp of having Hollywood-grade AI-generated footage at our fingertips. Things are about to get very interesting.

But you don’t have to take our word for it. Check out some early videos that OpenAI shared on their website.

GEMINI FAMILY

Google unlocks powerful new capabilities with Gemini 1.5 upgrade

Context lengths of leading foundation models - Source: Google

You have to feel for Google. Barely a couple of hours had passed since they announced Gemini Pro 1.5 and OpenAI swooned in outta nowhere to steal their limelight with Sora. But that doesn’t make Google’s latest model any less interesting or important.

The most impressive thing about Gemini Pro 1.5 is that has a 1 million token context length, which means that it can simultaneously process up to 1 million units of information like words.

What that means in practice is that Gemini 1.5 can process up to 1 hour of video, 11 hours of audio, and codebases with over 30,000 lines of code in one go — a stunning technical achievement that is well beyond what any other AI model is currently capable of.

The real-life implications of this breakthrough are that Gemini 1.5 Pro can do things like:

Watch a silent movie and still pick up on plot points
Understand, reason about, and identify certain details in audio transcripts or lengthy documents like contracts
Analyze entire code libraries for a product and explain how it’s built

Google put specific emphasis on the enhanced reasoning abilities of Gemini Pro 1.5, claiming that it can learn new skills like translating rare languages or improving a codebase with 100,000 lines of code.

Google says they’re currently giving developers and enterprise customers limited access to Gemini 1.5, and that a full release to the general public is “coming soon.”

AI AT WORK

How to generate videos from text with Pika

Source: Pika

OpenAI’s Sora isn’t ready for you to use just yet. Until then, there are plenty of other text-to-video models you can try out, like Pika:

Sign in to Pika.art
Go to the Pika 1.0 Dashboard and click 'Explore’
Write out your video idea in the prompt section in the middle of the screen. Be as descriptive as possible, adding adjectives and specific details
Include any previous images or videos, if you have any
Choose your preferred aspect ratio and frames per second
Play around with motion control to understand how the camera moves in your video
Press Enter to generate your video
Find your finalized video in the “My Library” section
If you’re not happy with the final results, you can press “retry,” “reprompt,” or “edit”

FRIDAY LAUGHS

OpenAI steals Gemini’s spotlight

Source: @ai_for_success on X

PRODUCTIVITY

5 AI Tools to Supercharge Your Productivity

Jasper AI: Helps businesses scale up marketing content like blog articles, social media posts, sales emails, website copy, and more.

Writingmate: AI copilot for Google Docs, Slides, and Sheets. Answers your questions and writes your emails, compatible with all websites.

Innovating With AI*: The No-Code AI Toolkit — A free guide to transforming your office workflows with AI. Reports, calendar, proposals and more. Click to get it in your inbox – free, instant access.

Lindo AI: Build a website by describing your business and get everything you need, from landing pages to automated organic lead generation.

Recast: Turn your read-later list into bite-sized audio summaries with AI.

^{* indicates a promoted tool, if any}

AI GENERATED IMAGES

If Wes Anderson characters were dogs

Source: @boubi1000 on Midjourney

Prompt: portrait from 3/4 side of a dog dressed in a jacket and a hat, Wes Anderson style, soft colors, kodachrome photograph --v 6.0 --ar 3:4

Acquire new customers and drive revenue by partnering with us

Superhuman is the world’s biggest AI newsletter for businesses and professionals with 500,000+ readers working at the world’s leading startups and enterprises. Companies like Amazon, Calendly, and Notion feature their products in Superhuman. Main ads are typically sold out 4 weeks in advance. You can book future ad spots here.

🧞 Your wish is my command

What did you think of today's email?

Your feedback helps me create better emails for you!

Reviews of the day

Thanks for reading.

Until next time!

Zain

p.s. if you want to sign up for this newsletter or share it with a friend or colleague, you can find us here. For questions and feedback, email us at [email protected]. We always listen and respond.