OpenAI's plan to make AI more accurate
ALSO: 'Restored footage' of the Roman Empire
Read time: under 4 minutes
Welcome back, Superhuman
On average, for every 100 questions you ask an AI model, it will fabricate at least three answers. OpenAI has come up with an idiosyncratic way to solve that problem: Using AI to catch its own model’s mistakes.
Today’s Insights
OpenAI’s plan to solve AI hallucinations
Tutorial: How to translate videos with accurate lip-syncing
5 new AI tools to boost your productivity
Everything else you should know today
AI-Generated Images: The Capybara Blues
NEXT IN AI
OpenAI’s approach to spotting hallucinations: Use AI
Source: AP
You know what they say: “You’re your own worst critic.” OpenAI took the phrase literally and built a model called CriticGPT that tries to find flaws in GPT-4’s responses. The strange part: The new model is powered by none other than GPT-4 itself.
How can an AI model catch its own mistakes? Because CriticGPT was specially trained to become an expert lie detector. Human researchers fed the model false information, then showed it how to respond with detailed critiques. For now, OpenAI is only using CriticGPT to assess GPT-4’s coding abilities, since the answers are cut-and-dried. More open-ended questions, meanwhile, can generate subjective responses that are harder to judge as strictly “right” or “wrong.”
Wouldn’t humans be better at flagging errors? Most AI companies, including OpenAI, still rely on humans to assess models’ responses. An LLM will generate several responses to the same question, then a human will pick which one is most relevant and accurate — which, in turn, helps refine the model’s future answers. But as LLMs get more sophisticated, it’s getting harder for testers to keep up. Besides, it’s inevitable that humans will sometimes introduce their own mistakes and biases into assessments.
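To make that human-feedback step concrete, here is a minimal Python sketch of tallying which candidate answers reviewers preferred. The answer labels, the reviewer picks, and the `preference_rates` helper are all made up for illustration. This is not OpenAI's actual pipeline, just the general idea of turning human picks into a preference signal.

```python
from collections import Counter

def preference_rates(candidates, picks):
    """Return the share of reviewer picks each candidate answer received."""
    if not picks:
        raise ValueError("need at least one reviewer pick")
    counts = Counter(picks)
    # Reject picks that do not correspond to any candidate answer.
    unknown = set(counts) - set(candidates)
    if unknown:
        raise ValueError(f"picks for unknown candidates: {sorted(unknown)}")
    total = len(picks)
    return {c: counts[c] / total for c in candidates}

# Hypothetical example: three answers to one question, five reviewers.
candidates = ["answer_a", "answer_b", "answer_c"]
picks = ["answer_b", "answer_b", "answer_a", "answer_b", "answer_c"]

rates = preference_rates(candidates, picks)
best = max(rates, key=rates.get)  # the answer most reviewers preferred
print(rates)
print(best)
```

The more reviewers disagree, the flatter these rates get, which is one way to see why human-only assessment becomes harder as answers grow more sophisticated.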
So, how did CriticGPT compare? It managed to spot 85% of coding bugs, while trained humans only found 25% of them. In the end, though, the best option turned out to be pairing humans with CriticGPT. When humans and the model worked together, they performed 60% better than humans alone.
OpenAI isn’t the only organization working on this: Researchers from the University of Oxford just unveiled an algorithm that they say can spot AI hallucinations 79% of the time. That’s about 10% better than today’s best methods. There’s still work to be done, though, since the approach also uses about 10 times more energy than a typical chatbot interaction.
PRESENTED BY AE STUDIO
Don’t lose the AI race, hire AE Studio
Trusted by leading startups and Fortune 500 companies
AE Studio is the key to industry dominance, securing AI talent from Harvard, Stanford, and MIT to streamline operations.
What AE Studio will do for you:
Turbocharge your business, save hundreds of hours, and WIN!
Have AE build your custom software and AI solutions
Hire AE to pinpoint where and why you should be building NOW!
Stop waiting and start winning. Schedule your FREE strategy session to get your business ahead of the AI race. Get in touch here
AI AT WORK
How to translate videos with accurate lip-syncing
Go to ElevenLabs and sign up
Select dubbing from the left tab and enter all the relevant details for the video you want to translate
Upload your video or insert a video link in the ‘Select a Source’ tab
Select source language, target language, and the number of speakers
Press Create and wait for the video to be processed
Voila, you’re done! You can download your translated video if you wish.
For the example above, I used an English-language video I found on YouTube and converted it into French.
PROMPT OF THE DAY
Literary Critic
Prompt: I want you to act as a literary critic. I will provide you with some excerpts from literary works. You should analyze each one under the given context, based on aspects including its genre, theme, plot structure, characterization, language and style, and historical and cultural context. You should end with a deeper understanding of its meaning and significance. My first request is "To be or not to be, that is the question."
You can adapt the prompt to your specific needs.
Source: @lemorage on Github
PRESENTED BY PIPELINE TALENT
If you don’t have an assistant, you are the assistant
Hire elite offshore talent at a fraction of US salaries, so you and your team can focus on the work that really matters. We evaluate 1,000+ candidates every month to help you find the best:
Years of experience across multiple job functions
College-educated and fluent in English
Vetted by our team of expert recruiters
What’s more? If you don’t like your hire in the first 6 months, we’ll find you a replacement. Find your next best hire today
AI & TECH NEWS
Everything else you need to know today
Source: Character AI
Pocket Pals: Character AI will now let users call AI avatars on the phone. The AI characters can help you prep for an interview, learn a new language, or act as a role-playing companion.
Stacking the Deck: Rain AI — a startup that’s building more efficient AI semiconductors — has recruited former Apple hardware expert Jean-Didier Allegrucci, marking its second big-name hire in a month.
Bot or Not: Meta will begin rolling out a new Instagram feature that lets creators build chatbots modeled after themselves, although the avatars will be clearly labeled as AI.
Copycat Catastrophe: Amazon Web Services has launched an investigation into its cloud partner Perplexity after reports emerged that the AI-powered search engine was plagiarizing material from across the web.
😄 One Fun Thing: An AI-generated video depicting “restored footage” of the Roman Empire has generated nearly 6,000 upvotes on Reddit. The creator came up with the 48-second clip by first generating images of the Roman Empire in Midjourney, then feeding them into Luma AI’s Dream Machine.
PRODUCTIVITY
5 AI Tools to Supercharge Your Productivity
✅ ElevenLabs Reader App: Choose a voice from an extensive library, upload any type of text content, and listen on the go.
✅ Question Base: An AI-powered autoresponder for Slack. It answers the most repetitive questions, so you don’t have to.
✅ Jellypod: Convert your emails and newsletters into a personal podcast.
✅ AITerm: An AI assistant that helps developers and command-line users directly within their terminal via natural language.
✅ Scoopika: An open-source developer platform to build personalized AI agents that can see, talk, listen, and more.
PS: Want more? Check out our Top 100 AI Tools.
* indicates a promoted tool, if any
AI-GENERATED IMAGES
Capybara Blues
Source: @unicorn0908 on Midjourney
Prompt: A realistic photo, capybara playing [insert musical instrument, like saxophone] in a [insert location here, like pub], the capybara's expression one of pure enjoyment, the saxophone's golden tones shining brightly, the pub's warm, ambient lighting casting a glow on the scene, filled with vintage decor and a cheering crowd, creating an atmosphere of joy and entertainment, Photography, shot with a Nikon D850 and a 35mm f/1.8 lens --ar 16:9
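If you want to reuse a prompt like this with different instruments and locations, a small Python sketch can fill in the bracketed slots for you. The shortened template, the `[instrument]` and `[location]` placeholder names, and the `fill_prompt` helper are assumptions made for illustration, not part of Midjourney or the original prompt.

```python
import re

# Shortened, hypothetical version of the prompt with named placeholders.
TEMPLATE = (
    "A realistic photo, capybara playing [instrument] in a [location], "
    "the capybara's expression one of pure enjoyment --ar 16:9"
)

def fill_prompt(template, **values):
    """Replace each [name] placeholder with the matching keyword value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value given for placeholder [{name}]")
        return values[name]
    return re.sub(r"\[(\w+)\]", substitute, template)

prompt = fill_prompt(TEMPLATE, instrument="saxophone", location="pub")
print(prompt)
```

Note that the full prompt above also mentions the saxophone and the pub by name later on, so if you swap in a different instrument or location, update those phrases too.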
Acquire new customers and drive revenue by partnering with us
Superhuman is the world’s biggest AI newsletter for businesses and professionals with 600,000+ readers working at the world’s leading startups and enterprises. Companies like Amazon, Hubspot, and Salesforce feature their products in Superhuman. You can learn more about partnering with us here.