CSM-1B

Sesame AI's Conversational Speech Model (CSM) is a speech generation model designed to produce natural, human-like conversation. Launched in 2025, it is available on GitHub and Hugging Face. Unlike traditional text-to-speech, CSM aims to offer "voice presence": it is built with a dual-transformer setup that lets it deliver speech with emotion and context, with a reported response time of about 500 milliseconds. Trained on a large dataset, it supports applications such as empathetic customer support, dynamic language lessons, engaging AR experiences, improved accessibility for the visually impaired, and personalized podcast narration. The technology is open-source, so developers can explore its potential themselves. A Python setup guide shows how to create a "Hello World" audio file with CSM, demonstrating its capabilities and encouraging further experimentation. CSM is positioned to change how people interact with AI across a range of fields.
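As a rough illustration of the "Hello World" audio workflow and the 500-millisecond figure, here is a minimal sketch that times a synthesis call and writes the result to a WAV file. It does not load CSM: `placeholder_synthesize` is a hypothetical stand-in that returns a plain tone, and the 24 kHz sample rate and the file name are assumptions made for this example. In a real setup, the stand-in would be replaced by the model's own generation call from the CSM repository.

```python
import math
import struct
import time
import wave


def placeholder_synthesize(text, sample_rate=24_000):
    """Hypothetical stand-in for the model call: returns a tone, NOT speech.

    Produces 50 ms of a 220 Hz sine wave per character of input text.
    """
    n_samples = sample_rate * max(len(text), 1) // 20
    return [
        0.3 * math.sin(2 * math.pi * 220.0 * i / sample_rate)
        for i in range(n_samples)
    ]


def save_wav(samples, sample_rate, path):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(frames)


def timed_hello_world(path="hello_world.wav", sample_rate=24_000):
    """Synthesize "Hello World", save it, and return (sample count, elapsed ms)."""
    start = time.perf_counter()
    samples = placeholder_synthesize("Hello World", sample_rate)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    save_wav(samples, sample_rate, path)
    return len(samples), elapsed_ms


if __name__ == "__main__":
    count, elapsed = timed_hello_world()
    print(f"wrote {count} samples; synthesis took {elapsed:.1f} ms")
```

Timing only the synthesis step, and not the file write, keeps the measurement comparable to a response-time figure, although the number measured here says nothing about CSM itself.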