
6 Trending Speech to Text Applications
Discover 6 trending speech-to-text applications as NLP continues to boom and be harnessed by companies of all sizes across global industries.
Numerous speech-to-text applications have surged in popularity recently, including Otter.ai, Microsoft Copilot and Descript. Here, we list 6 trending applications that use speech-to-text (STT) technology.
1, Otter.ai
Primary Use: Meeting notes transcription
Tech stack: Proprietary ASR (Automatic speech recognition) engine, NLP, AI summarization
Otter.ai is amongst the most popular speech-to-text applications on the market, with over 14 million registered users and more than 1 billion meetings transcribed using its system. At its core, it is an AI meeting assistant that automatically transcribes and summarizes meetings.
Its tech stack includes a proprietary automatic speech recognition (ASR) system that, when combined with natural language processing (NLP), can accurately summarize online meetings held on platforms such as Zoom, Google Meet and Microsoft Teams.
2, Microsoft Copilot
Primary Use: Office productivity tools
Tech stack: Azure Text to Speech and Speech to Text engines, GPT-based LLMs, cloud computing
When it comes to trending computer technology, Microsoft is often in the mix with their own innovative offerings.
Copilot combines Microsoft’s Azure Cognitive Speech Services with GPT-based language models to offer solutions designed to improve office productivity. The Azure engine features both text-to-speech and speech-to-text components.
Copilot is embedded within Microsoft 365, integrated straight into apps such as Word, Teams and Outlook.
3, Descript
Primary Use: Podcast and video editing
Tech stack: Whisper ASR, NLP, Neural Audio Editing
Descript has entrenched itself as a near-essential tool for small content creators. The ability to edit audio as if it were a Word document has revolutionized podcast and video editing, cutting down on the resources and skills needed to produce top-quality content.
Descript is built on OpenAI’s Whisper model, an ASR system that was trained on 680,000 hours of multilingual and multitask supervised data collected from across the web.
OpenAI describes the architecture of Whisper as:
“a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.”
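To make the first step of that description concrete, here is a minimal sketch of splitting audio into fixed 30-second chunks. It is an illustration only, not OpenAI's or Descript's code: the function name split_into_chunks is hypothetical, the 16 kHz sample rate is an assumption based on Whisper's reference implementation, and the log-Mel spectrogram step is left out.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumption: audio has already been resampled to 16 kHz mono
CHUNK_SECONDS = 30     # chunk length taken from the description quoted above
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples: List[float]) -> List[List[float]]:
    """Split mono audio samples into 30-second chunks (hypothetical helper).

    The final chunk is zero-padded with silence so every chunk has the same length.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        padding = CHUNK_SAMPLES - len(chunk)
        chunks.append(chunk + [0.0] * padding)
    return chunks


# 70 seconds of silent placeholder audio becomes three equal-length chunks.
audio = [0.0] * (SAMPLE_RATE * 70)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))  # prints: 3 480000
```

Each chunk would then be converted to a log-Mel spectrogram before reaching the encoder, which is the part a real pipeline would hand to an audio library.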
4, Google Recorder
Primary Use: Pixel-device transcription for notetaking
Tech stack: On-device ASR, Tensor Processing Unit, edge AI
Google describes the functionality of Recorder as enabling users of Pixel phones to “Share, play and search your audio”.
Google Recorder has been a strong player in the automated note-taking industry for a good few years, backed by useful features such as the ability to tell multiple speakers apart, even if they talk over each other.
As the ASR runs on the device rather than in the cloud, audio can be transcribed even if the device is offline.
Not only that, but the data never leaves the device, making it ideal for privacy-conscious users who do not want their data uploaded to and processed on a third-party cloud.
5, Rev.ai
Primary Use: Human-in-the-loop transcription
Tech stack: ASR API, HITL, machine learning
Rev.ai claims to be the world’s most accurate API for both AI and human-generated transcripts. Quite the bold claim, but backed up by a thriving customer base in industries such as the legal and financial world, where complete accuracy is the minimum requirement.
Rev.ai claims a lower Word Error Rate (WER) than its competitors, especially across speakers of diverse ethnic backgrounds, nationalities, genders and accents.
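For readers unfamiliar with the metric, WER is commonly calculated as the number of word substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below shows that standard calculation; it is not Rev.ai's own evaluation code, the function name word_error_rate is hypothetical, and the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (hypothetical helper)."""
    # Simplifying assumption: lowercase and split on whitespace, no punctuation handling.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word in the transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("fox" -> "box") and one insertion ("high") over 5 reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps high"))  # prints: 0.4
```

A lower value is better, and because insertions are counted the figure can exceed 1.0 for very poor transcripts.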
As well as transcripts, rev.ai offers a suite of NLP-powered insights, such as language identification, sentiment analysis, topic extraction, summarization, translation and forced alignment.
Rev also places heavy emphasis on security, citing data security accreditations and compliance such as SOC 2, HIPAA, GDPR and PCI.
6, Sonix.ai
Primary Use: Transcription and translation for legal, professional and business purposes
Tech stack: Cloud-based ASR, NLP, Translation engine
Sonix.ai is a cloud-based ASR platform designed to process large audio files and provide accurate, searchable transcripts in 54+ languages. The platform has features such as word confidence scoring and custom dictionaries.
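As an illustration of why word confidence scores are useful, the sketch below flags words that fall under a confidence threshold so that a human can review them. Everything here is hypothetical: the Word structure, the 0.0 to 1.0 score range and the 0.8 threshold are assumptions for the example and do not reflect Sonix's actual data format or API. Custom dictionaries are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A transcribed word with a confidence score (hypothetical structure)."""
    text: str
    confidence: float  # assumption: 0.0 (no confidence) to 1.0 (certain)


def words_to_review(words: List[Word], threshold: float = 0.8) -> List[str]:
    """Return the text of every word scored below the threshold, in transcript order."""
    return [word.text for word in words if word.confidence < threshold]


# Made-up scores for illustration only.
transcript = [
    Word("the", 0.99),
    Word("plaintiff", 0.62),
    Word("filed", 0.95),
    Word("yesterday", 0.78),
]
print(words_to_review(transcript))  # prints: ['plaintiff', 'yesterday']
```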
One of the most attractive features of Sonix is the ease of integrating the solution into a standard tech stack that includes tools from Adobe Audition through to Zapier and everything in between.
Stated users of Sonix include, but are not limited to:
- Court reporters
- Investigators
- Lawyers
- Journalists
- Researchers
- Video producers
- Podcasters
- Coaches
- Students
- Filmmakers
- Newsrooms
- Research firms
- Sales teams
NetGeist is here to help you build the next trending speech-to-text application
If you have an idea for the next big thing involving STT and need a specialist to make it a reality, NetGeist is here to help.
Our own STT solution primarily functions in the Lithuanian language. It accurately processes a wide range of Lithuanian language styles and dialects, and by processing and organizing long audio recordings it can significantly increase business efficiency, improve customer service and extend overall business capabilities.
We are always open to discussing new custom projects and helping companies like yours grow their operations through the integration of NLP. Whether you intend to roll out your new application to the general public or have a domain-specific use in mind for your organization, NetGeist specializes in creating custom NLP solutions tailored to your requirements.
Contact us to discuss your project.