Neurotechnology AI SDK

A multilingual system for building Speech-to-Text solutions.

Key features and capabilities

The Neurotechnology AI SDK is a proprietary Software Development Kit (SDK) built to provide developers with the tools to create Natural Language Processing-based solutions. The SDK includes two main components: an Automatic Speech Recognition (ASR) Engine responsible for accurately transcribing audio streams into text, and a Speaker Diarization Engine that partitions an audio stream by different speakers.

Available for development on Microsoft Windows and Linux.

Automatic Speech Recognition engine

The Neurotechnology AI SDK includes a proprietary ASR engine which provides speech-to-text functionality for recordings in the English, Lithuanian, Latvian and Estonian languages.

Speaker diarization

Process recordings with multiple speakers – the algorithm recognizes who is speaking and when, and marks each speaker in the text output.

High performance

Get fast, accurate results with our optimized engines. The SDK is built for versatile hardware: whether you're using a standard CPU, a powerful GPU or an integrated accelerator, it runs at full speed.

Keep your data and infrastructure

On-premises deployment

Have complete control of your systems and environment, as the SDK is built to run on your own servers with no dependency on external services or infrastructure.

Privacy and security

Your data is in your hands only – no information is ever sent to third-party systems or external servers. All processing is done locally, so your data remains fully private and secure.


Flexible system architecture

You can build stand-alone systems, which provide the functionality on a single machine, or make scalable client-server systems with higher performance to meet the demands of any project.


Flexible architecture and multi-platform support

Modular design

Use individual components, like the Automatic Speech Recognition engine or Speaker Diarization engine, on their own or combine them to build more complex processing pipelines. The modular architecture helps create adaptable applications, tailored to different industry standards.

Multi-platform support

The SDK supports Microsoft Windows and Linux platforms. It provides native libraries for Python, C++, Java, and .NET, making it easy to integrate into your existing systems and highly adaptable to various projects.

Applications

Governmental institutions

Transcribe meetings and create extensive, searchable documents that can help facilitate decision-making, contract creation and implementation of new regulations.

Call center and customer support workflows

Our SDK can turn customer-agent calls into comprehensive transcriptions with separated speakers. This simplifies sentiment analysis and quality assurance, giving you readily accessible information on how customers discuss your products and services.

Media and news outlets

Process audio from podcasts, interviews and videos to create searchable archives. The Speaker Diarization Engine can separate the speakers and the ASR Engine can generate time-stamped text for subtitles or content indexing. High performance and fast processing of the ASR Engine enable the generation of captions for events, broadcasts, or online meetings.

Educational institutions

Build tools that can automatically create readable transcripts of lectures, exams and the like. Our technology not only converts audio to text, but also identifies each speaker. This makes it easier to follow the flow of dialogues and group conversations.

Functionalities - automatic speech recognition

Automatic speech recognition

The Automatic Speech Recognition (ASR) Engine is responsible for transcribing audio streams into text.

The tokenizer

The Tokenizer is a component that converts raw text into a structured sequence of tokens for various natural language processing tasks.
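As a rough illustration of what tokenization involves (this is a generic sketch, not the SDK's actual tokenizer or token inventory), a minimal tokenizer might split raw text into word and punctuation tokens:

```python
import re

def tokenize(text):
    """Split raw text into a sequence of word and punctuation tokens.

    A minimal illustration of tokenization; production tokenizers
    typically use a learned subword vocabulary instead.
    """
    # \w+ matches runs of word characters; [^\w\s] matches single
    # punctuation marks, so "Hello," becomes two tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))
# ['Hello', ',', 'world', '!']
```

The resulting token sequence is what downstream natural language processing components consume in place of raw character strings.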

Functionalities - speaker diarization

Speaker diarization

The Speaker Diarization Engine partitions an audio stream by identifying the different interlocutors – an essential step when processing conversations with multiple participants.

Voice activity detector

The Voice Activity Detector (VAD) detects speech versus silence regions in an audio stream, which can be used to optimize processing.
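To illustrate the idea (a simplified energy-threshold sketch, not the SDK's actual VAD algorithm, which is not documented here), a detector can flag fixed-size frames whose energy exceeds a threshold and merge them into speech regions:

```python
def detect_speech(samples, frame_len=400, threshold=0.01):
    """Return (start, end) sample indices of speech-like regions.

    Simplified energy-based VAD: a frame counts as speech when its
    mean squared amplitude is at or above `threshold`. Real detectors
    use far more robust features, but the output shape is similar.
    """
    regions = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i          # speech region begins
        elif start is not None:
            regions.append((start, i))  # speech region ends
            start = None
    if start is not None:
        regions.append((start, i + frame_len))
    return regions

# Silence, then a louder segment, then silence again:
audio = [0.0] * 800 + [0.5] * 800 + [0.0] * 800
print(detect_speech(audio))
# [(800, 1600)]
```

Feeding only the detected speech regions to the ASR and diarization engines is one way such a component reduces unnecessary processing.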

RTTM object

The RTTM Object is a standardized data structure for storing the output of the diarization process, representing speaker-labeled time segments.
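RTTM (Rich Transcription Time Marked) is a standard NIST text format in which each `SPEAKER` record carries, among other fields, the recording name, the segment onset, its duration, and a speaker label. A small parser for such records (illustrative code, independent of the SDK's own API) might look like this:

```python
from collections import namedtuple

Segment = namedtuple("Segment", "file channel onset duration speaker")

def parse_rttm(lines):
    """Parse SPEAKER records from RTTM-formatted lines.

    RTTM fields, in order: type, file, channel, onset, duration,
    orthography, speaker type, speaker name, confidence, lookahead.
    Unused fields are conventionally written as <NA>.
    """
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        segments.append(Segment(
            file=fields[1],
            channel=int(fields[2]),
            onset=float(fields[3]),
            duration=float(fields[4]),
            speaker=fields[7],
        ))
    return segments

rttm = [
    "SPEAKER meeting 1 0.50 4.20 <NA> <NA> speaker_0 <NA> <NA>",
    "SPEAKER meeting 1 4.70 2.10 <NA> <NA> speaker_1 <NA> <NA>",
]
for seg in parse_rttm(rttm):
    print(f"{seg.speaker}: {seg.onset:.2f}s - {seg.onset + seg.duration:.2f}s")
# speaker_0: 0.50s - 4.70s
# speaker_1: 4.70s - 6.80s
```

Because the format is standardized, diarization output stored this way can be scored or post-processed with common third-party tooling.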

Pricing

  • Pay once, use forever
  • No annual fee

Click the button below to learn more about pricing on Neurotechnology’s official website.

Pricing page

Licensing

The Neurotechnology AI SDK is offered under a flexible licensing model that supports both product development and deployment. Licenses are perpetual, transferable, and cover a range of components for speech recognition and speaker diarization.

Click the button below to view full licensing details on Neurotechnology’s official website.

Licensing model