Types of Speech Recognition

by marketing team | Sep 30, 2024

There are two types of automatic speech recognition: Grammar ASR and Transcription ASR.

Grammar ASR uses a closed set of rules (a grammar) that includes all possible inputs from the user. Think of the audio phone trees that give you several options to direct your call to the right department.

Grammar ASR doesn’t need to understand the universe of possible words and numbers. The user is prompted with a closed set of options: “say 1 for accounts receivable or 2 for all other inquiries.”

With grammar ASR, it’s possible to achieve an extremely high level of accuracy because there is a limited number of possible choices for each utterance, which reduces the probability of error.
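To illustrate why a closed grammar narrows the room for error, here is a minimal Python sketch. It assumes a recognizer that returns a list of (text, score) candidates; the GRAMMAR table, the route names, and the route_call function are hypothetical and do not reflect any particular engine's API.

```python
from typing import Optional

# Hypothetical IVR grammar: every utterance the caller may say,
# mapped to the department it routes to (names are illustrative).
GRAMMAR = {
    "one": "accounts_receivable",
    "accounts receivable": "accounts_receivable",
    "two": "other_inquiries",
    "other inquiries": "other_inquiries",
}


def route_call(hypotheses: list[tuple[str, float]]) -> Optional[str]:
    """Return the route for the best-scoring hypothesis found in the grammar.

    `hypotheses` is an assumed list of (text, score) pairs from a
    recognizer, where a higher score means a more likely candidate.
    """
    # Discard every candidate that the grammar does not allow.
    in_grammar = [
        (text.lower().strip(), score)
        for text, score in hypotheses
        if text.lower().strip() in GRAMMAR
    ]
    if not in_grammar:
        return None  # nothing matched, so the caller should be re-prompted
    best_text, _ = max(in_grammar, key=lambda pair: pair[1])
    return GRAMMAR[best_text]


# "won" scores higher acoustically, but it is not in the grammar,
# so the in-grammar candidate "one" is chosen instead.
print(route_call([("won", 0.6), ("one", 0.4)]))
```

Because out-of-grammar candidates such as "won" are never considered, the engine only has to tell apart the handful of phrases the prompt allows.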

In grammar ASR, a good engine can achieve upwards of 96% accuracy, while a great one can reach 98 to 99% accuracy, which corresponds to a Word Error Rate (WER) of 1 to 2%.
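For readers who want to see how a WER figure is produced, the sketch below uses the common definition: the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the engine's output, divided by the number of words in the reference. The function name word_error_rate is our own, and treating accuracy as roughly 1 minus WER is a simplifying assumption that matches how the figures above are quoted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate(
    "say one for accounts receivable",
    "say won for accounts receivable",
)
print(f"WER: {wer:.0%}, approximate accuracy: {1 - wer:.0%}")
```

Note that WER can exceed 100% when the output contains many inserted words, so the "accuracy equals 1 minus WER" shorthand is only meaningful at low error rates.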

Transcription ASR

Transcription ASR is much more challenging. In transcription ASR, the engine has to recognize every possible word, in every available dialect.

Achieving high levels of accuracy in Transcription ASR requires a language model that covers every regional dialect. It is a massive data problem.

Many of the leading Transcription ASR engines have been stuck below 90% accuracy for years.

Recently there have been stunning breakthroughs in transcription ASR thanks to deep neural networks (DNNs). DNN-based engines can achieve transcription ASR accuracy of over 90%, or a WER of less than 10%.

The key to understanding the two types of speech recognition is that one is not necessarily better than the other. It depends on what you want to do with it.

Grammar ASR vs. Transcription ASR

Grammar ASR tends to be the best choice for Interactive Voice Response (IVR) systems.

If you have a limited set of options you want to give a caller, you’ll want to use grammar ASR.

If you want to use automatic speech recognition to transcribe live or recorded audio, the type of speech recognition you’ll want to use is Transcription ASR. Voicebots also use Transcription ASR.

Transcription ASR will be able to accurately recognize and transcribe words from the entire language.

Speech recognition and regional accents

If your system needs to understand a range of dialects and accents, you should use a Transcription ASR engine that uses deep neural networks trained on large, diverse datasets. Such an engine should handle the variability without placing limits on the number of ways a word can be pronounced. This approach will be more efficient for your business than deploying multiple languages to account for every dialect.

Whether you need grammar ASR, transcription ASR, or a hybrid of these two types of speech recognition, LumenVox by Capacity can help. Book a free consultation to discuss which type of speech recognition is right for your business.

