
Digital Voice Agents: What, Why and How

Himanshu Gupta

May 2, 2022


Systems that can handle mundane tasks have existed for several years. More recently, conversational assistants such as Siri, Alexa, Google Home, and Samsung Bixby have become increasingly common. These systems handle human conversations and respond in a human-like manner. In fact, they have become an integral part of our daily lives.

The speech and voice recognition market is expected to grow from USD 8.3 billion in 2021 to USD 22.0 billion by 2026, a CAGR of 21.6% over the forecast period.
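As a quick sanity check on that projection, here is a minimal Python sketch of the standard CAGR formula. The function name `cagr` is our own, and the inputs are the rounded figures quoted above, so the result only approximates the reported rate.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.20 means 20% per year)."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded market sizes from the forecast above (USD billions), 2021 to 2026.
rate = cagr(8.3, 22.0, 2026 - 2021)
print(f"Implied CAGR: {rate:.1%}")
```

With these rounded endpoints the implied rate comes out at roughly 21.5%, which is in line with the quoted 21.6%.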

Always-on customers expect more from customer service. They need faster, personalized resolutions, and they are no longer willing to wait for minutes on end to connect with an agent or to navigate complex IVR menus.

Solutions like voice bots are disrupting customer service as they promise the same level of customer experience as a human agent. Advanced AI-powered Digital Voice Agents can help CX leaders elevate their customer experience while reducing costs, thereby solving two of the biggest challenges they face every day. The solution is scalable and more efficient than other channels like email and IVR.

What is a Digital Voice Agent?

A Digital Voice Agent is a conversational robot (commonly known as a voice bot) that can interact with a user and take a certain set of actions to meet an end goal. It is very similar to the voice assistants we use on a daily basis, such as Apple Siri, Google Assistant, and Alexa.

But what’s the difference? 

Voice assistants are designed to handle one or two turns of the conversation to meet generic day-to-day goals.

Figure: example of a single-turn conversation

Digital Voice Agents, on the other hand, are designed to solve specific problems that require much more than two turns of conversation. This is how we humans solve queries too: we first ask several questions to understand the context and gather the information needed to solve the problem.

For example, blocking a lost credit card involves a series of standard questions: the first couple verify the caller, the next set confirms which credit card is to be blocked, and the conversation ends with an action in which the customer is sent a new credit card. Typically, this is a 6-7 turn conversation that generic voice assistants are not designed to handle. Specialized voice bots must be trained to handle such tasks.
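To make the idea of a multi-turn conversation concrete, here is a minimal Python sketch of the lost-card flow as a fixed question script. The slot names, the prompts, and the `LostCardDialogue` class are all hypothetical. This is an illustration of the concept under those assumptions, not how Skit's product is implemented.

```python
from dataclasses import dataclass, field

# Hypothetical question script; slot names and wording are illustrative only.
SCRIPT = [
    ("date_of_birth", "To verify your identity, what is your date of birth?"),
    ("phone_last4", "What are the last four digits of your registered phone number?"),
    ("card_last4", "What are the last four digits of the card you lost?"),
    ("confirm_block", "Should I block this card now?"),
    ("confirm_replacement", "Should I send a replacement card to your registered address?"),
]
CLOSING = "Thank you. Your request has been recorded."


@dataclass
class LostCardDialogue:
    answers: dict = field(default_factory=dict)

    def next_question(self):
        """Return (slot, prompt) for the first unanswered question, or (None, CLOSING)."""
        for slot, prompt in SCRIPT:
            if slot not in self.answers:
                return slot, prompt
        return None, CLOSING

    def record(self, reply: str) -> None:
        """Store the caller's reply against the question currently being asked."""
        slot, _ = self.next_question()
        if slot is not None:
            self.answers[slot] = reply.strip()


def simulate(replies):
    """Feed scripted caller replies through the dialogue and count the turns used."""
    dialogue = LostCardDialogue()
    turns = 0
    for reply in replies:
        slot, prompt = dialogue.next_question()
        if slot is None:
            break
        print(f"Bot:    {prompt}")
        print(f"Caller: {reply}")
        dialogue.record(reply)
        turns += 1
    print(f"Bot:    {dialogue.next_question()[1]}")
    return dialogue, turns


dialogue, turns = simulate(["2 May 1990", "4321", "9876", "yes", "yes"])
```

A real voice bot would validate each answer and handle corrections or interruptions, but even this toy version shows why one or two turns are not enough for such a task.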

So, how does Skit’s Digital Voice Agent work?

Fundamentally, there are at least four components (engines) to any voice bot:

ASR (Automatic Speech Recognition): This converts the caller's voice into a text transcription. It is also called the speech-to-text (STT) engine.

SLU (Spoken Language Understanding): This is the brain of the voice bot. It extracts intents and entities (data points) from the text produced by ASR and then comes up with the best possible action. That action can be a voice reply, sending a document or a text message, transferring the call, raising a ticket, etc.

TTS (Text to Speech): The block that converts the reply text into voice.

Dialogue Manager (Orchestrator): The block that manages the flow of data among the above three blocks and the flow of the conversation.

All these processes happen in real time, within milliseconds. Together they make up one turn of the conversation, and the process is repeated for each subsequent turn.
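The four blocks described above can be sketched as a small pipeline. The Python below is a toy illustration under loud assumptions: the ASR, SLU, and TTS functions are stubs we made up (the "audio" is just UTF-8 text and the intent detection is keyword matching), and none of the names correspond to a real Skit API.

```python
from typing import Callable, NamedTuple


class Understanding(NamedTuple):
    intent: str
    entities: dict


def stub_asr(audio: bytes) -> str:
    # Stand-in for speech recognition: our fake "audio" is simply UTF-8 text.
    return audio.decode("utf-8")


def stub_slu(text: str) -> Understanding:
    # Stand-in for language understanding: naive keyword matching.
    lowered = text.lower()
    if "balance" in lowered:
        return Understanding("check_balance", {})
    if "block" in lowered and "card" in lowered:
        return Understanding("block_card", {"product": "card"})
    return Understanding("unknown", {})


def stub_tts(text: str) -> bytes:
    # Stand-in for speech synthesis: returns the reply text as bytes.
    return text.encode("utf-8")


# Hypothetical reply text for each intent.
RESPONSES = {
    "check_balance": "Sure, let me look up your balance.",
    "block_card": "I can help you block your card. First, let me verify you.",
    "unknown": "Sorry, could you please repeat that?",
}


class DialogueManager:
    """Orchestrates one conversational turn: ASR, then SLU, then reply, then TTS."""

    def __init__(self, asr: Callable, slu: Callable, tts: Callable):
        self.asr, self.slu, self.tts = asr, slu, tts
        self.history = []  # (caller text, detected intent, reply text) per turn

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.asr(audio_in)
        understanding = self.slu(text)
        reply = RESPONSES.get(understanding.intent, RESPONSES["unknown"])
        self.history.append((text, understanding.intent, reply))
        return self.tts(reply)


manager = DialogueManager(stub_asr, stub_slu, stub_tts)
audio_out = manager.handle_turn(b"I want to block my card")
print(audio_out.decode("utf-8"))
```

Calling `handle_turn` once corresponds to one turn; a multi-turn conversation simply calls it repeatedly while the dialogue manager keeps the history.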

All these processes are performed in the cloud after the voice packets are received from a user. So it doesn’t really matter which device the caller is using, whether it’s a smartphone or a feature phone or a wired telephone. Skit’s Digital Voice Agents leverage all these layers to seamlessly plug into contact centers and augment the work of human agents.

How are Digital Voice Agents different from Chatbots?

Technically, an AI-powered voice bot has two extra engines that a chatbot doesn’t need. Since chatbots do not deal with voice, the two engines related to voice (ASR and TTS) are not required. The text input is fed directly to the language understanding engine (NLU); the intents and entities are extracted, and the response is generated as text and relayed back to the user.

Furthermore, voice queries on a call bring certain challenges: noisy backgrounds, different accents and dialects of the same language, disfluencies, each speaker's own use of filler words and pauses, and barge-in, where one person starts speaking while the other is still talking. All of these directly impact accuracy.

For the same reason, voice bots are much more difficult to build. Everything has to happen in real time, within milliseconds, and there is little to no room for error; otherwise the communication experience suffers.

What sets voice bots apart is that they’re faster. Voice is the quickest and most natural form of human communication, faster than typing or navigating drop-down menus with a mouse. It continues to be one of the channels most sought after by end customers seeking support.

What are the common applications of Digital Voice Agents and how do they add value?

The key to improving customer service is not just automating routine communications, but augmenting human agents and freeing up their time. This creates great self-service options, increases customer satisfaction and makes your employees more productive.

At a broad level, a Digital Voice Agent can be used whenever businesses want to communicate with their customers en masse. To keep it simple, there are two types of business communication:

Inbound communication

This is when a customer calls a business to get a query resolved, for example, to register a complaint or to activate or deactivate a service.

Companies run contact centres where human agents are trained to resolve customer complaints coming from various channels such as calls, emails, and social media.

How does a Digital Voice Agent add value here?

Automate mundane support queries: It can automate simple, repetitive queries end-to-end, such as checking an account balance in banking or the status of an order in e-commerce. Your human agents can then move on to more complex queries, and your average service levels will improve drastically because customers are served without any waiting time.

Reduce average handling time: For more complex queries, Digital Voice Agents help reduce the average handling time of the human agent by taking care of basic tasks, for example, verifying the caller and collecting information such as the order number that the human agent needs to solve the query. After these preliminary checks, the call can be transferred to the human agent along with the context of the query and the data collected so far.
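To illustrate what "transferring the call with context" could look like, here is a small Python sketch of a handoff record. The `HandoffContext` class and its field names are hypothetical, chosen only to mirror the items mentioned above (caller verification, collected data, the query so far).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffContext:
    """Hypothetical record passed to the human agent when a call is transferred."""
    caller_verified: bool
    intent: str
    collected: dict = field(default_factory=dict)   # e.g. order number
    transcript: list = field(default_factory=list)  # caller utterances so far

    def summary(self) -> str:
        """One-line summary the agent can read before picking up the call."""
        status = "verified" if self.caller_verified else "NOT verified"
        details = ", ".join(f"{key}={value}" for key, value in self.collected.items())
        return f"Caller {status}; intent: {self.intent}; collected: {details or 'nothing yet'}"


context = HandoffContext(
    caller_verified=True,
    intent="refund_request",
    collected={"order_number": "A1001"},
    transcript=["I want a refund for my order"],
)
payload = json.dumps(asdict(context))  # what might be sent to the agent's desktop
print(context.summary())
```

The human agent starts the conversation already knowing who is calling and why, which is where the handling-time saving comes from.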

Outbound communication

This is when a business reaches out to customers for a variety of reasons, such as lead qualification calls, welcome calls, reminder calls, and renewal calls.

How does a Digital Voice Agent add value here?

Lead Qualification: Since the Digital Voice Agent is a scalable machine, it can reach out to thousands of prospects concurrently, as soon as each prospect has shown interest in the product or service. It gauges that interest and can then transfer the call to a live agent in real time to convert the customer. Semi-qualified leads can be marked and sent to nurturing workflows. Your human agents are given only the more qualified leads to work on, so their productivity increases multifold.

Reminder calling: The Digital Voice Agent can place automated calls to your existing customers based on pre-defined triggers, such as the nth day of the month or a payment not being received by a given day of the month. It eliminates the need for human agents for such simple tasks. On the call, it can capture the customer's propensity to pay or renew, the date by which they will do so, and the reason for non-payment, and it can handle objections and FAQs.
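A pre-defined trigger of this kind can be expressed in a few lines. The sketch below assumes one simple rule, "call if the due day has passed this month and no payment has been received this month". The `Customer` fields and the `should_call` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Customer:
    name: str
    due_day: int                   # day of the month on which payment is due
    last_payment: Optional[date]   # None if no payment has ever been received


def should_call(customer: Customer, today: date) -> bool:
    """True if the due day has passed this month and no payment arrived this month."""
    if today.day <= customer.due_day:
        return False  # not overdue yet
    paid_this_month = (
        customer.last_payment is not None
        and (customer.last_payment.year, customer.last_payment.month)
        == (today.year, today.month)
    )
    return not paid_this_month


today = date(2022, 5, 12)
customers = [
    Customer("Asha", due_day=10, last_payment=date(2022, 4, 9)),   # overdue
    Customer("Ben", due_day=10, last_payment=date(2022, 5, 8)),    # already paid
    Customer("Chen", due_day=15, last_payment=None),               # not due yet
]
call_list = [c.name for c in customers if should_call(c, today)]
print(call_list)
```

In practice the rule set would be richer (grace periods, preferred calling hours, do-not-call flags), but the principle is the same: the trigger is data-driven, so no human has to build the call list.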

“About 75% of companies plan to invest in automation technologies such as AI and process automation in the next few years. AI, chatbots, voice bots and automated self-service technologies free up call centre employees from routine tier-1 support requests and repetitive tasks, so they can focus on more complex issues.” (Source: Deloitte)

Broadly, various kinds of voice bots are among the most popular automation solutions, and are quickly becoming a must-have for any contact centre. Skit’s Digital Voice Agents take it up a notch by being able to forge seamless human-AI partnerships for contact center modernization and optimization.

What are Digital Voice Agents good at compared to humans?

On-demand Scalability: Humans cannot be replicated on demand. Adding agents to a contact center means hiring, onboarding, and training, all of which take time and have to be repeated for every single agent hired.

Digital Voice Agents can be scaled up and down as and when required, at marginal cost.

Economical & Reliable: Employing people for repetitive, mundane tasks is costlier. Hiring, training, and retraining are expensive, all the more so when churn is high, and the cost is incurred for every person employed. Bots, on the other hand, need to be built and trained only once, and the benefit of incremental learning and retraining is huge and available across the board.

We all know that machines are exceptional at performing repetitive tasks with high efficiency and high reliability. If a Digital Voice Agent is asked by a customer not to call during office hours or to call at specific times in future, it can do so without fail. Humans are not so good at it.

Available 24×7: Machines don’t get tired or complain. Sad but true: they don’t have a family to go home to, and they don’t need time to sleep. So you can be available to your customers round the clock.

Looking up information in a knowledge base: Digital Voice Agents can easily fetch information from a knowledge base to answer a wide range of support queries.

Consistent learning and training at scale: Apart from using Artificial Intelligence for answering questions, Skit’s Digital Voice Agents also leverage different machine learning models and past conversations to automatically improve the quality of answers.

