Harshad Bajpai
August 10, 2022
Are you unsure about Voice AI, do you have doubts, or do you need some guidance? Are you wondering what a Voice AI solution can do for your company or agency, what risks are involved, and whether this technology will help you get ahead of your competition?
This guide seeks to answer all of your questions about Voice AI.
This ebook is designed to help debt collection CXOs make informed, quick decisions. It is a comprehensive, step-by-step guide to exploring Voice AI technology, understanding its core capabilities, and recognizing the qualities of an ideal vendor. We have also included a section detailing the entire implementation process, from ideation to execution and beyond.
The ebook is divided into three sections:
From its peak in 2009, consumer debt grew by $2.3 trillion to almost $14 trillion in 2019. In 2010, U.S. businesses placed $150 billion in debt with collection agencies but recovered only a fraction of it: just $40 billion. The industry averages a 20% collection rate on delinquent debts, down from 30% a few decades ago. Overall, debt collection companies are facing major performance challenges.
Rapid changes in regulations and in customer experience expectations are taking place in the collection space, and they pose serious challenges to collection agencies.
These issues ultimately result in lower collection rates and high collection costs.
Before we dive into how Voice AI solutions can help debt collectors, let’s understand the fundamentals of what a Digital Voice Agent is and how it works.
What is a Digital Voice Agent?
A Digital Voice Agent is an AI-powered conversational robot (commonly known as a voicebot) that can interact with a user and take a certain set of actions to meet an end goal. It is similar to, but not the same as, voice assistants like Apple’s Siri, Google Assistant, and Amazon’s Alexa.
How is it different from voice assistants?
Voice assistants are designed to handle one or two turns of conversation for generic day-to-day tasks, and they are not designed to retain context beyond that.
Intelligent Voice Agents, on the other hand, are designed to solve specific problems that require many more than two turns of conversation. This mirrors how humans resolve queries: they first ask several questions to understand the context and gather all the information required to solve a given problem.
For example, a lost credit card can be blocked by asking a series of standard questions. The first couple of questions verify the caller, and the next set confirms which credit card should be blocked. An action then follows in which a new credit card is issued and sent to the customer. Typically, this is a 6-7 turn conversation that generic voice assistants are not designed to handle; specialized voice AI agents must be built and trained for such tasks.
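The lost-card example above is essentially slot filling. The agent keeps track of what it has already learned across turns and asks only for what is still missing. Below is a minimal Python sketch of that idea. The slot names, prompts, and the `LostCardDialogue` class are hypothetical. A real agent would fill slots from SLU output and actually check the verification answers, rather than accepting raw strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots for the lost-card flow: two verification questions,
# one to identify the card, and one to confirm the action.
PROMPTS = {
    "date_of_birth": "To verify your identity, what is your date of birth?",
    "ssn_last_four": "What are the last four digits of your Social Security number?",
    "card_last_four": "Which card should I block? Please say its last four digits.",
    "confirm_block": "Shall I block that card and send you a new one?",
}


@dataclass
class LostCardDialogue:
    slots: Dict[str, str] = field(default_factory=dict)  # context kept across turns
    turns: int = 0

    def next_prompt(self) -> Optional[str]:
        """Ask for the first slot that is still empty; None once all are filled."""
        for slot, prompt in PROMPTS.items():
            if slot not in self.slots:
                return prompt
        return None

    def handle(self, answer: str) -> str:
        """Process one caller turn: store the answer, then ask the next question."""
        self.turns += 1
        for slot in PROMPTS:
            if slot not in self.slots:
                self.slots[slot] = answer
                break
        prompt = self.next_prompt()
        return prompt if prompt is not None else self._complete()

    def _complete(self) -> str:
        # Action step: a real agent would call the card-management system here.
        if self.slots["confirm_block"].strip().lower() in ("yes", "y"):
            return (f"Card ending {self.slots['card_last_four']} is blocked, "
                    "and a new card is on its way.")
        return "Okay, I have not blocked any card."


if __name__ == "__main__":
    dialogue = LostCardDialogue()
    print(dialogue.next_prompt())
    for answer in ["1985-04-12", "6789", "4321", "yes"]:
        print(dialogue.handle(answer))
```

Counting the agent's opening question, this toy flow already takes several exchanges. It only works because the answers from earlier turns are still available when the final action runs.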
Digital Voice Agents sit on top of telephony and dialer systems. Apart from these two, any voice bot fundamentally has at least five components (engines):
ASR (Automatic Speech Recognition): This converts the voice into a text transcription. It is also called the speech-to-text (STT) engine.
SLU (Spoken Language Understanding): This is the brain of the voice bot. It extracts intents and entities (data points) from the text produced by ASR and then determines the best possible action. That action can be a voice reply, sending a document or a text message, transferring the call, raising a ticket, etc.
TTS (Text to Speech): The block that translates the text into voice to generate a reply.
Dialogue Manager (Orchestrator): The block that manages the flow of data among the above three blocks and the flow of the conversation.
Integration Proxy: These are integration sockets that connect with CRMs, payment gateways, ticketing systems, etc., so that the voice agent can be effective and efficient in end-to-end automation.
These processes happen in real time, within milliseconds. Yet together they make up only one turn of the conversation; the whole process repeats for each subsequent turn.
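To make the flow concrete, here is a minimal Python sketch of a single turn passing through the components listed above. Every engine is a hypothetical stub: the "audio" is just UTF-8 bytes, and the SLU is a couple of toy rules. The sketch therefore shows only how data moves between ASR, SLU, the Dialogue Manager, an integration, and TTS. It does not show how any of these components is actually built.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stubs: each stands in for a real engine or service.
def asr(audio: bytes) -> str:
    """ASR stub: treats the 'audio' as UTF-8 text."""
    return audio.decode("utf-8")


def slu(text: str) -> Tuple[str, Dict[str, str]]:
    """SLU stub: toy rules that extract an intent and an 'amount' entity."""
    words = text.lower().split()
    if "pay" in words or "payment" in words:
        amounts = [w.lstrip("$") for w in words if w.startswith("$")]
        return "make_payment", ({"amount": amounts[0]} if amounts else {})
    return "unknown", {}


def dialogue_manager(intent: str, entities: Dict[str, str],
                     integrations: Dict[str, Callable[[str], str]]) -> str:
    """Orchestrator: choose the next action, calling integrations when needed."""
    if intent == "make_payment" and "amount" in entities:
        reference = integrations["payment_gateway"](entities["amount"])
        return (f"Thank you. Your payment of ${entities['amount']} is confirmed, "
                f"reference {reference}.")
    if intent == "make_payment":
        return "How much would you like to pay today?"
    return "Sorry, could you say that again?"


def tts(text: str) -> bytes:
    """TTS stub: returns the reply text as bytes instead of synthesized audio."""
    return text.encode("utf-8")


def run_turn(audio: bytes, integrations: Dict[str, Callable[[str], str]]) -> bytes:
    """One conversational turn: ASR -> SLU -> Dialogue Manager -> TTS."""
    text = asr(audio)
    intent, entities = slu(text)
    reply = dialogue_manager(intent, entities, integrations)
    return tts(reply)


if __name__ == "__main__":
    # Hypothetical integration proxy: a fake payment gateway.
    integrations = {"payment_gateway": lambda amount: "PAY-001"}
    print(run_turn(b"I want to pay $50 today", integrations).decode("utf-8"))
```

A text chatbot would skip `asr` and `tts` and feed the user's text straight into `slu`.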
All of these processes are performed in the cloud after the voice packets are received from a user. So it doesn’t really matter which device the caller is using—whether it’s a smartphone or a feature phone or a wired telephone. Skit’s Digital Voice Agents leverage all of these layers to seamlessly plug into contact centers and augment the work of human agents.
How are Digital Voice Agents different from chatbots?
Technically, an AI-powered voice bot has two extra engines that a chatbot doesn't need. Since chatbots do not deal with voice, they do not require the two voice-related engines (ASR and TTS). The text input is fed directly to the NLU engine, which extracts the intents and entities. The response is then generated in text format and relayed back to the user.
Furthermore, voice queries over the phone bring certain challenges. These include noisy backgrounds; different accents and dialects of the same language; disfluencies, filler words, and pauses that vary from speaker to speaker; and barge-in, where one person starts speaking while the other is still talking. All of these directly impact accuracy.
For the same reason, voice bots are much more difficult to build. Everything has to happen in real time, within milliseconds, and there is little to no room for error; otherwise, the communication experience suffers.
What sets voice bots apart is speed. Voice is the quickest and most natural form of human communication, faster than typing or navigating drop-down menus with a mouse. It remains one of the channels most sought after by customers seeking support.
What is an IVR?
Interactive Voice Response (IVR) is an automated phone routing system that interacts with callers and gathers information by offering them multiple choices via a menu. The system then performs actions based on the caller's keypad selections, which are transmitted as DTMF (Dual-Tone Multi-Frequency) tones.
Companies and contact centers use IVRs to route calls based on the caller's choices and so organize their call queues. From the caller's selection, the system can determine whether the caller wants to contact the billing department or the technical support team, or simply wants to talk to a human operator.
In the backend, an IVR is a top-down tree structure in which the caller's input determines which downstream node the call flows to. Each end node is either a human-agent transfer node or a self-serve node. In a self-serve node, a pre-recorded message is fetched from the database and played. For example, an account balance enquiry node plays a pre-recorded message along with a variable value, in this case the fund balance.
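The tree described above can be sketched in a few lines of Python. The menu, the node fields, and the balance template are hypothetical. A production IVR would play pre-recorded audio for the self-serve message rather than format a text string.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IVRNode:
    prompt: str = ""
    children: Dict[str, "IVRNode"] = field(default_factory=dict)  # DTMF key -> next node
    transfer_to: Optional[str] = None  # set on human-agent transfer nodes
    message: Optional[str] = None      # template for self-serve nodes


def navigate(root: IVRNode, keys: str, account: Dict[str, str]) -> str:
    """Follow the caller's DTMF key presses down the tree and return what is played."""
    node = root
    for key in keys:
        if key not in node.children:
            return "Invalid option. " + node.prompt  # replay the current menu
        node = node.children[key]
    if node.transfer_to:
        return f"Transferring you to {node.transfer_to}."
    if node.message:
        return node.message.format(**account)  # insert the variable value
    return node.prompt


# Hypothetical two-level menu: one self-serve leaf and two transfer leaves.
MENU = IVRNode(
    prompt="Press 1 for your account balance, 2 for billing, or 0 for an operator.",
    children={
        "1": IVRNode(message="Your account balance is {balance}."),
        "2": IVRNode(transfer_to="the billing department"),
        "0": IVRNode(transfer_to="a human operator"),
    },
)

if __name__ == "__main__":
    print(navigate(MENU, "1", {"balance": "$1,250.00"}))
    print(navigate(MENU, "2", {}))
```

The rigidity is visible in the structure itself: a caller can only reach what the tree's author anticipated, one key press at a time.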
IVR is also used to provide information like promos, updates, or other important information or instructions. One example is to inform callers that the system will record the call.
Lately, IVR providers have introduced voice response as an alternative to DTMF. For example, to reach the billing department, the caller says “billing” instead of pressing a key on the phone. This works on keyword matching. However, if the caller utters a long sentence that doesn't include the relevant keyword, the IVR cannot recognize the input.
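The limitation of keyword matching is easy to demonstrate. The sketch below uses a hypothetical keyword table and routes a transcribed utterance only if one of the menu keywords appears in it. As a result, a perfectly clear billing complaint that never says "billing" falls through.

```python
import re
from typing import Dict

# Hypothetical keyword table for a speech-enabled IVR menu.
KEYWORDS: Dict[str, str] = {
    "billing": "billing department",
    "balance": "account balance",
    "agent": "human operator",
}


def route_by_keyword(utterance: str) -> str:
    """Route on the first menu keyword found as a whole word; no semantics involved."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for keyword, destination in KEYWORDS.items():
        if keyword in words:
            return destination
    return "no match"  # the IVR would replay the menu or give up here


if __name__ == "__main__":
    print(route_by_keyword("Billing"))
    print(route_by_keyword("I think I was charged twice on my last statement"))
```

The second caller clearly has a billing problem. Because the word "billing" never appears, the IVR cannot route the call.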
An outbound IVR (Interactive Voice Response) is also typically used to reach a large number of customers in a personalized manner through interaction channels such as voice messages. The most common use cases are feedback, promotions, announcements, reminders, etc.
A robocaller or outbound IVR essentially has two components: (1) a dialer and (2) either a text-to-speech engine (in advanced outbound IVRs) or a recorded voice message (in robocallers). Businesses can upload thousands of contacts to the dialer and configure parameters such as the number and timing of retry attempts, the time of the call, etc. The dialer calls these contacts and plays a voice message for the consumer. At the end of the call, the consumer can press keypad digits to replay the message or perform other tasks.
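As a rough illustration of the dialer parameters mentioned above, the sketch below plans retry attempts for one contact. Any attempt that falls outside the allowed calling window is pushed to the next window opening. The function name, the two-hour retry gap, and the 9:00 to 18:00 window in the usage lines are hypothetical. A real dialer would also work in each contact's local time zone.

```python
from datetime import datetime, timedelta
from typing import List


def schedule_attempts(first_try: datetime, max_attempts: int, retry_gap: timedelta,
                      window_start_hour: int, window_end_hour: int) -> List[datetime]:
    """Plan up to max_attempts call times for one contact.

    Attempts before the window opens move to the opening time that day;
    attempts at or after the closing hour move to the next day's opening.
    """
    attempts: List[datetime] = []
    t = first_try
    while len(attempts) < max_attempts:
        if t.hour < window_start_hour:
            t = t.replace(hour=window_start_hour, minute=0, second=0, microsecond=0)
        elif t.hour >= window_end_hour:
            t = (t + timedelta(days=1)).replace(hour=window_start_hour, minute=0,
                                                second=0, microsecond=0)
        attempts.append(t)
        t += retry_gap
    return attempts


if __name__ == "__main__":
    plan = schedule_attempts(datetime(2022, 8, 10, 16, 0), max_attempts=3,
                             retry_gap=timedelta(hours=2),
                             window_start_hour=9, window_end_hour=18)
    for attempt in plan:
        print(attempt)
```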
Limitations of IVR
In the 1990s this technology was a game-changer and led to a significant improvement in efficiency. However, today this system is ineffective and unnecessary, to say the least.
Even the most sophisticated outbound IVRs suffer from persistent challenges, enumerated below:
IVRs, even at their best, do not contribute to CX or major productivity gains, whereas a bad IVR experience can prove very costly. The State of IVR in 2018 noted that 83% of customers would avoid a company after a poor experience with an IVR.
The more pressing problem still remains:
“How to automate the mundane, repetitive, non-value-adding tasks that human agents perform”
For a long time, we did not have an answer, or at least not a commercially viable technology solution. Today we do: the Intelligent Voice AI Agent.
Digital Voice Agents are AI-powered virtual agents that let customers hold meaningful, contextual conversations without having to punch 1, 2, 3, or 4 on their keypad. They can converse with your consumers just like your human agents do.
They are capable of understanding, interpreting, and analyzing the conversational voice input of an individual and responding in everyday language.
A Virtual Voice Agent goes beyond understanding words: it determines what the consumer is saying from the underlying semantics, without relying on specific keywords. Using machine learning, a Virtual Voice Agent continuously improves itself and the customer experience.
Debt collection is not a simple industry. It is heavily regulated and governed by a whole gamut of laws that keep changing. Collection agencies are also under constant pressure to cut costs.
For the first time, there is a technology that answers most of the challenges faced by debt collection agencies. Still, incorporating this tech presents its own set of risks.
As experts with deep experience in the debt collection space, we at Skit.ai have put together a guide that helps CXOs understand which capabilities to look for when selecting and evaluating a Voice AI vendor.
Look for these core capabilities as you decide how to transform your debt collection business with Voice AI.
A voice technology company can have an impressive tech stack and still not be suitable for you if it lacks domain or industry expertise. The vendor needs to understand the nuances of the business and the consequences of conversations, outreach, and promises.
Why is it important?
A deep knowledge and understanding of business operations and processes in the collections space is essential, because debt collection is a complex, heavily regulated industry. Lack of knowledge is not only risky from a compliance standpoint; it can also hinder the creation of intelligent and intuitive conversation designs.
Designing a DVA is as much an art as it is a scientific and technical process.
The conversation with a consumer will be drastically different for a debt that is 30 days old compared with one that is 5 years old; after some time, consumers might not even remember the debt or the card. Conversation design changes drastically depending on various factors, such as:
If these factors are not considered, the end product will be suboptimal and consumers will drop out of the conversations.
Consequences of lack of expertise in the area
Here are some of the issues you are likely to run into if your Voice AI provider does not meet the aforementioned standards:
You should expect your Digital Voice Agent to deliver end-to-end automation. In other words, it must be able to handle calls from start to finish without the help or intervention of a human agent.
Why is it important?
These days, AI-powered Digital Voice Agents should be capable of handling conversations end to end. Using DVAs only for call routing, or only to identify right-party contacts and transfer calls to human agents, would be limiting.
On average, 70% of customer requests fall into the tier-I bucket. This means that a Voice AI agent must be able to automate the majority of calls end to end.
This is the most vital capability of a Voice AI solution, as value creation, productivity enhancement, and business performance all rest on it.
Imagine the value that can be created by taking more than 70% of the frustrating calls your human agents handle off their plates.
Here is a list of a few capabilities that augment End-to-End Automation:
Consequences of lack of expertise in the area
What happens if the vendor you are speaking with does not have a strong end-to-end automation capability?
Impact on scalability: Maintaining a large team of human agents is painful. High attrition rates not only make it an operational hassle but also escalate the cost of retaining agents and keeping them engaged and satisfied. With end-to-end automation, Voice AI minimizes your reliance on human agents. You do not need to recruit more when call volumes surge, nor do you need a larger team to handle a bigger portfolio of delinquent accounts. Let’s compare to make the point crystal clear:
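A back-of-the-envelope comparison makes the scalability point concrete. The sketch below estimates how many agents a given call volume requires with and without end-to-end automation. It uses the 70% tier-I share mentioned earlier as the automation rate. The call volumes, the 6-minute handle time, and the 120 productive hours per agent per month are hypothetical inputs. The formula also ignores shrinkage, occupancy targets, and intraday peaks, all of which real workforce planning accounts for.

```python
import math


def agents_needed(monthly_calls: int, aht_minutes: float,
                  productive_hours_per_agent: float, automation_rate: float = 0.0) -> int:
    """Agents required for the calls that are NOT automated end to end."""
    human_calls = round(monthly_calls * (1 - automation_rate))  # whole calls
    workload_hours = human_calls * aht_minutes / 60
    return math.ceil(workload_hours / productive_hours_per_agent)


if __name__ == "__main__":
    for volume in (100_000, 200_000):  # hypothetical monthly call volumes
        manual = agents_needed(volume, aht_minutes=6.0, productive_hours_per_agent=120.0)
        automated = agents_needed(volume, aht_minutes=6.0, productive_hours_per_agent=120.0,
                                  automation_rate=0.70)
        print(f"{volume:>7} calls/month: {manual} agents without automation, "
              f"{automated} with 70% automated")
```

In this example, doubling the volume takes the human-only team from 84 to 167 agents. With 70% of calls automated, the team grows from 25 to 50, so far fewer additional hires are needed.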
A platform approach has its typical advantages. Cloud-based modularity makes enhancements and tweaks very easy.
Why is it important?
A platform gives visibility into the system, and for many elements, the adopting company has the option to tweak things such as conversation flows to improve the voice agent's performance. It is also easier to deliver upgrades and enhancements collaboratively and transparently.
For instance, Skit.ai offers access to the Skit studio platform, which gives its clients a comprehensive view into how things are moving along. This makes the entire BTDME — build, test, deploy, monitor, and enhance — journey significantly smoother.
Having a user-friendly platform also helps with the integration of third-party applications such as payment gateways, CRM, and other business applications. In the long run, these capabilities can be the difference between winning and losing.
Consequences of lack of expertise in the area
The lack of a platform converts the Voice AI solution into a black box. You have no idea about its functioning, and you will depend on your vendor for everything. This will not only elongate the enhancement process but will also make it costly.
More often than not, time is everything. Consider the damage a conversation flow based on wrong information can do if it is not updated in time. Such a loss of agility is severely debilitating for any company that is sensitive to CX and changes in consumer behavior.
Everyone in the debt collection space is aware of Regulation F (Reg F) and the challenges it has posed to debt collection agencies as they work to understand its implications and ensure proper compliance. If your vendor lacks the required knowledge of and expertise in compliance and regulations, the consequences can be problematic for your agency.
Why is it important?
Beyond the increasing fines and penalties imposed by regulators, a far more significant risk is getting involved in lawsuits and court battles.
Companies must seek a vendor who knows the law inside and out. Given that regulations are becoming more stringent every year, the importance of expertise in this area cannot be overstated.
Tasks such as data scrubbing are difficult for a human agent but a breeze for Voice AI, and they can prevent a potential lawsuit. Furnishing statutory information, such as the Mini-Miranda disclosure or information required by other laws, is easy for voice AI agents. However, your vendor must have the in-depth expertise to train the voicebot to do it.
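Pre-dial scrubbing is the kind of rule-checking a voice agent can apply consistently to every call. The sketch below checks one person and one debt against a simplified reading of two rules. The first is the FDCPA presumption that calls before 8 a.m. or after 9 p.m. local time are inconvenient. The second is the Reg F call-frequency presumption: no more than seven calls within seven consecutive days, and no call within seven days after a telephone conversation, with the conversation date counted as day one. The function and its inputs are hypothetical. The sketch also omits the rules' exceptions and many operational details, such as determining the consumer's time zone.

```python
from datetime import date, datetime, timedelta
from typing import List, Optional


def may_place_call(now_local: datetime, prior_call_dates: List[date],
                   last_conversation: Optional[date]) -> bool:
    """Simplified pre-dial check for one person and one debt.

    now_local: current time in the consumer's time zone.
    prior_call_dates: dates of earlier call attempts about this debt.
    last_conversation: date of the last telephone conversation, if any.
    """
    # Convenient hours: from 8:00 a.m. up to (but not including) 9:00 p.m.
    if not 8 <= now_local.hour < 21:
        return False
    today = now_local.date()
    # A conversation starts a 7-day period (conversation date = day 1).
    if last_conversation is not None and (today - last_conversation).days < 7:
        return False
    # Placing a call today must not make it the 8th call in 7 consecutive days.
    # The window ending today contains the most past calls of any window
    # that includes today, so it is the only one that needs checking.
    window_start = today - timedelta(days=6)
    recent_calls = sum(1 for d in prior_call_dates if window_start <= d <= today)
    return recent_calls < 7
```

Because the check is a pure function of the call history, it can run before every dial attempt. This is the kind of task the paragraph above describes as hard for humans but easy for software.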
Consequences of lack of expertise in the area
There are two significant disadvantages if your vendor lacks expertise in this area:
Business Performance: Falling foul of a single regulation, or facing a single lawsuit, puts the entire company on the back foot and triggers introspection that slows down the whole business.
Read this whitepaper by Mike Frost to learn more about compliance for DVAs.
Look into the vendor's MLOps capabilities. They are essential because they have a lasting impact on performance and competitive edge.
Why is it important?
At the core of Voice AI lies the capability of the algorithms to learn and improve as more and more conversations are fed into it.
The more extensive this capability, the more robust the learning gains and the better the system becomes at improving conversations.
Consequences of lack of expertise in the area
The absence of AI/ML, or only feeble attempts at it, has severe consequences. Companies that regularly update their AI/ML models by feeding them more and more data will create superior conversations and steadily improve their ability to handle them, while a voice agent without this capability stands still.
This means having a proprietary technology stack and not relying on open source technologies.
Why is it important?
There are many reasons to look for a vendor with proprietary technology.
Consequences of lack of expertise in the area
Lack of tech ownership has many negative consequences. It slows down the entire process. Your vendor will also lack control over the process because it relies on many third-party integrations, and a failure in any one of them can cause the entire process to fail.
In essence, if the vendor does not own the core tech stack, the entire experience suffers from inferior performance. Every company uses integrations, and they are the best way to scale capabilities, but the core tech stack should not depend on them.
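The point about third-party dependencies can be quantified with the standard series-availability formula. If a call has to pass through n independent components, each up with probability a_i, the whole path is up with probability equal to the product of the a_i. The sketch below uses hypothetical 99% availability figures purely for illustration.

```python
import math
from typing import Sequence


def chain_availability(availabilities: Sequence[float]) -> float:
    """Availability of a serial call path: every component must be up.
    Assumes failures are independent (math.prod needs Python 3.8+)."""
    return math.prod(availabilities)


if __name__ == "__main__":
    hours_per_month = 30 * 24
    for n in (1, 3, 5):  # number of independent components in the call path
        a = chain_availability([0.99] * n)
        print(f"{n} component(s): {a:.4f} available, "
              f"~{(1 - a) * hours_per_month:.0f} h/month of downtime")
```

Five components at 99% each leave the path up only about 95% of the time. Every extra external dependency in the core path compounds the risk.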
This means a unified view of the entire process, along with the ability to analyze it and derive actionable insights.
Why is it important?
Every conversation is a potential treasure trove of value. Companies must not waste such a valuable resource, and an ideal vendor must be able to draw insights from data such as call dispositions.
Look for capabilities such as grouping dispositions into meaningful buckets, forwarding disputes to the relevant departments, and more.
A dashboard to monitor the effectiveness of conversations is an essential feature. Analysis of average handle time (AHT) trends, among other metrics, is also a must.
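As an illustration of the analytics described above, the sketch below does three things. It maps raw disposition codes to reporting buckets, collects the calls that need to go to a specific team, and computes a weekly AHT trend. The disposition codes, bucket names, team names, and record layout are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from raw disposition codes to reporting buckets.
BUCKETS = {
    "PROMISE_TO_PAY": "Resolved",
    "PAID_IN_FULL": "Resolved",
    "WRONG_PARTY": "Wrong party",
    "DISPUTE": "Dispute",
    "CEASE_COMMUNICATION": "Compliance",
    "CALLBACK": "Follow-up",
}
# Buckets whose calls must be forwarded to a specific department.
ROUTES = {"Dispute": "disputes team", "Compliance": "compliance team"}

Call = Tuple[str, int, str, float]  # (call_id, week, disposition_code, handle_seconds)


def summarize(calls: Iterable[Call]):
    """Return bucket counts, per-team hand-off lists, and average handle time per week."""
    bucket_counts: Counter = Counter()
    handoffs: Dict[str, List[str]] = defaultdict(list)
    seconds_by_week: Dict[int, List[float]] = defaultdict(list)
    for call_id, week, code, seconds in calls:
        bucket = BUCKETS.get(code, "Other")  # unknown codes are flagged, not dropped
        bucket_counts[bucket] += 1
        if bucket in ROUTES:
            handoffs[ROUTES[bucket]].append(call_id)
        seconds_by_week[week].append(seconds)
    aht_trend = {week: mean(s) for week, s in sorted(seconds_by_week.items())}
    return bucket_counts, dict(handoffs), aht_trend


if __name__ == "__main__":
    sample = [("c1", 1, "PROMISE_TO_PAY", 240), ("c2", 1, "DISPUTE", 300),
              ("c3", 1, "WRONG_PARTY", 60), ("c4", 2, "PAID_IN_FULL", 180),
              ("c5", 2, "CEASE_COMMUNICATION", 90), ("c6", 2, "CALLBACK", 30)]
    print(summarize(sample))
```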
Consequences of lack of expertise in the area
We cannot improve what we cannot measure. Without the capability to run analytics, business performance will be hard to improve, and that will lead to competitive losses.
In this section of our guide, we’ve compiled a list of essentials to help your company properly onboard your chosen vendor and implement their Voice AI solution for debt collection.
In order for the DVA to be effective, you will have to share a lot of information for your vendor to be able to understand the consumer persona. Always sign an NDA before sending any documents or sensitive information.
Take a focused approach to incorporating Voice AI from the very beginning of the process. A steering committee can bring together expertise from technology, business, operations, and HR.
This is of serious importance. Lean means running your pilot in a way that disrupts your organization as little as possible. Avoid unnecessary integrations that add load and complexity to the pilot and can confound its results. Keeping it lean also minimizes your and your team's involvement, so your sunk cost in time is low if the project goes south and doesn't bear fruit.
Going all out is not the best strategy here. Segment the portfolio you are handling by volume and value, and prioritize 2-3 segments for the pilot. Provide representative call recordings so that your vendor can understand the consumer persona. Also share your call dispositions, i.e., the different outcomes your typical calls end up in, for example, the percentage of calls that are wrong-party contacts, debt disputes, cease-communication requests, etc. This will help your vendor plan the development strategy.
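Both preparation steps above are simple aggregations. The sketch below ranks portfolio segments by account volume or total balance, so you can pick 2-3 for the pilot. It also computes the disposition mix to share with the vendor. The segment labels, balances, and disposition names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_segments(accounts: Iterable[Tuple[str, float]], by: str = "value",
                  top_n: int = 3) -> List[Tuple[str, int, float]]:
    """accounts: (segment, balance) pairs. Returns (segment, volume, value) for the
    top_n segments ranked by total balance ("value") or account count ("volume")."""
    volume: Dict[str, int] = defaultdict(int)
    value: Dict[str, float] = defaultdict(float)
    for segment, balance in accounts:
        volume[segment] += 1
        value[segment] += balance
    metric = value if by == "value" else volume
    ranked = sorted(metric, key=metric.get, reverse=True)
    return [(s, volume[s], value[s]) for s in ranked[:top_n]]


def disposition_mix(dispositions: Iterable[str]) -> Dict[str, float]:
    """Share of calls per disposition, in percent (rounded to one decimal)."""
    counts = Counter(dispositions)
    total = sum(counts.values())
    return {d: round(100 * c / total, 1) for d, c in counts.items()}


if __name__ == "__main__":
    accounts = [("card_30_90", 1200.0), ("card_30_90", 800.0), ("card_180_plus", 5000.0),
                ("medical", 300.0), ("medical", 450.0), ("medical", 250.0),
                ("auto", 9000.0)]
    print(rank_segments(accounts, by="value"))
    print(rank_segments(accounts, by="volume", top_n=2))
```

Ranking by value and ranking by volume can point to different segments, as in this sample. Looking at both helps you choose pilot segments that are representative as well as worthwhile.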
The Voice AI agent will only be as good as the information you feed it. It is essential that you provide the vendor with all the relevant information. For example, if you have 12 types of customers, provide audio recordings of each type. Otherwise, the conversation flows will be designed for only a few types of customers.
The number of files shared also matters for training the voice agent. Ideally, share a large volume of actual conversations, since more real data makes the ML models better.
After reviewing the call recordings, your vendor should be able to come up with the conversation design, call flows, and scripts. Once these are ready, it is crucial that specialists from your organization review them and help the vendor refine them. This step has a lasting impact on DVA performance.
Many people delegate UAT (User Acceptance Testing) to junior resources or skip it altogether. That is the worst mistake to make, especially in the debt collection space, where one small mistake can be costly. It is important to stress-test the DVA built by the vendor before deploying it and rolling it out to customers.
You might pilot on 100 calls per day for a week and then decide to go for full-scale implementation. However, for an AI solution, 100 calls a day is not a representative enough sample, especially for debt collection applications. For outbound calls, 80% of calls might go unanswered, leaving only 20% to test the bot. If you pilot on 100 calls per day for 5 days, only about 20 calls per day connect. You have then effectively piloted on just 100 calls, which might not be enough data points to base your decisions on.
At Skit.ai we recommend at least 10,000 calls/day for about 4 weeks.
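The arithmetic behind this recommendation can be sketched as follows, using the 20% answer rate from the example above. The margin-of-error function uses the usual normal approximation for a proportion, such as a promise-to-pay rate, with p = 0.5 as the worst case. Treating 4 weeks as 20 working days is an assumption.

```python
import math


def connected_calls(calls_per_day: int, days: int, answer_rate: float) -> int:
    """Expected number of answered calls that actually exercise the bot."""
    return round(calls_per_day * days * answer_rate)


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a rate measured on n calls."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for label, per_day, days in [("small pilot", 100, 5), ("recommended pilot", 10_000, 20)]:
        n = connected_calls(per_day, days, answer_rate=0.20)
        print(f"{label}: {n} connected calls, +/-{100 * margin_of_error(n):.1f} points")
```

With about 100 connected calls, any measured rate is uncertain by roughly plus or minus 10 percentage points. With tens of thousands of connected calls, that uncertainty shrinks to about half a point.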
You must run an ROI exercise to understand what quantum of value the Voice AI solution will create for your company before moving any further.
This exercise must cover at least a one-year period, ideally 2-5 years. The variables involved are simple: call volume, cost of human agents, cost of deploying the voice agent, number of integrations, inbound vs. outbound, call complexity, and deployment type. Your vendor should be able to provide you with notional value creation and cost-savings figures.
Value creation is not that simple, however:
You may choose to factor in direct and indirect benefits out of voice AI deployment.
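Below is a minimal sketch of the core calculation, covering direct cost savings only. It compares handling every call with human agents against automating a share of calls with a voice agent over several years. The per-call costs, setup fee, automation rate, and the function itself are hypothetical. Factors such as integrations, call complexity, and deployment type would show up in the real per-call cost and setup figures, and indirect benefits are not modeled.

```python
from typing import Dict


def voice_ai_roi(annual_calls: int, automation_rate: float, human_cost_per_call: float,
                 ai_cost_per_call: float, setup_cost: float, years: int) -> Dict[str, float]:
    """Direct-savings ROI of automating a share of calls over `years` years."""
    automated = annual_calls * automation_rate
    human_only = annual_calls * human_cost_per_call * years
    with_ai = setup_cost + years * (automated * ai_cost_per_call
                                    + (annual_calls - automated) * human_cost_per_call)
    ai_investment = setup_cost + years * automated * ai_cost_per_call
    net_savings = human_only - with_ai
    return {"net_savings": net_savings, "roi": net_savings / ai_investment}


if __name__ == "__main__":
    # Hypothetical inputs: 1.2M calls/year, half automated, $2.50 vs $0.50 per call.
    print(voice_ai_roi(1_200_000, 0.5, 2.50, 0.50, setup_cost=100_000, years=3))
```

Running the same function over 1, 3, and 5 years shows how a one-time setup cost is amortized as the horizon grows. This is one reason a multi-year view is preferable.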
A lot can go wrong here, so be aware of the risks that come with a lack of proper technology architecture planning.
Be clear about the call volumes you expect over the years, because you need to assess the supporting tech infrastructure around them. Relevant integrations, legacy telephony, CRMs, gateways, and more must be assessed and optimized to minimize human intervention and to remain sufficient for the planned phase.
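One concrete piece of this planning is sizing concurrent call channels for the expected peak volume. A common textbook approach, sketched below, first converts peak calls per hour and average call duration into offered traffic in Erlangs. It then uses the Erlang B formula to find the smallest number of channels that keeps the blocking probability under a target. The 1% target and the sample volume are assumptions. Erlang B's model (random arrivals, with blocked calls lost) only approximates how a predictive dialer behaves.

```python
def erlang_b(traffic_erlangs: float, channels: int) -> float:
    """Blocking probability via the standard recursion
    B(E, 0) = 1, B(E, m) = E * B(E, m-1) / (m + E * B(E, m-1))."""
    b = 1.0
    for m in range(1, channels + 1):
        b = traffic_erlangs * b / (m + traffic_erlangs * b)
    return b


def channels_needed(peak_calls_per_hour: float, avg_call_seconds: float,
                    max_blocking: float = 0.01) -> int:
    """Smallest channel count whose Erlang B blocking is <= max_blocking."""
    if max_blocking <= 0:
        raise ValueError("max_blocking must be positive")
    traffic = peak_calls_per_hour * avg_call_seconds / 3600  # offered load in Erlangs
    channels, blocking = 0, 1.0
    while blocking > max_blocking:
        channels += 1
        blocking = traffic * blocking / (channels + traffic * blocking)
    return channels


if __name__ == "__main__":
    # Hypothetical peak: 1,200 calls/hour averaging 3 minutes (60 Erlangs).
    print(channels_needed(1_200, 180))
```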
It must be duly noted that running a Voice AI solution is a process: a continuous journey filled with improvements and upgrades. To sustain performance and move further along the learning curve, it is vital to keep training the Voice AI solution on new data.
New use cases, business verticals, customer regulations, and more: we live in a dynamic world, and constant efforts to innovate the voice solution are essential to staying at the top of the game and beating the competition.
It is essential to assess a voice solution in granular detail before moving forward with it. We hope this guide will help you in your buying journey.
For more information and a free demo, you can schedule a call with one of our collections experts. We’ll be happy to help!