How technology helps businesses find their voice



Virtual assistants, speech analytics in call centers, and the collection of voice biometrics have become common tools for businesses. Xu Techs has compiled a list of solutions that would be impossible without speech technologies.

Businesses and users are increasingly turning to products built on speech recognition and synthesis technologies for help.

Companies create their own branded voices and use voice assistants to improve the quality of service, and ordinary users ask robots for help in everyday matters.

According to Brandessence estimates, the global market for conversational artificial intelligence was worth $8.2 billion in 2023 and is projected to grow to $32.5 billion by 2028. The growth in demand for these technologies is mainly associated with the emergence of solutions from a related segment.
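To put the two market figures in perspective, the short calculation below derives the average yearly growth they imply. It is only a back-of-the-envelope sketch: it assumes steady compound growth over the five years between the two estimates, which the cited forecast does not state, and the function name is our own.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article: $8.2 billion in 2023, $32.5 billion in 2028.
# Assumption: growth compounds evenly across the five years in between.
cagr = compound_annual_growth_rate(8.2, 32.5, years=2028 - 2023)
print(f"Implied compound annual growth rate: {cagr:.1%}")  # prints 31.7%
```

In other words, the forecast corresponds to the market growing by a little under a third every year.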

Svetlana Safronova, executive director and head of the B2B communications department at SberDevices, identified five main trends in the development of voice technologies.

Voice robots become more empathic and learn to recognize emotions to build deeper dialogue.

Companies strive to create a unique, memorable voice that represents their brand.

Recognition is improving: speech can now be recognized despite interruptions, background noise, and spontaneous conversation between several people.

Recognition technology has begun to help companies track important business metrics. For example, certain ML models based on speech recognition help predict the Customer Satisfaction Index (CSI) based on the outcome of a call.

Cloud hosting for speech technologies is becoming more relevant due to the shortage of servers and their rising cost.

Together with the expert, we look at which solutions and tools would be impossible without speech technologies.

Voice assistants

These are the most popular products based on speech technologies. Businesses use virtual assistants to automate communication with clients, while ordinary users use them to get information, navigate applications, and control a smart home, speakers, or a car. The most famous are Siri from Apple and Alexa from Amazon. On the Russian market, the most popular virtual assistants are “Salute” from Sber, “Marusya” from VK, “Oleg” from Tinkoff, and “Alice” from Yandex.

All responses from voice assistants can be divided into two types: scripted and free communication. In the first case, the assistant launches one of its predefined conversation scripts and knows what information it needs to provide. In free communication, the assistant responds with the most appropriate phrase on the topic, that is, it simply keeps the dialogue going.
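The difference between the two response types can be illustrated with a deliberately simple router. This is a minimal sketch, not how any of the assistants named above works: the scripts, keywords, and fallback phrase are invented for illustration, and real products use trained intent classifiers and language models rather than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Script:
    name: str
    keywords: FrozenSet[str]
    first_prompt: str


# Hypothetical predefined scripts; a real assistant would have many more.
SCRIPTS = (
    Script("check_balance", frozenset({"balance", "account"}),
           "Which account would you like to check?"),
    Script("track_delivery", frozenset({"delivery", "courier", "parcel"}),
           "Please tell me your order number."),
)

# Hypothetical small-talk reply used when no script matches.
FALLBACK_REPLY = "I see. Could you tell me a bit more?"


def respond(utterance: str) -> Tuple[str, str]:
    """Return (mode, reply): 'scripted' if a script matches, otherwise 'free'."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # Pick the script that shares the most keywords with the utterance.
    best = max(SCRIPTS, key=lambda script: len(script.keywords & words))
    if best.keywords & words:
        return "scripted", best.first_prompt
    return "free", FALLBACK_REPLY


print(respond("What is my account balance?"))
print(respond("Nice weather today"))
```

The first call launches the balance script and asks for the missing detail, while the second falls through to free communication and only keeps the conversation going.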

Voice assistants can also make automated calls, both cold and warm. As a rule, this format is used by companies that rely on outbound calls to clients, for example in construction, retail, telecom, and finance.

Speech analytics at touchpoints

With the rise of voice technology, it became clear that it could be used to monitor incoming and outgoing calls and extract data from them. This is how speech analytics emerged. Its main task is to analyze how operators communicate with customers and, as a result, to improve the quality of service.

Today it most often works in post-processing mode, meaning the analytical results become available after some delay, but the technology is evolving toward online operation, in real time or close to it.

Call centers with more than 50 operators have the greatest need for speech analytics. It lets you find out the reason for a client’s request and their attitude towards the product and the company, and check that operators comply with service standards and offer cross-sell products and promotions.

Modern speech technologies make it possible to analyze not only vocabulary, but also emotional characteristics, speech rate, interruptions, and other parameters.
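Two of these signals, speech rate and interruptions, are easy to compute once a call has been transcribed and split by speaker. The sketch below assumes such a transcript already exists as a list of timed segments. The `Segment` structure and the sample call are invented for illustration, and counting an interruption whenever one speaker starts before the other has finished is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the beginning of the call
    end: float
    text: str


def words_per_minute(segments: List[Segment], speaker: str) -> float:
    """Average speech rate of one speaker over the time they were talking."""
    own = [s for s in segments if s.speaker == speaker]
    talk_time = sum(s.end - s.start for s in own)
    word_count = sum(len(s.text.split()) for s in own)
    return 60.0 * word_count / talk_time if talk_time > 0 else 0.0


def count_interruptions(segments: List[Segment], interrupter: str) -> int:
    """Count segments where `interrupter` starts before the other speaker ends."""
    ordered = sorted(segments, key=lambda s: s.start)
    count = 0
    for previous, current in zip(ordered, ordered[1:]):
        if (current.speaker == interrupter
                and previous.speaker != interrupter
                and current.start < previous.end):
            count += 1
    return count


# Hypothetical diarized transcript of a short call.
call = [
    Segment("customer", 0.0, 6.0, "Hello I was charged twice for my order last week"),
    Segment("operator", 5.0, 9.0, "Let me check that for you"),
    Segment("customer", 9.5, 12.5, "Thank you I will wait"),
]

print(words_per_minute(call, "customer"))     # 100.0
print(words_per_minute(call, "operator"))     # 90.0
print(count_interruptions(call, "operator"))  # 1
```

Metrics like these can then be tracked per operator or per call alongside the vocabulary and emotion analysis mentioned above.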

“Our SaluteSpeech Insights technology automatically analyzes customer satisfaction without requiring customers to spend time rating their interactions with agents. Moreover, it lets you analyze 100% of requests, not just the 5–10% where the client agreed to leave a rating. It also helps prevent employee burnout by monitoring the customer satisfaction index and operators’ emotions in dialogues,” said Svetlana Safronova.

Pick-by-voice, or voice control in the warehouse

This technology is popular in large warehouses where finding and handling goods needs to be simplified. Its task is to free the hands and eyes of warehouse operators from most of the work with a mobile data collection terminal.

Pick-by-voice works as follows: an employee receives voice instructions from the warehouse management system and answers out loud, for example by reading out the last digits of the barcode and the number of units picked.
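The confirmation step of this dialogue can be sketched in a few lines. The example assumes the employee’s answer has already been converted to text by a speech recognizer. The `PickTask` structure, the three-digit check length, and the reply phrases are all invented for illustration and are not taken from any particular warehouse management system.

```python
from dataclasses import dataclass


@dataclass
class PickTask:
    location: str
    barcode: str
    quantity: int


def voice_prompt(task: PickTask) -> str:
    """Instruction the system would read out to the employee."""
    return f"Go to {task.location}. Pick {task.quantity} units."


def confirm_pick(task: PickTask, spoken_digits: str, spoken_quantity: int,
                 check_length: int = 3) -> str:
    """Compare the employee's spoken answer with what the task expects."""
    expected_digits = task.barcode[-check_length:]
    if spoken_digits != expected_digits:
        return "Wrong item. Please repeat the last digits of the barcode."
    if spoken_quantity != task.quantity:
        return f"Quantity mismatch. Expected {task.quantity} units."
    return "Confirmed. Moving to the next task."


# Hypothetical task; the barcode is a made-up 13-digit number.
task = PickTask(location="aisle 12, shelf 3", barcode="4601234567890", quantity=4)

print(voice_prompt(task))
print(confirm_pick(task, "890", 4))  # correct item and quantity
print(confirm_pick(task, "891", 4))  # digits do not match the expected item
```

Because both the prompt and the check are handled by voice, the employee never has to look at or touch the terminal during the pick.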

The retailer Magnit introduced pick-by-voice technology in its logistics complexes in 2020. The voice assistant began helping employees collect orders for stores. In April 2023, the retailer X5 Group announced the development of similar technology at its distribution centers.

Voice biometrics

As the technology has developed, various business sectors have increasingly adopted biometric identification by face, fingerprint, and voice. Voice biometrics are most often used by the call centers of banks and insurance companies to quickly recognize customers who call from numbers not registered in the system, and to protect against fraudulent transactions with bank cards.

Identification by voice works as follows: at first contact, the client’s identity is verified and a voiceprint is built from a recording of their voice. When the client contacts the company again, their voice is compared with the saved voiceprints. Among the largest Russian developers of technologies that use voice for identification are 3iTech, BSS, and TsRT.
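The enroll-then-compare flow can be sketched as follows. This is an illustration under strong assumptions: the voiceprints here are tiny made-up vectors, whereas real systems derive high-dimensional embeddings from audio with a trained speaker model, and the 0.8 similarity threshold is arbitrary. Nothing below reflects how the vendors named above implement their products.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VoiceprintStore:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed value; tuned on real data in practice
        self._voiceprints: Dict[str, List[float]] = {}

    def enroll(self, client_id: str, embedding: Sequence[float]) -> None:
        """First contact: save the client's voiceprint after identity checks."""
        self._voiceprints[client_id] = list(embedding)

    def identify(self, embedding: Sequence[float]) -> Optional[str]:
        """Repeat contact: return the best match above the threshold, if any."""
        best_id, best_score = None, self.threshold
        for client_id, saved in self._voiceprints.items():
            score = cosine_similarity(embedding, saved)
            if score >= best_score:
                best_id, best_score = client_id, score
        return best_id


store = VoiceprintStore()
store.enroll("client-001", [0.9, 0.1, 0.2])  # toy three-dimensional voiceprints
store.enroll("client-002", [0.1, 0.8, 0.5])

print(store.identify([0.85, 0.15, 0.25]))  # client-001
print(store.identify([0.0, 0.0, 1.0]))     # None: no saved voiceprint is close enough
```

Returning no match when nothing clears the threshold matters for the fraud-protection case: an unknown caller should fall back to ordinary identification rather than be matched to the nearest client.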

Branded voices

AI-based speech generation has made it possible to create a unique voice for a particular brand. For example, in 2022 VTB introduced its own voice for digital communications with clients, based on SpeechKit Brand Voice technology from Yandex.

The Em.V avatar from M.Video has a unique sound: it is used to communicate with a young audience and for experimental marketing projects. SberDevices offers a branded voice development service, SaluteSpeech YourVoice.

Companies can choose a ready-made voice from a catalog of more than 80 female, male, and children’s voices, or create their own in just a month, which requires only three hours of voice-over work in the studio. With the selected or newly created voice, you can synthesize text of any length and complexity.

Speech synthesizers

Speech synthesis technologies (text-to-speech, TTS) can render any text as speech in a given voice. A speaker records specially selected texts in the studio for several hours, and the resulting recordings are used to train ML models.

Usually several models are involved: an acoustic model, a vocoder, and a number of auxiliary models, for example for predicting pauses, intonation, and question words.

“Controlling intonation is very important: sometimes the meaning of a sentence and the effectiveness of the interaction with the user depend on a single misplaced pause or a wrongly chosen question word. But no matter how good the models are, sometimes manual control is required.

“For example, in the phrase ‘will you pay tomorrow?’ the user’s response depends on whether the semantic emphasis falls on the word ‘pay’ or on ‘tomorrow’. Our technologies allow us to emphasize the correct word in a question phrase using simple SSML syntax,” said Svetlana Safronova.
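To show what such markup can look like, the sketch below wraps one word of a phrase in the `<emphasis>` element defined by the W3C SSML standard. This is an assumption on our part: the article does not say which SSML tags SaluteSpeech accepts, and support for `<emphasis>` varies between synthesis engines. The helper function is our own.

```python
from xml.sax.saxutils import escape


def ssml_with_emphasis(sentence: str, stressed_word: str) -> str:
    """Wrap the chosen word in a W3C SSML <emphasis> element."""
    parts = []
    for word in sentence.split():
        safe = escape(word)  # protect characters such as & and < inside the markup
        if word.strip("?.,!").lower() == stressed_word.lower():
            parts.append(f'<emphasis level="strong">{safe}</emphasis>')
        else:
            parts.append(safe)
    return "<speak>" + " ".join(parts) + "</speak>"


print(ssml_with_emphasis("Will you pay tomorrow?", "pay"))
# <speak>Will you <emphasis level="strong">pay</emphasis> tomorrow?</speak>

print(ssml_with_emphasis("Will you pay tomorrow?", "tomorrow"))
# <speak>Will you pay <emphasis level="strong">tomorrow?</emphasis></speak>
```

The same text produces two different requests to the synthesizer: the first asks whether the payment will happen at all, the second asks about its timing.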

Advances in technology have reduced the amount of audio data required to train voice models from tens of hours to just minutes. As a result, creating new voices has become cheaper, and synthesized voices have also learned to speak in a particular style: in a whisper, joyfully, or angrily.

Synthesis technology is used to voice content such as media articles, e-books, instructions, and website elements like online chats, product descriptions, and navigation. It is also used to create subtitles and lets virtual assistants speak.

Speech recognition services

Automatic Speech Recognition (ASR) converts human speech into text using AI algorithms and machine learning. It is what allows virtual assistants to turn a spoken question into text for further processing. The same technology is often used by people with visual impairments to search for services or products on websites.

“ASR can be integrated into various IT systems for recording meetings,” adds Svetlana Safronova. “SaluteSpeech speech recognition technology is already used in our video conferencing service SberJazz: meeting participants can see a transcript of the conversation in real time in the chat and download the full text of the conversation to their device.”

Tools for converting speech to text are used, for example, in Google Docs and Google Keep. Other services include Whisper from OpenAI and the Russian services Teamlogs and Aiko.

Voice controlled games

Speech technologies have also reached the gaming industry. They have changed character-control mechanics in games where the player needs to whisper, speak, or even shout. For example, in Dead Island 2, voice commands can be given to the character using the Alexa Game Control feature.

What hinders the development of speech technologies

Svetlana Safronova highlighted three problems that the industry currently faces: a lack of computing resources, a lack of data, and a shortage of personnel.

Computing resources are needed to train neural networks on big data. “We have an advantage here: access to our partners’ supercomputers, which we use in our work. A trained model also has to run with sufficient performance, and that again requires modern servers.

“To cope with this task, developers create lightweight versions of models that lose a little quality but can run on servers that are more accessible to customers. There is also a separate, broad area of embedded technologies, which run on ‘smart’ devices with even stricter limits on performance and power consumption,” Svetlana said.

The lack of data for speech technologies is explained by the fact that voice recordings are usually confidential information, the expert emphasizes. To get the best quality of service, you need recordings in the conditions in which voice assistants will be used.

Examples include an information desk on the street, telephony, and communication with voice assistants at home: for each of these conditions you need to collect suitable data. Recordings from open datasets, in turn, do not always have high-quality annotation.

There is also a shortage of highly qualified specialists who can work with speech technologies. Established companies have to find people with expertise in related fields, such as text processing and computer vision, and train them.

How the industry will develop

Overall, interest in voice assistants in Russia may grow exponentially over the next three years. The barrier to entry in the Russian speech technology market is falling and competition is increasing, and this is an important driver of further development, noted Svetlana Safronova.

Simplified no-code products with the simplest possible integration have also appeared. “This means that the winner will be the one who offers the market specific, convenient solutions that are quickly built into business processes and are easy to use.

“Businesses’ trust in speech technologies has matured. As a result, demand for these developments has reached the stage where they are beginning to be actively implemented and used. At the same time, the cost of implementation is falling; in the next three to five years, medium-sized and small businesses will become significant customers of speech technologies. The list of solutions will also expand, especially for work processes,” said Svetlana Safronova.

