CN114390144A

CN114390144A - Intelligent processing method, device and control system for incoming voice calls

Info

Publication number: CN114390144A
Application number: CN202111595354.1A
Authority: CN
Inventors: 乔素林; 吴钟健; 唐雪
Original assignee: Huayun Tianxia Nanjing Technology Co ltd
Current assignee: Huayun Tianxia Nanjing Technology Co ltd
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2022-04-22

Abstract

The application relates to an intelligent processing method, device and control system for incoming voice calls. A user-defined script template is obtained and configured on a cloud intelligent voice platform; incoming call information meeting a preset scene is obtained and forwarded to the cloud intelligent voice platform; the cloud intelligent voice platform automatically recognizes the incoming call information to obtain voice recognition information; and the voice recognition information is intelligently matched against the script scenes in the custom script template, and the matched script is selected for the response. Because the cloud intelligent voice platform handles both call forwarding and intelligent script configuration, the intelligent personal telephone assistant does not need complicated voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent telephone assistant.

Description

Intelligent processing method, device and control system for incoming voice calls

Technical Field

The present disclosure relates to the field of communications technologies, and in particular, to a method, an apparatus, and a control system for intelligently processing incoming voice calls.

Background

Existing intelligent incoming-call processing methods and systems mainly work as follows: the call content is transmitted from the carrier server to a voice server as a media stream; the conversation content in the media stream is analyzed; the speech in the media stream is transcribed into text; the text is analyzed to obtain a semantic result; the dialogue jumps between nodes according to the semantic result and preset logic, and a response instruction is sent to the voice server; the voice server looks up the voice file corresponding to the response instruction and transmits it to the carrier server, which plays it to the calling party, so that the robot answers the caller.

In the prior art, the intelligent response function of a voice assistant is realized mainly by interacting with carrier servers. This creates a strong dependence on the carriers: the function has to be integrated separately with each of the three carriers, which greatly limits it. Moreover, because the voice server merely looks up and plays a stored voice file according to the response instruction, individual users can hardly customize response scenes freely. In addition, the installation and maintenance cost of the voice server is relatively high.

Disclosure of Invention

In view of the above, the present disclosure provides a method, an apparatus, and a control system for intelligently processing incoming voice calls, in which a cloud intelligent voice platform performs call forwarding and intelligent script configuration. The intelligent personal telephone assistant therefore does not need complex voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent telephone assistant.

According to one aspect of the present disclosure, a method for intelligently processing a voice incoming call is provided, which includes the following steps:

S100, obtaining a user-defined script template, and configuring the user-defined script template on a cloud intelligent voice platform;

S200, obtaining incoming call information meeting a preset scene, and forwarding the incoming call information to the cloud intelligent voice platform;

S300, automatically recognizing the incoming call information through the cloud intelligent voice platform to obtain voice recognition information;

S400, intelligently matching the voice recognition information with the script scenes in the custom script template, and selecting the matched script to respond.
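Steps S100 to S400 can be pictured as a small pipeline. The sketch below is illustrative only and rests on stated assumptions: `CloudVoicePlatform`, `Scene`, the injected `recognize` callable (standing in for an ASR engine), and the simple keyword matcher are hypothetical names, not the platform or matching logic of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Scene:
    keywords: List[str]  # hypothetical: phrases that identify this script scene
    response: str        # text the robot answers with


@dataclass
class CloudVoicePlatform:
    """Hypothetical stand-in for the cloud intelligent voice platform."""
    recognize: Callable[[bytes], str]  # injected ASR engine (assumption)
    templates: Dict[str, List[Scene]] = field(default_factory=dict)

    def configure_template(self, user: str, scenes: List[Scene]) -> None:
        # S100: store the user's custom script template on the platform
        self.templates[user] = scenes

    def handle_forwarded_call(self, user: str, audio: bytes) -> Optional[str]:
        # S200 is assumed done: the carrier has already forwarded the call here
        text = self.recognize(audio)  # S300: speech -> text
        for scene in self.templates.get(user, []):  # S400: match a script scene
            if any(k in text for k in scene.keywords):
                return scene.response
        return None  # no scene matched; a fallback reply is left to the caller of this method


if __name__ == "__main__":
    # Toy ASR for illustration: the "audio" is just UTF-8 text.
    platform = CloudVoicePlatform(recognize=lambda audio: audio.decode("utf-8"))
    platform.configure_template("user-1", [Scene(["delivery"], "Please leave the parcel at the door.")])
    print(platform.handle_forwarded_call("user-1", b"I have a delivery for you"))
```

Injecting the recognizer keeps the sketch independent of any particular ASR vendor, in line with the disclosure's statement that the speech technologies are interchangeable.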

In a possible implementation manner, optionally, before step S100, the method further includes:

S001, calling an interface of the cloud intelligent voice platform through a user terminal to authenticate a user identifier;

S002, recording and storing the authenticated user identifier;

S003, matching the carrier according to the user identifier, and configuring carrier call forwarding for the user identifier based on the matching result.
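Steps S002 and S003 can be sketched as follows, assuming authentication (S001) has already succeeded. The three-digit prefix table, `PLATFORM_ACCESS_NUMBER`, and the `UserRegistry` / `ForwardingConfig` names are hypothetical. Real number-range allocations are larger and change over time, and numbers may be ported between carriers, so a production system would ask the carrier rather than guess from the prefix.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical, deliberately incomplete prefix table for illustration only.
CARRIER_PREFIXES: Dict[str, str] = {
    "138": "China Mobile",
    "139": "China Mobile",
    "130": "China Unicom",
    "131": "China Unicom",
    "133": "China Telecom",
    "189": "China Telecom",
}

PLATFORM_ACCESS_NUMBER = "95000000"  # hypothetical platform access number


@dataclass
class ForwardingConfig:
    user_id: str
    carrier: str
    forward_to: str


class UserRegistry:
    def __init__(self) -> None:
        self._users: Dict[str, ForwardingConfig] = {}

    def register(self, user_id: str) -> Optional[ForwardingConfig]:
        """S002 + S003: store an authenticated number and match its carrier."""
        carrier = CARRIER_PREFIXES.get(user_id[:3])
        if carrier is None:
            return None  # unknown carrier: forwarding cannot be configured
        config = ForwardingConfig(user_id, carrier, PLATFORM_ACCESS_NUMBER)
        self._users[user_id] = config  # S002: record the identifier
        return config  # S003: hand this to the matched carrier's interface

    def lookup(self, user_id: str) -> Optional[ForwardingConfig]:
        return self._users.get(user_id)
```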

In a possible implementation manner, optionally, the method further includes:

S004, calling, through the user terminal, a default script template prestored in the cloud intelligent voice platform;

S005, editing the default script template to obtain a user-defined script template, and storing it.

In a possible implementation manner, optionally, in step S005, editing the default script template to obtain and store a custom script template includes:

S0051, calling the default script template and editing it, where the editing mode is at least one of semantic addition, semantic deletion, semantic insertion, or semantic adjustment;

S0052, performing, on the cloud intelligent voice platform, text configuration of the edited script template to obtain speech synthesized from the text for the user to listen to;

S0053, auditioning the synthesized speech, determining a final script template, and storing the final script template as the user-defined script template.
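The four editing modes of S0051 map naturally onto list operations over script scenes. This is a minimal sketch under assumptions: `ScriptScene`, `ScriptTemplate`, and the `voice` field are hypothetical names, and the TTS audition of S0052 and S0053 is not modelled.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScriptScene:
    trigger: str   # what the caller might say (text)
    response: str  # what the robot answers (text, later rendered by TTS)


@dataclass
class ScriptTemplate:
    scenes: List[ScriptScene] = field(default_factory=list)
    voice: str = "default"  # hypothetical TTS voice setting (e.g. tone)

    # The four editing modes of S0051, modelled as list operations.
    def add(self, scene: ScriptScene) -> None:
        self.scenes.append(scene)

    def delete(self, index: int) -> None:
        del self.scenes[index]

    def insert(self, index: int, scene: ScriptScene) -> None:
        self.scenes.insert(index, scene)

    def adjust(self, index: int, response: str) -> None:
        self.scenes[index].response = response


# S004: the platform's default template.
default = ScriptTemplate([ScriptScene("who is this", "This is the assistant of the number you dialed.")])

# S005: edit a deep copy so the platform default stays untouched.
custom = copy.deepcopy(default)
custom.add(ScriptScene("delivery", "Please leave the parcel at the door."))
custom.adjust(0, "Hello, the owner cannot answer right now.")
```

Copying before editing keeps the shared default template reusable for other users, which matches S004 and S005 treating the default and the user-defined template as separate objects.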

In a possible implementation manner, optionally, in step S300, automatically recognizing the incoming call information through the cloud intelligent voice platform to obtain voice recognition information includes:

S310, receiving the incoming call information through the cloud intelligent voice platform;

S320, matching and selecting an intelligent voice robot according to the user-defined script template, and acquiring the voice information of the incoming call;

S330, recognizing the voice information by speech recognition technology to obtain the voice recognition information.
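The disclosure does not say how the robot is matched to the template in S320. The sketch below assumes, purely for illustration, that each robot is identified by the TTS voice it uses and that the template carries a voice setting. `VoiceRobot`, `select_robot`, and `recognize_call` are hypothetical names, and the ASR engine is injected as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceRobot:
    name: str
    voice: str  # hypothetical: the TTS voice this robot speaks with


def select_robot(template_voice: str, robots: List[VoiceRobot]) -> VoiceRobot:
    """S320 (one possible reading): pick the robot matching the template's voice setting."""
    if not robots:
        raise ValueError("no voice robots available")
    for robot in robots:
        if robot.voice == template_voice:
            return robot
    return robots[0]  # fall back to the first (default) robot


def recognize_call(audio_chunks: List[bytes], asr: Callable[[bytes], str]) -> str:
    """S330: run an injected ASR engine over each audio chunk and join non-empty partial transcripts."""
    parts = [asr(chunk).strip() for chunk in audio_chunks]
    return " ".join(p for p in parts if p)
```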

In a possible implementation manner, optionally, in step S400, intelligently matching the voice recognition information with the script scenes in the custom script template and selecting the matched script to respond includes:

S410, matching the voice recognition information with the word slots in the custom script template, and calculating slot values;

S420, presetting a word slot threshold, and comparing the slot values with the word slot threshold to obtain a comparison result;

S430, according to the comparison result, obtaining from the user-defined script template a response template that meets the word slot threshold, and responding intelligently according to the response template.
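The disclosure does not define how a slot value is calculated. One simple, hypothetical choice is the fraction of a scene's word slots filled by at least one synonym found in the recognized text; the scene with the highest value at or above the threshold wins. The slot/synonym structure and function names below are assumptions, and the default threshold of 0.9 mirrors the 90% figure used in Embodiment 1.

```python
from typing import Dict, List, Optional, Tuple

Slots = Dict[str, List[str]]  # slot name -> synonyms that fill it (assumption)


def slot_value(text: str, slots: Slots) -> float:
    """S410: fraction of word slots filled by at least one synonym in the text."""
    if not slots:
        return 0.0
    filled = sum(1 for synonyms in slots.values() if any(s in text for s in synonyms))
    return filled / len(slots)


def choose_response(text: str, scenes: List[Tuple[Slots, str]],
                    threshold: float = 0.9) -> Optional[str]:
    """S420-S430: return the response of the best scene whose slot value meets the threshold."""
    best_score, best_response = 0.0, None
    for slots, response in scenes:
        score = slot_value(text, slots)
        if score >= threshold and score > best_score:
            best_score, best_response = score, response
    return best_response
```

Returning `None` when no scene reaches the threshold leaves room for a default reply, rather than forcing a poor match.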

In a possible implementation manner, optionally, in step S400, intelligently matching the voice recognition information with the script scenes in the custom script template and selecting the matched script to respond further includes:

S440, continuously recording the response information through the cloud intelligent voice platform;

S450, converting the speech in the response information into text to obtain text information;

S460, sending the text information and the response record contained in the response information to the cloud intelligent voice platform, which forwards them to the user terminal.
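Steps S440 to S460 amount to recording each turn, transcribing it, and packaging the result for the user terminal. In this sketch `CallRecord`, `Turn`, `build_notification`, and the JSON field names are hypothetical, the ASR engine is injected, and the actual push through a terminal application's interface is not modelled.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    speaker: str  # "caller" or "robot"
    audio: bytes


@dataclass
class CallRecord:
    caller: str
    callee: str
    turns: List[Turn] = field(default_factory=list)

    def record(self, speaker: str, audio: bytes) -> None:
        # S440: keep appending audio for the whole conversation
        self.turns.append(Turn(speaker, audio))


def build_notification(rec: CallRecord, asr: Callable[[bytes], str]) -> str:
    """S450-S460: transcribe each turn and package it as JSON for the user terminal."""
    transcript = [{"speaker": t.speaker, "text": asr(t.audio)} for t in rec.turns]
    payload = {
        "caller": rec.caller,
        "callee": rec.callee,
        "transcript": transcript,
        "recording_bytes": sum(len(t.audio) for t in rec.turns),
    }
    return json.dumps(payload, ensure_ascii=False)
```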

According to another aspect of the present disclosure, an apparatus for implementing the above intelligent processing method for incoming voice calls is provided, including a custom script template configuration module, an incoming call forwarding module, a voice recognition module, and a matching response module, wherein:

a custom script template configuration module: configured to obtain the user-defined script template and configure it on the cloud intelligent voice platform;

an incoming call forwarding module: configured to obtain incoming call information meeting a preset scene and forward it to the cloud intelligent voice platform;

a voice recognition module: configured to automatically recognize the incoming call information through the cloud intelligent voice platform to obtain voice recognition information;

a matching response module: configured to intelligently match the voice recognition information with the script scenes in the custom script template and select the matched script to respond.

In a possible implementation manner, optionally, the apparatus further includes:

a user identifier authentication module: configured to call the interface of the cloud intelligent voice platform through a user terminal to authenticate the user identifier;

an identifier storage module: configured to record and store the authenticated user identifier;

a call forwarding configuration module: configured to match the carrier according to the user identifier and configure carrier call forwarding for the user identifier based on the matching result;

a script template calling module: configured to call, through the user terminal, the default script template prestored in the cloud intelligent voice platform;

a script editing module: configured to edit the default script template to obtain and store the user-defined script template.

According to another aspect of the present disclosure, there is also provided a control system including:

a processor;

a memory for storing processor-executable instructions;

the processor is configured to execute the executable instructions to implement the intelligent voice incoming call processing method.

The technical effects of this application:

According to the method, a user-defined script template is obtained and configured on a cloud intelligent voice platform; incoming call information meeting a preset scene is obtained and forwarded to the cloud intelligent voice platform; the cloud intelligent voice platform automatically recognizes the incoming call information to obtain voice recognition information; and the voice recognition information is intelligently matched against the script scenes in the custom script template, and the matched script is selected for the response. Because the cloud intelligent voice platform handles both call forwarding and intelligent script configuration, the intelligent personal telephone assistant does not need complicated voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent telephone assistant.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a schematic flow chart illustrating an implementation of the intelligent processing method for incoming voice calls according to the present invention;

FIG. 2 is a schematic diagram of the hardware implementation of incoming call reminders through a WeChat Official Account in embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of the operation procedure for incoming call reminders through a WeChat Official Account in embodiment 1 of the present invention.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Example 1

In this embodiment, the cloud intelligent voice platform is used for call forwarding, and an application configured on the user terminal (a WeChat Official Account) is used for intelligent script configuration. When the personal mobile phone cannot answer, the phone line is forwarded directly to the cloud intelligent voice platform through the phone's call forwarding function. Using ASR and NLP technologies, the cloud intelligent voice platform automatically recognizes the caller's speech, selects the best-matching script, and converts the user-configured response text into speech via TTS to respond. The user can customize script scenes by following the WeChat Official Account. When the intelligent personal telephone assistant answers a missed call, the incoming call information is displayed on the WeChat Official Account, and tapping the details shows the conversation details, including the call recording and its transcript produced by ASR.

Therefore, the intelligent personal telephone assistant does not need complicated voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent personal telephone assistant.

Specifically, as shown in FIG. 1, an intelligent processing method for incoming voice calls is provided according to an aspect of the present disclosure, which includes the following steps:

The system hosting the cloud intelligent voice platform is preconfigured with a default script template. Through a system, software, app, or the like configured on the user terminal, the user can connect to the cloud intelligent voice platform, call up the default script template, and edit it into a user-defined script template; the user-defined script template is then configured on the cloud intelligent voice platform.

In this embodiment, as shown in FIG. 2 and FIG. 3, the cloud intelligent voice platform is called through the WeChat Official Account, which provides the system's default script template for the user to modify, so that the script template can be customized. Scripts are configured as text: the system automatically generates speech from the script text by TTS, the user can listen to the generated speech in WeChat, and voice settings such as tone can be chosen. The user-defined script template includes a plurality of script scenes, and each script scene includes text information and corresponding response information.

The preset scene may be, for example, that the call is missed or the user is unable to answer it. When a preset scene occurs, the phone line is forwarded directly to the cloud intelligent voice platform, which answers the incoming call as an intelligent voice robot according to the default script template or the script template customized by the user on the WeChat side.

The cloud intelligent voice platform performs speech recognition on the caller's speech in the incoming call information; for example, ASR and NLP technologies can be used to recognize the caller's speech and obtain the corresponding voice recognition information.

After the caller's speech has been recognized with ASR and NLP technologies and the voice recognition information has been obtained, the cloud intelligent voice platform intelligently matches the voice recognition information with the script scenes in the user-defined script template and, according to the best-matching scene, calls the preset script response template to reply intelligently until the caller hangs up.

After the response, the response information can be converted into text. The WeChat interface is called with the call record, the recording, and the converted text, which are sent to the intelligent mobile terminal through the WeChat Official Account. The user can view the call record on the WeChat Official Account, tap the details to view the recording and its transcript to learn what was said, and reply by SMS or phone as needed.

In this way, calls are answered intelligently with the user's own custom answers, and the user learns the information they need. After the intelligent personal telephone assistant answers a missed call, the incoming call information is displayed on the WeChat Official Account, and tapping the details shows the call details, including the recording of the conversation and its transcript produced by ASR. The intelligent personal telephone assistant therefore does not need complicated voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent telephone assistant.

In this embodiment, before the intelligent incoming call service is provided, the user's identifier must be authenticated and the carrier's call forwarding function must be configured.

S002, recording and storing the authenticated user identifier;

The user follows the WeChat Official Account on the intelligent mobile terminal and applies to activate the intelligent personal telephone assistant service.

After the user applies for the intelligent personal telephone assistant service, the interface of the cloud intelligent voice platform is called through the WeChat Official Account and an SMS verification code is sent to the user's registered mobile phone; the user enters the code on the WeChat Official Account to confirm activation of the service. The cloud intelligent voice platform records the mobile phone number, calls the carrier interface to enable call forwarding for the phone, and sets the forwarding number to the access number of the cloud intelligent voice platform.

After the intelligent personal telephone assistant service is activated, the default script template provided by the system can be called up through the WeChat Official Account.
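The SMS verification step described above can be sketched as a one-time code with an expiry. `SmsVerifier` and the 300-second `CODE_TTL_SECONDS` are hypothetical; a real deployment would send the code through an SMS gateway instead of returning it, and would also rate-limit attempts.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 300  # hypothetical validity window


class SmsVerifier:
    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, phone: str, now: Optional[float] = None) -> str:
        """Generate a 6-digit code; a real system would send it by SMS instead of returning it."""
        code = f"{secrets.randbelow(10**6):06d}"
        issued = time.time() if now is None else now
        self._pending[phone] = (code, issued)
        return code

    def confirm(self, phone: str, code: str, now: Optional[float] = None) -> bool:
        """Accept the code once, if it matches and has not expired."""
        entry = self._pending.get(phone)
        if entry is None:
            return False
        expected, issued = entry
        current = time.time() if now is None else now
        if current - issued > CODE_TTL_SECONDS:
            del self._pending[phone]  # expired codes are discarded
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if not hmac.compare_digest(expected.encode(), code.encode()):
            return False
        del self._pending[phone]  # one-time use
        return True
```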

In a possible implementation manner, optionally, the method further includes:

Through the WeChat Official Account, the user can modify the default script template provided by the system and thus customize the script template. Scripts are configured as text: the system automatically generates speech from the script text by TTS, the user can listen to the generated speech in WeChat, and voice settings such as tone can be chosen.

After calling up and reviewing the default script template, the user can customize the default scripts in it. Once the template is saved, the system automatically generates speech from the script text by TTS, the user can listen to the generated speech in WeChat, and voice settings such as tone can be chosen.

When a caller calls an intelligent mobile terminal user who has activated the intelligent personal telephone assistant service, and the call is missed or the user is unable to answer, the call is automatically forwarded to the cloud intelligent voice platform, which answers as an intelligent voice robot according to the default script template or the script template customized by the user on the WeChat side.

After the intelligent voice robot acquires the voice information of the incoming call, the cloud intelligent voice platform recognizes the caller's speech with the configured ASR and NLP technologies to obtain voice recognition information.
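In this embodiment the platform asks the carrier's interface to enable forwarding. As an illustration only, the preset scenes "missed" and "unable to answer" correspond to the conditional call forwarding services of GSM/UMTS networks, which handsets can also register with standard MMI codes (3GPP TS 22.030): 67 for busy, 61 for no reply, 62 for not reachable, and 004 for all conditional forwarding. The sketch builds those registration strings; `PresetScene` and the access number are hypothetical, and carrier support for each code varies.

```python
from enum import Enum


class PresetScene(Enum):
    # Standard GSM/3GPP service codes for conditional call forwarding.
    BUSY = "67"         # call forwarding on busy (CFB)
    NO_REPLY = "61"     # call forwarding on no reply (CFNRy)
    UNREACHABLE = "62"  # call forwarding on not reachable (CFNRc)


def _check_number(forward_to: str) -> None:
    # Allow an optional leading "+" for international format, then digits only.
    if not forward_to.lstrip("+").isdigit():
        raise ValueError("forward-to number must contain digits only")


def registration_code(scene: PresetScene, forward_to: str) -> str:
    """Build the MMI string that registers forwarding for one preset scene."""
    _check_number(forward_to)
    return f"**{scene.value}*{forward_to}#"


def all_conditional_code(forward_to: str) -> str:
    """Register busy, no-reply, and not-reachable forwarding in one code."""
    _check_number(forward_to)
    return f"**004*{forward_to}#"
```

Conditional forwarding, rather than unconditional forwarding (code 21), matches the embodiment: the phone still rings normally, and the platform only takes over when the call would otherwise go unanswered.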

To respond with the best-matching custom script template, in this embodiment the voice recognition information is matched with the word slots in the custom script template, slot values are calculated, and the best-matching custom script template is determined by comparing the slot values.

A word slot threshold is preset in the cloud intelligent voice platform to judge whether a calculated slot value reaches it. If, after comparison, the word slots of a custom script template reach the threshold, that template is selected as the best custom script template for answering the call. The threshold can be set freely; in this embodiment, when the similarity exceeds 90% (the word slot threshold), the system calls the preconfigured script answer template and replies intelligently until the caller hangs up.
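The embodiment fixes the threshold at 90% but does not name the similarity measure. As a swapped-in stand-in, the sketch below uses Python's `difflib.SequenceMatcher.ratio()` and loops over caller utterances until the stream ends, which stands for the caller hanging up. `best_answer`, `run_dialogue`, and the fallback reply are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional

SIMILARITY_THRESHOLD = 0.9  # the 90% threshold used in this embodiment


def best_answer(utterance: str, answers: Dict[str, str]) -> Optional[str]:
    """Return the answer whose trigger text is most similar, if similarity exceeds the threshold."""
    best_ratio, best = 0.0, None
    for trigger, answer in answers.items():
        ratio = SequenceMatcher(None, utterance.lower(), trigger.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best = ratio, answer
    return best if best_ratio > SIMILARITY_THRESHOLD else None


def run_dialogue(utterances: Iterable[str], answers: Dict[str, str], fallback: str) -> List[str]:
    """Reply to each caller utterance until the stream ends (the caller hangs up)."""
    replies = []
    for utterance in utterances:
        answer = best_answer(utterance, answers)
        replies.append(answer if answer is not None else fallback)
    return replies
```

A character-level ratio is crude for real ASR output; it is used here only because it is in the standard library and makes the threshold comparison concrete.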

While the intelligent voice robot responds, the system automatically and continuously records the response information and converts the recording into a conversation transcript with ASR and NLP technologies. Through the cloud intelligent voice platform interface called by the user terminal application (WeChat), the call record, the recording, and the transcript are sent to the user terminal application and delivered to the intelligent mobile terminal through the WeChat Official Account.

In this embodiment, the WeChat Official Account is only one example of a user terminal application; activation of the intelligent personal telephone assistant may also be performed in a mobile app or on a dedicated website, which is not limited by the present invention.

It should be noted that, although TTS speech generation and ASR and NLP speech recognition are used as examples, those skilled in the art will understand that the present disclosure is not limited thereto. In fact, the text processing and speech recognition technologies can be chosen flexibly according to personal preference and/or the actual application scenario, as long as speech can be generated automatically from text and text can be obtained by recognizing speech.

In this way, a user-defined script template is obtained and configured on the cloud intelligent voice platform; incoming call information meeting a preset scene is obtained and forwarded to the cloud intelligent voice platform; the cloud intelligent voice platform automatically recognizes the incoming call information to obtain voice recognition information; and the voice recognition information is intelligently matched against the script scenes in the custom script template, and the matched script is selected for the response. Because the cloud intelligent voice platform handles both call forwarding and intelligent script configuration, the intelligent personal telephone assistant does not need complicated voice-stream integration with the server of each major carrier, which saves integration cost and provides individuals with a low-cost intelligent telephone assistant.
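The interchangeability requirement above (speech from text, text from speech) can be expressed as two small interfaces. This sketch assumes Python's `typing.Protocol`; `SpeechSynthesizer`, `SpeechRecognizer`, and the toy `EchoEngine` (whose "audio" is just tagged UTF-8 text) are hypothetical and only show that any engine pair satisfying the interfaces can be plugged in.

```python
from typing import Protocol


class SpeechSynthesizer(Protocol):
    def synthesize(self, text: str, voice: str) -> bytes: ...


class SpeechRecognizer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class EchoEngine:
    """Toy engine satisfying both protocols, for tests only: 'audio' is tagged UTF-8 text."""

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}]{text}".encode("utf-8")

    def transcribe(self, audio: bytes) -> str:
        decoded = audio.decode("utf-8")
        if decoded.startswith("["):
            _, sep, rest = decoded.partition("]")
            if sep:
                return rest  # strip the voice tag
        return decoded


def round_trip(tts: SpeechSynthesizer, asr: SpeechRecognizer, text: str, voice: str = "default") -> str:
    """Render a script line with any TTS engine and read it back with any ASR engine."""
    return asr.transcribe(tts.synthesize(text, voice))
```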

Example 2

Based on the implementation principle of embodiment 1, this embodiment provides an apparatus to implement the above method.

In a possible implementation manner, optionally, the apparatus further includes:

For the specific functions, implementation principles, and interactions of the above modules, refer to embodiment 1; they are not repeated here. The modules may be connected by wired or wireless links, through a communication protocol, or through other communication modules, which is not limited herein.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

Example 3

Still further, according to another aspect of the present disclosure, there is provided a control system for a voice incoming call intelligent processing method, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to implement the intelligent processing method for incoming voice calls described in embodiment 1 above when executing the executable instructions.

The control system of the disclosed embodiments includes a processor and a memory for storing processor-executable instructions. The processor is configured to execute the executable instructions to implement any one of the above-mentioned intelligent voice incoming call processing methods.

Here, it should be noted that the number of processors may be one or more. Meanwhile, in the control system of the embodiment of the present disclosure, an input device and an output device may be further included. The processor, the memory, the input device, and the output device may be connected by a bus, or may be connected by other means, and are not limited specifically herein.

The memory, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the programs or modules corresponding to the intelligent processing method for incoming voice calls in the embodiments of the present disclosure. The processor executes the various functional applications and data processing of the control system by running the software programs or modules stored in the memory.

The input device may be used to receive input digits or signals, where a signal may be a key signal related to user settings and function control of the device, terminal, or server. The output device may include a display device such as a display screen.

Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. An intelligent processing method for incoming voice calls, characterized by comprising the following steps:

2. The intelligent processing method for voice incoming calls according to claim 1, before step S100, further comprising:

S002, recording and storing the authenticated user identifier;

3. The intelligent processing method for the voice incoming call according to claim 2, further comprising:

4. The intelligent processing method for incoming voice calls according to claim 3, wherein in step S005, editing the default script template to obtain and store a custom script template comprises:

5. The intelligent processing method for the voice incoming call according to any one of claims 1 to 4, wherein in step S300, the automatically recognizing the incoming call information by the cloud-based intelligent voice platform to obtain voice recognition information includes:

6. The intelligent processing method for incoming voice calls according to claim 5, wherein in step S400, intelligently matching the voice recognition information with the script scenes in the custom script template and selecting the matched script to respond comprises:

7. The intelligent processing method for incoming voice calls according to claim 6, wherein in step S400, intelligently matching the voice recognition information with the script scenes in the custom script template and selecting the matched script to respond further comprises:

8. An apparatus for implementing the intelligent processing method for incoming voice calls according to any one of claims 1 to 7, comprising a custom script template configuration module, an incoming call forwarding module, a voice recognition module, and a matching response module, wherein:

9. The apparatus of claim 8, further comprising:

10. A control system, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to execute the executable instructions to implement the intelligent processing method for the incoming voice call according to any one of claims 1 to 7.