CN111666380A - Intelligent calling method, device, equipment and medium - Google Patents

Intelligent calling method, device, equipment and medium

Info

Publication number
CN111666380A
CN111666380A (application CN202010536655.6A)
Authority
CN
China
Prior art keywords
user, voice, answer, utilizing, recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010536655.6A
Other languages
Chinese (zh)
Inventor
吉培轩
何荡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010536655.6A
Publication of CN111666380A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/3343: Information retrieval; query execution using phonetics
    • G06F 40/216: Handling natural language data; parsing using statistical methods
    • G06F 40/30: Handling natural language data; semantic analysis
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/02: Speech synthesis; methods for producing synthetic speech; speech synthesisers
    • G10L 15/063: Speech recognition; training of speech recognition systems
    • G10L 15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech recognition; speech-to-text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses an intelligent calling method, device, equipment and medium, relating to the technical fields of voice interaction, natural language processing, deep learning, big data, knowledge graphs and intelligent search. The specific implementation scheme is as follows: converting the voice of the user into text information by utilizing a voice recognition technology; performing semantic recognition on the text information by using a natural language processing technology to determine the user's question; querying an answer to the question from a knowledge base according to the user's question; and generating answer audio from the text content of the answer by using a speech synthesis technology, and pushing the answer audio to the user. According to the embodiments of the application, an intelligent call center replaces human agents to realize intelligent calling and provide answers to users, improving the efficiency of the call center and reducing cost.

Description

Intelligent calling method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, in particular to voice technology, and specifically to an intelligent calling method, apparatus, device, and medium.
Background
A call center, also called a customer service center, is an integrated information service system that combines telephone access with communication and computer networks to connect an enterprise into a whole. It is used to accept and quickly process customer requests in a unified way, and is the main medium of communication between an enterprise and its customers.
With the spread of the customer-centric concept, more and more enterprises pay attention to communication with consumers; the status and value of the call center have become increasingly prominent, and it has become an important means of improving product and service quality and increasing enterprise benefit.
However, current call centers are mainly staffed by human agents, which is both costly and inefficient.
Disclosure of Invention
The embodiments of the application provide an intelligent calling method, device, equipment and medium, so as to reduce the cost of a call center and improve its efficiency.
In a first aspect, an embodiment of the present application provides an intelligent calling method, including:
converting the voice of the user into text information by utilizing a voice recognition technology;
performing semantic recognition on the text information by using a natural language processing technology to determine the user's question;
inquiring an answer to the question from a knowledge base according to the user's question;
generating answer audio from the text content of the answer by using a speech synthesis technology, and pushing the answer audio to the user.
In a second aspect, an embodiment of the present application further provides an intelligent calling device, including:
the voice recognition module is used for converting the voice of the user into text information by utilizing a voice recognition technology;
the natural language processing module is used for performing semantic recognition on the text information by utilizing a natural language processing technology and determining the user's question;
the answer query module is used for inquiring an answer to the question from a knowledge base according to the user's question;
and the voice synthesis module is used for generating answer audio from the text content of the answer through a voice synthesis technology and pushing the answer audio to the user.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the intelligent calling method of any embodiment of the present application.
In a fourth aspect, the embodiments of the present application further provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the intelligent calling method according to any embodiment of the present application.
According to the technical scheme of the embodiments of the application, an intelligent call center replaces human agents to realize intelligent calling, so that answers are provided to users, the efficiency of the call center is improved, and cost is reduced.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily apparent from the following description, and other effects of the above alternatives will be described hereinafter in conjunction with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of an intelligent calling method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an intelligent calling method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the core capabilities of a call center implementing the intelligent call method of the embodiment of the present application;
FIG. 4 is a schematic diagram of an intelligent calling device according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing the intelligent calling method according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 is a schematic flowchart of an intelligent calling method according to an embodiment of the present application, applicable to the situation where a call center system receives a user's telephone call, confirms the user's question during multiple rounds of conversation with the user, and provides an answer; it relates to the technical fields of voice interaction, natural language processing, deep learning, big data, knowledge graphs and intelligent search. The method can be executed by an intelligent calling device, implemented in software and/or hardware, and preferably configured in an electronic device such as a terminal or a server. As shown in fig. 1, the method specifically includes the following steps:
S101, converting the voice of the user into text information by utilizing a voice recognition technology.
The call center system of the embodiment of the application serves as an intelligent customer service agent: it can hold multiple rounds of conversation with the user, resolve the user's question and provide an answer. When any user's call is connected, the user's voice can be acquired and converted into text information using speech recognition technology, so that the conversation can proceed on the basis of that text. In one embodiment, speech recognition may be performed using a speech recognition model pre-trained with deep learning techniques.
S102, performing semantic recognition on the text information by utilizing a natural language processing technology to determine the user's question.
Specifically, based on deep learning technology, semantic recognition can be performed on the text information using a pre-trained semantic recognition model to determine the user's question.
S103, inquiring an answer to the question from a knowledge base according to the user's question.
The knowledge base stores knowledge in one or more fields, and its content can be configured in advance according to the application scenario of the call center. The answer corresponding to the user's question is then obtained from the knowledge base through query matching or similar means.
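As a non-limiting illustration of the query-matching step, the following Python sketch retrieves an answer from a small in-memory FAQ knowledge base by token overlap; the entries and the scoring rule are assumptions made for illustration, since the embodiment does not fix a particular matching algorithm.

```python
# Minimal sketch of answer lookup by query matching against a knowledge base.
# The entries and the token-overlap scoring are illustrative assumptions.
from typing import Optional

KNOWLEDGE_BASE = {  # hypothetical question -> answer pairs for a banking scenario
    "how do i activate my credit card": "You can activate the card in the mobile banking app.",
    "what is the annual fee of the platinum credit card": "The annual fee is waived in the first year.",
}

def query_answer(question: str) -> Optional[str]:
    """Return the answer whose stored question shares the most tokens with the query."""
    query_tokens = set(question.lower().split())
    best_answer, best_score = None, 0
    for stored_question, answer in KNOWLEDGE_BASE.items():
        score = len(query_tokens & set(stored_question.split()))
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

print(query_answer("what is the annual fee for the platinum credit card"))
```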
S104, generating answer audio from the text content of the answer by using a speech synthesis technology, and pushing the answer audio to the user.
Because the call center needs to support multiple rounds of conversation, once the answer is determined, its text content can be converted into speech using speech synthesis technology and pushed to the user, so that the user feels they are talking to a real person, improving the service quality of the call center.
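Steps S101-S104 can be summarized as a pipeline. The sketch below wires the four stages together; the asr, nlu, query_answer and tts callables are hypothetical stand-ins for the pre-trained models and knowledge-base lookup described above, not a disclosed API.

```python
# Illustrative end-to-end flow for steps S101-S104; the four callables are
# hypothetical placeholders for the models the embodiment describes.
from typing import Callable

def handle_user_turn(audio: bytes,
                     asr: Callable[[bytes], str],
                     nlu: Callable[[str], str],
                     query_answer: Callable[[str], str],
                     tts: Callable[[str], bytes]) -> bytes:
    text = asr(audio)                 # S101: user speech -> text information
    question = nlu(text)              # S102: semantic recognition -> user question
    answer = query_answer(question)   # S103: knowledge-base lookup
    return tts(answer)                # S104: answer text -> answer audio to push
```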
In addition, the call center of the embodiment of the application also has quality-inspection analysis capability. Specifically, the intelligent calling method of the present application further includes: saving all of the user's voice and all answer audio pushed to the user, and converting them into historical call text; and performing anomaly recognition on the historical call text by means of keyword matching. That is to say, during speech recognition the user's speech and the call center's speech can be distinguished through two-channel separation; if prohibited content or uncivil language appears in the conversation between the user and the call center, it can be detected through quality-inspection analysis, which also determines whether the problem occurred on the user side or on the call center side. Likewise, cases where the user's question could not be resolved can be detected by quality-inspection analysis, facilitating subsequent improvement.
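The keyword-matching pass of the quality-inspection analysis might look like the sketch below; the keyword lists and the per-speaker transcript format are assumptions made for illustration.

```python
# Sketch of anomaly recognition on historical call text via keyword matching.
# Keyword lists are illustrative; real ones would come from configuration.
PROHIBITED = {"scam", "threat"}                       # hypothetical prohibited terms
UNRESOLVED = {"still not solved", "nobody helped"}    # unresolved-issue cues

def inspect_transcript(transcript):
    """transcript: list of (speaker, utterance), speaker in {'user', 'call_center'}."""
    findings = []
    for speaker, utterance in transcript:
        lowered = utterance.lower()
        findings += [(speaker, "prohibited", kw) for kw in PROHIBITED if kw in lowered]
        findings += [(speaker, "unresolved", kw) for kw in UNRESOLVED if kw in lowered]
    return findings

print(inspect_transcript([("user", "This is still not solved!")]))
```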
According to the above technical scheme, the intelligent call center replaces human agents to realize intelligent calling and provide answers to users, which saves labor cost, eliminates waiting for a human agent, improves the efficiency of the call center, and significantly improves customer experience and service quality.
Fig. 2 is a schematic flowchart of an intelligent calling method according to an embodiment of the present application; this embodiment further optimizes the above embodiment. As shown in fig. 2, the method specifically includes the following steps:
S201, recognizing acoustic features in the voice by utilizing a pre-trained acoustic model, classifying the acoustic features, and determining at least one phoneme unit in the voice.
Speech is made up of different phonemes, and different phonemes have different acoustic characteristics. Therefore, by recognizing the acoustic features in the speech with the acoustic model and classifying them, at least one phoneme unit in the speech can be determined. The acoustic model may be pre-trained using an existing deep learning method, for example a supervised training method, which is not detailed here.
S202, performing text conversion on the at least one phoneme unit by utilizing a pre-trained language model to determine the text information of the voice.
Different phoneme sequences correspond to different word units, and the language model can convert the phoneme units into text to determine the text information of the voice. The language model may likewise be pre-trained using an existing deep learning method, for example a supervised training method, which is not detailed here.
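The two-stage decoding of S201-S202 can be pictured with the toy sketch below, where lookup tables stand in for the trained acoustic and language models; the feature-to-phoneme and phoneme-to-word tables are fabricated for illustration only.

```python
# Toy two-stage decode: acoustic model (features -> phonemes, S201), then
# language model (phoneme sequence -> text, S202). The tables are illustrative
# stand-ins for trained neural models.
ACOUSTIC_TABLE = {(0.1, 0.9): "h", (0.8, 0.2): "ai"}   # acoustic feature -> phoneme unit
LANGUAGE_TABLE = {("h", "ai"): "hi"}                   # phoneme units -> word

def decode(frames):
    phonemes = [ACOUSTIC_TABLE[frame] for frame in frames]   # S201: classify features
    return LANGUAGE_TABLE.get(tuple(phonemes), "<unk>")      # S202: text conversion

print(decode([(0.1, 0.9), (0.8, 0.2)]))  # -> "hi"
```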
S203, acquiring scene keywords from the text information by using a natural language processing technology, and determining the current conversation scene according to the scene keywords.
Different fields have different conversation scenes; for example, in the banking field, savings cards, credit cards and financial products belong to different scenes. Therefore, when resolving a user's question in the call center, the specific scene needs to be identified first and the question then resolved in the context of that scene, which improves the accuracy of problem resolution.
Specifically, natural language processing is applied to the text information to identify scene keywords, such as keywords related to a scene like "savings card" or "credit card". These keywords can be obtained in advance through big-data statistics, stored in a database, and supplemented and refined in real time. The current conversation scene is then determined by recognizing these keywords.
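A minimal sketch of this keyword-driven scene detection follows, assuming a hand-written keyword-to-scene table in place of the big-data statistics the embodiment describes.

```python
# Sketch of S203: determine the current conversation scene from scene keywords.
# The keyword table is a made-up banking example.
SCENE_KEYWORDS = {
    "savings card": "savings card scene",
    "credit card": "credit card scene",
    "financial product": "wealth management scene",
}

def detect_scene(text: str) -> str:
    lowered = text.lower()
    for keyword, scene in SCENE_KEYWORDS.items():
        if keyword in lowered:
            return scene
    return "general scene"

print(detect_scene("I want to raise the limit on my credit card"))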
S204, based on the conversation scene, recognizing the behavior of the user from the text information, performing semantic recognition according to the behavior of the user, and determining the user's question.
Different fields have different scenes, and each scene can include a variety of behaviors. For example, in a wealth-management product scene, the user's behavior can be subdivided into learning about a product, purchasing a product, after-sales service and so on; identifying the user's specific question based on these behaviors yields higher accuracy. Therefore, based on the current conversation scene, natural language processing technology is further used to identify the user's behavior from the text information, semantic recognition is performed according to that behavior, and the user's specific question is determined. For example, if the user's behavior is purchasing a product, the name or model of the product to be purchased may be further identified or confirmed with the user; if the user's behavior is learning about a product, the name of the product and the aspect of interest may be further identified or confirmed with the user. If a single exchange is not enough to identify the question, the user's specific question can be confirmed through multiple rounds of conversation.
In addition, performing semantic recognition on the text information using natural language processing technology to determine the user's question may further include: performing semantic recognition on the text information using natural language processing technology combined with the context information of the current conversation. That is, incorporating context improves the accuracy of semantic recognition. For example, across multiple rounds of conversation, the text of the current utterance is semantically recognized together with the previous rounds of speech data; since a conversation takes place in a context that is generally continuous, combining the context information allows the current semantics to be recognized more accurately.
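One simple way to fold context into recognition is to concatenate the most recent turns with the current utterance before classifying, as in the sketch below; classify_intent is a hypothetical stand-in for the pre-trained semantic recognition model.

```python
# Sketch of context-aware semantic recognition: score the current utterance
# together with the preceding turns. classify_intent is a hypothetical model.
def recognize_with_context(history, utterance, classify_intent, max_turns=3):
    """history: previous utterances of the conversation, oldest first."""
    context = " ".join(history[-max_turns:])  # keep only the most recent turns
    return classify_intent((context + " " + utterance).strip())
```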
S205, preprocessing the user's question, wherein the preprocessing includes at least one of word completion, pinyin identification and wrongly-written-character correction.
The user's question as identified by natural language processing may contain noise or inaccuracies. Preprocessing removes this noise, for example through word completion, pinyin identification and wrongly-written-character correction, so as to improve the accuracy of the question.
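A minimal preprocessing pass covering the three operations named in S205 might look as follows; the lookup tables are illustrative stand-ins for the statistical or neural correction the embodiment would use.

```python
# Sketch of S205 preprocessing: word completion, pinyin identification and
# wrongly-written-character (typo) correction via illustrative lookup tables.
PINYIN_MAP = {"xinyongka": "credit card"}   # pinyin -> intended words
TYPO_MAP = {"credti": "credit"}             # common misspellings
COMPLETIONS = {"cred": "credit"}            # truncated-word completion

def preprocess(question: str) -> str:
    words = []
    for word in question.lower().split():
        word = PINYIN_MAP.get(word, word)   # pinyin identification
        word = TYPO_MAP.get(word, word)     # typo correction
        word = COMPLETIONS.get(word, word)  # word completion
        words.append(word)
    return " ".join(words)

print(preprocess("how do i activate my credti card"))
```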
S206, inquiring an answer from the knowledge base according to the preprocessed question.
Specifically, the knowledge base may store a knowledge graph: a semantic network that reveals relationships between entities, where the relationships between entities constitute knowledge. From the preprocessed question, the entities in the question can be determined, and the answer associated with those entities queried from the knowledge graph. For example, the question "what are the benefits of China Construction Bank's platinum credit card" involves three entities: China Construction Bank, the platinum credit card, and the benefits. The entity "China Construction Bank" can be located in the knowledge graph, the bank's platinum credit card found through the relationships between entities, and the benefits of that card found in turn; the content of those benefits serves as the answer to the question. Storing and querying knowledge through a knowledge graph improves query efficiency while saving resources.
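Using the credit-card example above, the knowledge-graph lookup can be sketched as a walk over subject-relation-object triples; the triples and relation names below are illustrative assumptions.

```python
# Sketch of S206: answer a question by traversing entity relations in a
# knowledge graph. Triples and relation names are illustrative.
TRIPLES = [
    ("China Construction Bank", "issues", "platinum credit card"),
    ("platinum credit card", "has_benefit", "airport lounge access"),
    ("platinum credit card", "has_benefit", "double reward points"),
]

def neighbors(entity, relation):
    return [obj for subj, rel, obj in TRIPLES if subj == entity and rel == relation]

card = neighbors("China Construction Bank", "issues")[0]
print(neighbors(card, "has_benefit"))  # benefit entries form the answer content
```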
S207, generating answer audio from the text content of the answer by using a speech synthesis technology, and pushing the answer audio to the user.
In one embodiment, the method further comprises: performing voiceprint feature recognition on the voice of the user by utilizing a pre-trained voiceprint model; and acquiring the user portrait of the user according to the recognized voiceprint features, wherein the user portraits are acquired in advance by means of big data analysis and stored in correspondence with each user's voiceprint features.
Specifically, different individuals have different voices and therefore different voiceprint features, so different users can be distinguished by recognizing voiceprint features. In the application, a voiceprint model is pre-trained using deep learning technology, voiceprint feature recognition is then performed on the user's voice with that model, user portraits of different users are obtained in advance through big data analysis, and each portrait is stored in correspondence with that user's voiceprint features. Thus, once the current user's voiceprint features are recognized, the user's portrait can be retrieved and personalized services provided, such as personalized speech synthesis or personalized intelligent recommendation.
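The voiceprint-to-portrait lookup might be sketched as nearest-neighbor matching over voiceprint embeddings; the fixed-size embeddings, the similarity threshold and the stored portraits below are assumptions made for illustration.

```python
# Sketch: map a recognized voiceprint embedding to a stored user portrait
# by cosine similarity. Embeddings and portraits are illustrative.
import math

PORTRAITS = {  # hypothetical voiceprint embedding -> pre-built user portrait
    (0.9, 0.1, 0.3): {"gender": "female", "preference": "wealth management"},
    (0.2, 0.8, 0.5): {"gender": "male", "preference": "credit cards"},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def lookup_portrait(embedding, threshold=0.8):
    best = max(PORTRAITS, key=lambda stored: cosine(stored, embedding))
    return PORTRAITS[best] if cosine(best, embedding) >= threshold else None

print(lookup_portrait((0.88, 0.12, 0.31)))
```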
For example, in one embodiment, generating the answer audio from the text content of the answer using speech synthesis technology may include: determining the voice type of the current conversation from a pre-established voice library according to the voiceprint features and the user portrait, wherein the voice types at least include male and female voices, voices in different languages, voices in different dialects, voices in different styles and voices of different age ranges; and generating answer audio conforming to the voice type from the text content of the answer through a pre-trained speech synthesis model.
Specifically, different users have different voice preferences; personalized service makes users feel the service is more human and helps the conversation proceed more smoothly. Personalized speech synthesis can produce different voice types including, but not limited to, male voices, female voices, and voices in different languages, dialects, styles and age ranges. For example, when the user communicates in a dialect, the call center can reply in the same dialect; when the user prefers a gentle voice, a gentle voice can be synthesized. A voice library with different voice types is established in advance, a voice type is selected from the library according to the voiceprint features and user portrait of the user in the current conversation, and the user's current emotion can additionally be determined through emotion analysis so that a suitable voice type is chosen according to the emotional features. The generated speech then matches the selected voice type, improving the naturalness of the conversation and the service quality of the call center.
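Voice-type selection from the pre-established voice library could then be a simple keyed lookup over portrait attributes, optionally refined by an emotion label, as sketched below with made-up library entries and attribute names.

```python
# Sketch: choose a voice type from a pre-established voice library based on
# the user portrait; library entries and attribute names are illustrative.
VOICE_LIBRARY = {
    ("female", "cantonese", "calm"): "voice_f_yue_gentle",
    ("male", "mandarin", "neutral"): "voice_m_cmn_standard",
}

def select_voice(portrait, emotion="neutral", default="voice_f_cmn_standard"):
    key = (portrait.get("preferred_gender", "female"),
           portrait.get("dialect", "mandarin"),
           emotion)
    return VOICE_LIBRARY.get(key, default)

print(select_voice({"preferred_gender": "female", "dialect": "cantonese"}, emotion="calm"))
```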
In another embodiment, the method further comprises: predicting the needs of the user by utilizing a pre-trained recommendation model according to the user portrait and the user's question in the current conversation; and pushing messages to the user according to the predicted needs.
One purpose of message pushing is intelligent recommendation. For example, at least the user's gender, industry and personal preferences can be determined from the user portrait; the user's needs are then predicted from the scene, behavior and context of the current conversation combined with the portrait. For instance, if the current scene is a credit-card scene and the user's behavior is learning about credit-card features, it can be predicted that the user has some interest in financial products, so suitable financial products can be recommended according to the user's consumption level and the corresponding messages pushed to the user. This predictive analysis and intelligent recommendation can be realized with a recommendation model pre-trained using an existing deep learning method, which is not detailed here.
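A rule table can stand in for the pre-trained recommendation model to show the shape of need prediction and message push; the scenes, behaviors and messages below are illustrative assumptions.

```python
# Sketch of need prediction and message push; the rule table substitutes
# for the pre-trained recommendation model described above.
RULES = {  # (scene, behavior) -> predicted need
    ("credit card scene", "learn about product"): "wealth management products",
    ("savings card scene", "purchase product"): "fixed-term deposit offers",
}

def predict_and_push(portrait, scene, behavior, push):
    need = RULES.get((scene, behavior))
    if need is None:
        return
    level = portrait.get("consumption_level", "medium")
    push(f"Recommended for you ({level} tier): {need}")

predict_and_push({"consumption_level": "high"}, "credit card scene",
                 "learn about product", print)
```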
Fig. 3 is a schematic diagram of the core capabilities of a call center implementing the intelligent calling method of the embodiment of the present application. As shown in the figure, these core capabilities mainly include speech recognition, natural language processing, speech synthesis, quality-inspection analysis, personalized services and an intelligent knowledge base. Speech recognition converts the user's spoken questions into text through automatic speech recognition (ASR) technology, after which the background knowledge base is queried through natural language processing and the answer to the question is pushed to the customer. Natural language processing provides the capabilities of semantic recognition, scene recognition, behavior analysis, context analysis and predictive analysis. Speech synthesis outputs text content to the user in voice form, with different pronunciation effects including male and female voices, Chinese and English, dialects, different styles and different age ranges. The intelligent knowledge base, combined with natural language processing technology, realizes intelligent knowledge search with a semantic recognition engine and improves accuracy through auxiliary search functions such as automatic search completion, pinyin recognition and wrongly-written-character correction. Quality-inspection analysis transcribes and inspects the recordings of the user and the call center's customer service through intelligent voice technology and performs anomaly recognition on them. Personalized service, combined with big-data analysis and mining technology, automatically obtains or builds a user portrait when a user initiates a customer service request; the portrait includes the user's data, question-and-answer information from each channel of the customer service system, purchasing behavior, product preferences, knowledge recommendations, product recommendations and the like, so that customer service answers and further marketing services can be delivered in a targeted way, realizing intelligent marketing.
The technical scheme of the embodiment of the application relates to the technical fields of voice interaction, natural language processing, deep learning, big data, knowledge graphs and intelligent search. Using an intelligent call center in place of human agents, intelligent calling is realized and answers are provided to users through multi-round conversation technology, which saves labor cost, eliminates waiting for a human agent, improves the efficiency of the call center, and significantly improves customer experience and service quality. In addition, personalized speech synthesis services and personalized intelligent recommendations can be provided for the user, improving product conversion rates.
Fig. 4 is a schematic structural diagram of an intelligent calling device according to an embodiment of the present application, applicable to the situation where a call center system receives a user's telephone call, confirms the user's question during multiple rounds of conversation with the user, and provides an answer; it relates to the technical fields of voice interaction, natural language processing, deep learning, big data, knowledge graphs and intelligent search. The device can implement the intelligent calling method of any embodiment of the present application. As shown in fig. 4, the apparatus 300 specifically includes:
the voice recognition module 301 is configured to convert the voice of the user into text information by utilizing a voice recognition technology;
the natural language processing module 302 is configured to perform semantic recognition on the text information by utilizing a natural language processing technology and determine the user's question;
the answer query module 303 is configured to query an answer to the question from a knowledge base according to the user's question;
and the speech synthesis module 304 is configured to generate answer audio from the text content of the answer by using a speech synthesis technology, and push the answer audio to the user.
Optionally, the speech recognition module includes:
the acoustic feature processing unit is used for recognizing acoustic features in the voice by utilizing a pre-trained acoustic model, classifying the acoustic features and determining at least one phoneme unit in the voice;
and the text conversion unit is used for performing text conversion on the at least one phoneme unit by utilizing a pre-trained language model to determine the text information of the voice.
Optionally, the natural language processing module includes:
the scene recognition unit is used for acquiring scene keywords from the text information by using a natural language processing technology and determining a current conversation scene according to the scene keywords;
and the semantic recognition unit is used for recognizing the behavior of the user from the text information based on the conversation scene, performing semantic recognition according to the behavior of the user, and determining the user's question.
Optionally, the natural language processing module is specifically configured to:
and performing semantic recognition on the text information by utilizing a natural language processing technology in combination with context information of the current conversation to determine the user's question.
Optionally, the answer querying module includes:
the preprocessing unit is used for preprocessing the user's question, wherein the preprocessing comprises at least one of word completion, pinyin identification and wrongly-written-character correction;
and the query unit is used for querying answers from the knowledge base according to the preprocessed questions.
Optionally, the apparatus further comprises:
the voiceprint recognition module is used for carrying out voiceprint feature recognition on the voice of the user by utilizing a pre-trained voiceprint model;
and the user portrait determining module is used for acquiring the user portrait of the user according to the recognized voiceprint features, wherein the user portrait is acquired in advance by means of big data analysis and is stored in correspondence with the voiceprint features of each user.
Optionally, the speech synthesis module includes:
the voice type determining unit is used for determining the voice type of the current conversation from a pre-established voice library according to the voiceprint features and the user portrait, wherein the voice types at least comprise male and female voices, voices in different languages, voices in different dialects, voices in different styles and voices of different age ranges;
and the voice synthesis unit is used for generating answer audio conforming to the voice type by using the text content of the answer and a pre-trained voice synthesis model.
Optionally, the apparatus further includes an intelligent recommendation module, specifically configured to:
predicting the needs of the user by utilizing a pre-trained recommendation model according to the user portrait and the user's question in the current conversation;
and pushing messages to the user according to the predicted needs.
Optionally, the apparatus further includes a quality inspection analysis module, specifically configured to:
saving all of the user's voice and all answer audio pushed to the user, and converting them into historical call text;
and performing anomaly recognition on the historical call text by means of keyword matching.
The intelligent calling device 300 provided by the embodiment of the present application can execute the intelligent calling method provided by any embodiment of the present application, and has the corresponding functional modules and beneficial effects. For details not explicitly described in this embodiment, reference may be made to the description of any method embodiment of the present application.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only, and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 401, a memory 402, and interfaces for connecting the various components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 401 is taken as an example.
The memory 402 is a non-transitory computer-readable storage medium as provided herein, storing instructions executable by at least one processor to cause the at least one processor to perform the intelligent calling method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the intelligent calling method provided by the present application.
The memory 402, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the intelligent calling method in the embodiments of the present application (e.g., the speech recognition module 301, the natural language processing module 302, the answer query module 303, and the speech synthesis module 304 shown in fig. 4). The processor 401 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 402, that is, implements the intelligent calling method in the above method embodiments.
The memory 402 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device implementing the intelligent calling method of the embodiment of the present application. Further, the memory 402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401, and such remote memory may be connected via a network to the electronic device implementing the intelligent calling method of the embodiments of the present application. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the intelligent calling method of the embodiment of the application may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 5 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of an electronic device implementing the intelligent call method of the embodiment of the present application, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 404 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host; it is a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
The technical scheme of the embodiment of the application relates to the fields of voice interaction, natural language processing, deep learning, big data and intelligent search. Using an intelligent call center in place of human agents, intelligent calling is realized and answers are provided to users through multi-round conversation technology, which saves labor cost, eliminates waiting for a human agent, improves the efficiency of the call center, and significantly improves customer experience and service quality. In addition, personalized speech synthesis services and personalized intelligent recommendations can be provided for the user, improving product conversion rates.
It should be understood that the flows shown above may be varied, with steps reordered, added or deleted. For example, the steps described in the present application may be executed in parallel, sequentially or in different orders; this is not limited here, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (20)

1. An intelligent calling method, comprising:
converting the voice of the user into text information by utilizing a voice recognition technology;
performing semantic recognition on the text information by using a natural language processing technology to determine the user's question;
inquiring an answer to the question from a knowledge base according to the user's question;
generating answer audio from the text content of the answer by using a speech synthesis technology, and pushing the answer audio to the user.
2. The method of claim 1, wherein the converting the user's speech to textual information using speech recognition techniques comprises:
recognizing acoustic features in the voice by utilizing a pre-trained acoustic model, classifying the acoustic features, and determining at least one phoneme unit in the voice;
and performing text conversion on the at least one phoneme unit by utilizing a pre-trained language model to determine the text information of the voice.
3. The method of claim 1, wherein the performing semantic recognition on the text information using natural language processing techniques to determine the user's question comprises:
acquiring scene keywords from the text information by using a natural language processing technology, and determining a current conversation scene according to the scene keywords;
and recognizing the behavior of the user from the text information based on the conversation scene, performing semantic recognition according to the behavior of the user, and determining the user's question.
4. The method of claim 1, wherein the performing semantic recognition on the text information using natural language processing techniques to determine the user's question comprises:
performing semantic recognition on the text information by utilizing a natural language processing technology in combination with context information of the current conversation to determine the user's question.
5. The method of claim 1, wherein the inquiring an answer to the question from a knowledge base according to the user's question comprises:
preprocessing the user's question, wherein the preprocessing comprises at least one of word completion, pinyin identification and wrongly-written-character correction;
and inquiring the answer from the knowledge base according to the preprocessed question.
6. The method of claim 1, further comprising:
performing voiceprint feature recognition on the voice of the user by utilizing a pre-trained voiceprint model;
and acquiring a user portrait of the user according to the recognized voiceprint features of the user, wherein the user portrait is acquired in advance by means of big data analysis and is stored in correspondence with the voiceprint features of each user.
7. The method of claim 6, wherein the generating the answer audio from the text content of the answer using a speech synthesis technique comprises:
determining the voice type of the current conversation from a pre-established voice library according to the voiceprint features and the user portrait, wherein the voice types at least comprise male and female voices, voices in different languages, voices in different dialects, voices in different styles and voices of different age ranges;
and generating answer audio conforming to the voice type from the text content of the answer through a pre-trained speech synthesis model.
8. The method of claim 6, further comprising:
predicting the needs of the user by utilizing a pre-trained recommendation model according to the user portrait and the user's question in the current conversation;
and pushing messages to the user according to the predicted needs.
9. The method of claim 1, further comprising:
saving all of the user's voice and all answer audio pushed to the user, and converting them into historical call text;
and performing anomaly recognition on the historical call text by means of keyword matching.
10. An intelligent calling device comprising:
the voice recognition module is used for converting the voice of the user into text information by utilizing a voice recognition technology;
the natural language processing module is used for performing semantic recognition on the text information by utilizing a natural language processing technology and determining the user's question;
the answer query module is used for inquiring an answer to the question from a knowledge base according to the user's question;
and the voice synthesis module is used for generating answer audio from the text content of the answer through a voice synthesis technology and pushing the answer audio to the user.
11. The apparatus of claim 10, wherein the speech recognition module comprises:
the acoustic feature processing unit is used for recognizing acoustic features in the voice by utilizing a pre-trained acoustic model, classifying the acoustic features and determining at least one phoneme unit in the voice;
and the text conversion unit is used for performing text conversion on the at least one phoneme unit by utilizing a pre-trained language model to determine the text information of the voice.
12. The apparatus of claim 10, wherein the natural language processing module comprises:
the scene recognition unit is used for acquiring scene keywords from the text information by using a natural language processing technology and determining a current conversation scene according to the scene keywords;
and the semantic recognition unit is used for recognizing the behavior of the user from the text information based on the conversation scene, performing semantic recognition according to the behavior of the user, and determining the user's question.
13. The apparatus of claim 10, wherein the natural language processing module is specifically configured to:
and performing semantic recognition on the text information by utilizing a natural language processing technology in combination with context information of the current conversation to determine the user's question.
14. The apparatus of claim 10, wherein the answer query module comprises:
the preprocessing unit is used for preprocessing the user's question, wherein the preprocessing comprises at least one of word completion, pinyin identification and wrongly-written-character correction;
and the query unit is used for querying answers from the knowledge base according to the preprocessed questions.
15. The apparatus of claim 10, further comprising:
the voiceprint recognition module is used for carrying out voiceprint feature recognition on the voice of the user by utilizing a pre-trained voiceprint model;
and the user portrait determining module is used for acquiring the user portrait of the user according to the recognized voiceprint features, wherein the user portrait is acquired in advance by means of big data analysis and is stored in correspondence with the voiceprint features of each user.
16. The apparatus of claim 15, wherein the speech synthesis module comprises:
the voice type determining unit is used for determining the voice type of the current conversation from a pre-established voice library according to the voiceprint features and the user portrait, wherein the voice types at least comprise male and female voices, voices in different languages, voices in different dialects, voices in different styles and voices of different age ranges;
and the voice synthesis unit is used for generating answer audio conforming to the voice type by using the text content of the answer and a pre-trained voice synthesis model.
17. The apparatus according to claim 15, further comprising an intelligent recommendation module, specifically configured to:
predicting the needs of the user by utilizing a pre-trained recommendation model according to the user portrait and the user's question in the current conversation;
and pushing messages to the user according to the predicted needs.
18. The apparatus of claim 10, further comprising a quality inspection analysis module, specifically configured to:
saving all of the user's voice and all answer audio pushed to the user, and converting them into historical call text;
and performing anomaly recognition on the historical call text by means of keyword matching.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the intelligent calling method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the intelligent calling method of any one of claims 1-9.
Application CN202010536655.6A (filed 2020-06-12): Intelligent calling method, device, equipment and medium; published as CN111666380A, status pending.

Priority Applications (1)

Application Number: CN202010536655.6A (CN); Priority Date / Filing Date: 2020-06-12; Title: Intelligent calling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number: CN202010536655.6A (CN); Priority Date / Filing Date: 2020-06-12; Title: Intelligent calling method, device, equipment and medium

Publications (1)

Publication Number: CN111666380A (en); Publication Date: 2020-09-15

Family ID: 72387374

Family Applications (1)

Application Number: CN202010536655.6A; Title: Intelligent calling method, device, equipment and medium; Status: Pending (published as CN111666380A)

Country Status (1): CN, publication CN111666380A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150244867A1 (en) * 2005-12-15 2015-08-27 At&T Intellectual Property I, L.P. Messaging translation services
US20140122071A1 (en) * 2012-10-30 2014-05-01 Motorola Mobility Llc Method and System for Voice Recognition Employing Multiple Voice-Recognition Techniques
US20190066696A1 (en) * 2017-08-29 2019-02-28 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for verifying information
CN109147768A (en) * 2018-09-13 2019-01-04 云南电网有限责任公司 A kind of audio recognition method and system based on deep learning
CN109410911A (en) * 2018-09-13 2019-03-01 何艳玲 Artificial intelligence learning method based on speech recognition
CN110189754A (en) * 2019-05-29 2019-08-30 腾讯科技(深圳)有限公司 Voice interactive method, device, electronic equipment and storage medium
CN110335595A (en) * 2019-06-06 2019-10-15 平安科技(深圳)有限公司 Question-insertion dialogue method, device and storage medium based on speech recognition
CN110751943A (en) * 2019-11-07 2020-02-04 浙江同花顺智能科技有限公司 Voice emotion recognition method and device and related equipment

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112242152B (en) * 2020-10-13 2023-09-19 中移(杭州)信息技术有限公司 Voice interaction method and device, electronic equipment and storage medium
CN112242152A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Voice interaction method and device, electronic equipment and storage medium
CN112307187A (en) * 2020-12-08 2021-02-02 浙江百应科技有限公司 Method based on intelligent customer service auxiliary interaction
CN112667798A (en) * 2021-01-12 2021-04-16 杭州云嘉云计算有限公司 Call center language processing method and system based on AI
CN112967721A (en) * 2021-02-03 2021-06-15 上海明略人工智能(集团)有限公司 Sales lead information identification method and system based on voice identification technology
CN112967721B (en) * 2021-02-03 2024-05-31 上海明略人工智能(集团)有限公司 Sales lead information recognition method and system based on voice recognition technology
CN113099054A (en) * 2021-03-30 2021-07-09 中国建设银行股份有限公司 Voice interaction method, device, equipment and computer readable medium
CN113593530A (en) * 2021-07-26 2021-11-02 国网安徽省电力有限公司建设分公司 Safety helmet system based on NLP technology and operation method
CN113971203A (en) * 2021-10-26 2022-01-25 福建云知声智能科技有限公司 Information processing method, information processing apparatus, storage medium, and electronic apparatus
CN115022471A (en) * 2022-05-18 2022-09-06 北京互连众信科技有限公司 Intelligent robot voice interaction system and method
WO2024021986A1 (en) * 2022-07-28 2024-02-01 青岛海尔空调器有限总公司 Method and apparatus for reducing speech response time, and storage medium and speech device
CN115455161A (en) * 2022-09-02 2022-12-09 北京百度网讯科技有限公司 Conversation processing method, conversation processing device, electronic equipment and storage medium
CN116741143B (en) * 2023-08-14 2023-10-31 深圳市加推科技有限公司 Digital-body-based personalized AI business card interaction method and related components
CN116741143A (en) * 2023-08-14 2023-09-12 深圳市加推科技有限公司 Digital-body-based personalized AI business card interaction method and related components
CN117636877A (en) * 2024-01-24 2024-03-01 广东铭太信息科技有限公司 Intelligent system operation method and system based on voice instruction
CN117636877B (en) * 2024-01-24 2024-04-02 广东铭太信息科技有限公司 Intelligent system operation method and system based on voice instruction
CN117935865A (en) * 2024-03-22 2024-04-26 江苏斑马软件技术有限公司 User emotion analysis method and system for personalized marketing

Similar Documents

Publication Publication Date Title
CN111666380A (en) Intelligent calling method, device, equipment and medium
US9753914B2 (en) Natural expression processing method, processing and response method, device, and system
KR101634086B1 (en) Method and computer system of analyzing communication situation based on emotion information
CN107609092B (en) Intelligent response method and device
US20180225306A1 (en) Method and system to recommend images in a social application
KR20160089152A (en) Method and computer system of analyzing communication situation based on dialogue act information
CN111177355B (en) Man-machine conversation interaction method and device based on search data and electronic equipment
CN110019742B (en) Method and device for processing information
CN107430616A Interactive reformulation of voice queries
CN114787814A (en) Reference resolution
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN111881254A (en) Method and device for generating dialogs, electronic equipment and storage medium
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN112148850A (en) Dynamic interaction method, server, electronic device and storage medium
JP2021163473A (en) Method and apparatus for pushing information, electronic apparatus, storage medium, and computer program
CN113051380A (en) Information generation method and device, electronic equipment and storage medium
CN111538817A (en) Man-machine interaction method and device
US20230206007A1 (en) Method for mining conversation content and method for generating conversation content evaluation model
CN116561284A (en) Intelligent response method, device, electronic equipment and medium
CN113743127B (en) Task type dialogue method, device, electronic equipment and storage medium
CN112506405B (en) Artificial intelligent voice large screen command method based on Internet supervision field
CN112614479B (en) Training data processing method and device and electronic equipment
CN118202344A (en) Deep learning technique for extracting embedded data from documents
CN114153948A (en) Question-answer knowledge base construction method, intelligent interaction method and device
CN114118937A (en) Information recommendation method and device based on task, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination