CN112632244A - Man-machine conversation optimization method and device, computer equipment and storage medium - Google Patents

Man-machine conversation optimization method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN112632244A
CN112632244A (application CN202011504875.7A)
Authority
CN
China
Prior art keywords
call
training
user
corpus
conversation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011504875.7A
Other languages
Chinese (zh)
Inventor
陈迎运
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202011504875.7A priority Critical patent/CN112632244A/en
Publication of CN112632244A publication Critical patent/CN112632244A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Abstract

The application discloses a man-machine conversation optimization method and device, computer equipment and a storage medium, belonging to the technical field of artificial intelligence. Training corpora are labeled to obtain a training sample set; a preset initial recognition model is trained on the training sample set to obtain a user intention recognition model; an intention recognition instruction is received and the call text of the current call corresponding to the instruction is obtained; the current call text is fed into the user intention recognition model to obtain a user intention recognition result; based on that result it is determined whether the current call needs the intervention of a human agent, and if so, the current call is transferred to a human agent. The application also relates to blockchain technology: the call record of the current call can be stored in a blockchain. By building a user intention recognition model to judge the user's intention in the current call and deciding on that basis whether a human agent needs to intervene, the interaction experience and working efficiency of man-machine calls are improved.

Description

Man-machine conversation optimization method and device, computer equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a man-machine conversation optimization method, a man-machine conversation optimization device, computer equipment and a storage medium.
Background
Man-machine conversation serves as an important bridge between enterprises and their customers: inbound and outbound calls are handled through channels such as voice, text and video, providing users with a good service experience, and it plays a key role in helping enterprises retain customers and improve service. In recent years, breakthroughs in Artificial Intelligence (AI) have brought new opportunities to man-machine conversation; introducing AI technology so that an AI robot assists, and to some extent replaces, the human agent has become a development trend in the customer service industry. However, limited by the current level of intelligent speech and natural language processing technology, the interactive capability of today's AI robots cannot fully replace human agents across complex language environments and different service backgrounds, and human agents often still need to participate while the AI robot handles user services.
At present, while an AI robot is handling a user's service, if the user is not satisfied with the robot's service, the user can actively request that the call be transferred to a human agent, who then learns the user's needs from the historical interaction record or by communicating with the user again and completes the remaining service. However, during such a call the user must actively ask for a human agent and describe the service demand to the agent again; the agent in turn must spend time understanding the demand and is responsible for handling the user's subsequent service, so the call occupies the human agent for a long time. This affects the agent's working efficiency and the overall service level of the system, resulting in low call-handling efficiency.
Disclosure of Invention
The embodiments of the application aim to provide a man-machine conversation optimization method and device, computer equipment and a storage medium, so as to solve the technical problem that existing man-machine conversation schemes cannot achieve seamless switching between a call robot and a human agent during a call, leading to poor user experience.
In order to solve the above technical problem, an embodiment of the present application provides a method for optimizing a man-machine call, which adopts the following technical solutions:
a man-machine conversation optimization method comprises the following steps:
acquiring a training corpus from a preset historical corpus, and labeling the training corpus to obtain a training sample set, wherein the training corpus is the historical call text, stored in the historical corpus, generated when users communicated with the call robot;
training a preset initial recognition model through a training sample set to obtain a trained user intention recognition model;
receiving an intention identification instruction, acquiring a call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain a current call text;
importing the current call text into a user intention recognition model to obtain a user intention recognition result;
and determining whether the current call needs the intervention of a human agent based on the user intention recognition result, and transferring the current call to a human agent if it does.
Further, the step of acquiring a training corpus from the preset historical corpus and labeling the training corpus to obtain a training sample set specifically includes:
acquiring a training corpus from a preset historical corpus, and preprocessing the training corpus;
labeling the preprocessed training corpora, randomly combining the labeled training corpora to generate a training sample set and a verification data set corresponding to the training sample set.
Further, training a preset initial recognition model through a training sample set to obtain a trained user intention recognition model, specifically comprising:
performing word segmentation processing on the training corpus, and performing vector feature conversion processing on all obtained segmented words to obtain corresponding word vectors;
performing convolution calculation on all the word vectors, and extracting characteristic data corresponding to each word vector;
calculating the similarity between the feature data corresponding to each word vector and a preset intention label, sequencing all the calculated similarities, and outputting the recognition result with the maximum similarity as the intention recognition result corresponding to the training corpus;
and iteratively updating the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model.
Further, the step of performing word segmentation processing on the training corpus and performing vector feature conversion processing on all the obtained segmented words to obtain corresponding word vectors specifically includes:
performing word segmentation processing on the training corpus to obtain a plurality of word segmentation vocabularies;
carrying out duplication removal operation on a plurality of word segmentation vocabularies obtained by word segmentation;
and performing vector characteristic conversion processing on all word segmentation vocabularies left after the duplication removing operation to obtain corresponding word vectors.
Further, the step of iteratively updating the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model specifically includes:
fitting by using a back propagation algorithm based on the intention recognition result corresponding to the training corpus and a preset standard result to obtain a recognition error;
comparing the identification error with a preset threshold, if the identification error is larger than the preset threshold, iteratively updating the initial identification model until the identification error is smaller than or equal to the preset threshold, and obtaining a user intention identification model;
and outputting the user intention recognition model with the recognition error smaller than or equal to a preset threshold value.
Further, before the steps of receiving the intention identification instruction, obtaining a call record of the current call corresponding to the intention identification instruction, analyzing the call record of the current call in real time, and obtaining a current call text, the method further comprises:
collecting the call record of each human agent, and extracting the voice information of each human agent from the collected call records;
and constructing a voice information set for the call robot based on the voice information of each human agent.
Further, after the voice information set of the call robot is constructed based on the voice information of each human agent, the method further comprises the following steps:
receiving a call instruction initiated by a user, and determining the target human agent corresponding to the call instruction;
acquiring the voice information of the target human agent from the voice information set, and adjusting the model parameters of a pre-trained TTS model based on the voice information of the target human agent;
and establishing a call with the user based on the TTS model after the model parameters are adjusted.
In order to solve the above technical problem, an embodiment of the present application further provides an optimization apparatus for man-machine communication, which adopts the following technical solutions:
an apparatus for optimizing a human-machine conversation, comprising:
the labeling module is used for acquiring a training corpus from a preset historical corpus and labeling the training corpus to obtain a training sample set, wherein the training corpus is the historical call text, stored in the historical corpus, generated when users communicated with the call robot;
the training module is used for training a preset initial recognition model through a training sample set to obtain a trained user intention recognition model;
the analysis module is used for receiving the intention identification instruction, acquiring the call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain the current call text;
the recognition module is used for importing the current call text into the user intention recognition model to obtain a user intention recognition result;
and the judging module is used for determining whether the current call needs the intervention of a human agent based on the user intention recognition result, and transferring the current call to a human agent if it does.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, carry out the steps of the method of optimising a human-machine conversation as claimed in any one of the preceding claims.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the steps of the method of optimizing a human-machine conversation as in any one of the above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the application discloses an optimization method, an optimization device, computer equipment and a storage medium for man-machine conversation, which belong to the technical field of artificial intelligence. The method and the device aim at the defects of the existing man-machine conversation scheme, the scheme for realizing seamless switching between conversation robots and manual seat conversation through user intention identification is provided, and interaction experience and working efficiency of man-machine conversation are improved. In addition, this proposal still gathers the voice message of artifical seat, adjusts conversation robot TTS model through the voice message of artifical seat for in the conversation process, guarantee that the tone color that conversation robot used is unanimous with the tone color of artifical seat, consequently can not let the user feel the condition of having the man-machine to switch in the conversation process, improved user's conversation and experienced.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 illustrates an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 illustrates a flow diagram of one embodiment of a method for optimization of human-machine calls in accordance with the present application;
FIG. 3 is a schematic diagram illustrating an embodiment of an apparatus for optimizing a human-machine call according to the present application;
FIG. 4 shows a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
At present, while an AI robot is handling a user's service, if the user is not satisfied with the robot's service, the user can actively request that the call be transferred to a human agent, who then learns the user's needs from the historical interaction record or by communicating with the user again and completes the remaining service. However, during such a call the user must actively ask for a human agent and describe the service demand to the agent again; the agent in turn must spend time understanding the demand and is responsible for handling the user's subsequent service, so the call occupies the human agent for a long time. This affects the agent's working efficiency and the overall service level of the system, resulting in low call-handling efficiency. Addressing these shortcomings of existing man-machine conversation schemes, the application proposes a scheme that uses user intention recognition to achieve seamless switching between the call robot and the human agent, improving the interaction experience and working efficiency of man-machine calls. The man-machine conversation optimization method and device, computer equipment and storage medium disclosed in the application are described in detail below.
It should be noted that the optimization method for man-machine call provided in the embodiments of the present application is generally executed by a server, and accordingly, the optimization device for man-machine call is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for optimization of human-machine calls in accordance with the present application is shown. The optimization method of the man-machine conversation comprises the following steps:
s201, obtaining a training corpus from a preset historical corpus, and labeling the training corpus to obtain a training sample set, wherein the training corpus is a historical call text generated when a user stored in the historical corpus communicates with a call robot.
Specifically, all historical conversation texts generated by communication between a user and a conversation robot are stored in a preset historical corpus, a training corpus is obtained from the preset historical corpus, the training corpus is labeled, a training sample set is obtained, and a verification data set corresponding to the training sample set is obtained. The training sample set is used for training the user intention recognition model, and the verification data set is used for verifying the trained user intention recognition model.
S202, training a preset initial recognition model through a training sample set to obtain a trained user intention recognition model.
The preset initial recognition model adopts a deep convolutional neural network. A Convolutional Neural Network (CNN) is a feedforward neural network that contains convolution calculations and has a deep structure, and is one of the representative algorithms of deep learning. Convolutional neural networks have a representation learning capability and can perform shift-invariant classification of input information according to their hierarchical structure, and are therefore also called "shift-invariant artificial neural networks". The convolutional neural network is constructed by imitating the biological visual perception mechanism, can perform both supervised and unsupervised learning, has a stable effect and requires no additional feature engineering on the data; the sharing of convolution kernel parameters within a convolutional layer and the sparsity of inter-layer connections allow it to learn grid-like topology features (such as pixels and audio) with a small amount of computation.
Specifically, after the training sample set is obtained, the preset initial recognition model is trained with it to obtain the user intention recognition model; the preset initial recognition model adopts a deep convolutional neural network (CNN). The user intention recognition model is used to recognize the user's intention during the conversation between the call robot and the user. In a specific embodiment of the application, for example in a debt-collection scenario, the user intention recognition model is used to recognize during the call whether the user intends to repay.
S203, receiving the intention identification instruction, acquiring the call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain a current call text.
Specifically, when a man-machine call takes place, the server receives the intention identification instruction, acquires the call record of the current call corresponding to the instruction, analyzes the call record in real time, and performs speech-to-text conversion on it to obtain the current call text; the trained user intention recognition model can then recognize the user's current intention based on this text. For example, in a debt-collection man-machine call, while the user is in voice communication with the call robot, the call robot sends the voice of the call to the server in real time, and the server analyzes the voice of the current call in real time to obtain the current call text.
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the man-machine call optimization method runs may receive the intention identification instruction through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
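As an illustration of the speech-to-text step above, the sketch below transcribes one recorded segment of the current call. The patent does not name an ASR engine, so the SpeechRecognition library, the Google Web Speech backend and the file name are assumptions; a production system would use a streaming recognizer for real-time analysis.

```python
import speech_recognition as sr  # assumed ASR library; not specified by the patent

recognizer = sr.Recognizer()
with sr.AudioFile("current_call_segment.wav") as source:  # hypothetical recorded call segment
    audio = recognizer.record(source)

# Transcribe the Mandarin call audio into the current call text
current_call_text = recognizer.recognize_google(audio, language="zh-CN")
print(current_call_text)
```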
And S204, importing the current call text into the user intention identification model to obtain a user intention identification result.
Specifically, the current call text obtained in the above steps is fed into the user intention recognition model for user intention recognition, so as to analyze the user's intention and obtain a user intention recognition result. For example, in a debt-collection scenario handled by the call robot, the sentence "I will not repay the XX loan" appears several times; performing word segmentation on this sentence and filtering the result yields the segmented words "I", "not", "repay" and "XX loan". These segmented words are input into the user intention recognition model, which matches each word against the intention labels to obtain an intention classification, and the intention recognition yields the user's negative intention towards repayment.
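A minimal sketch of the word segmentation and filtering described above is given below; the jieba segmenter and the stop-word list are assumptions, since the patent does not name a segmentation tool.

```python
import jieba  # assumed Chinese word-segmentation library

utterance = "我不还XX贷款"            # "I will not repay the XX loan"
segmented = jieba.lcut(utterance)      # e.g. ['我', '不', '还', 'XX', '贷款']

stop_words = {"的", "了", "吗", "啊"}   # illustrative modal-particle / stop-word list
tokens = [w for w in segmented if w not in stop_words]
print(tokens)                          # segmented words passed to the intention recognition model
```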
And S205, determining whether the current call needs the intervention of a human agent or not based on the user intention recognition result, and switching the current call to the human agent if the current call needs the intervention of the human agent.
Specifically, in the debt-collection scenario, after it is determined that the user has one of certain preset intentions (such as directly hanging up or refusing to repay), a call-transfer request is sent to the server. After receiving the request, the server searches for a human agent matching the current call, and the call robot then displays its communication with the user to that agent in real time, so that the agent can grasp the context of the whole call and judge, from the content of the man-machine communication, when to cut in manually.
In addition, it should be noted that after the human agent has cut in to communicate with the customer, the call robot still acquires the communication content in real time and displays it to the agent, assisting the agent in communicating with the user and providing standard-answer prompts for certain questions.
This embodiment discloses a man-machine conversation optimization method, which belongs to the technical field of artificial intelligence. A historical corpus is acquired to train a user intention recognition model; the model judges the user's intention in the current call; based on that intention it is determined whether the current call needs the intervention of a human agent, and if so, the current call is transferred to a human agent. Addressing the shortcomings of existing man-machine conversation schemes, the scheme uses user intention recognition to achieve seamless switching between the call robot and the human agent, improving the interaction experience and working efficiency of man-machine calls. In addition, the scheme collects the voice information of the human agents and uses it to adjust the call robot's TTS model, so that during the call the timbre used by the call robot is consistent with the timbre of the human agent; the user therefore does not notice the man-machine switch during the call, which improves the user's call experience.
Further, the step of acquiring a training corpus from the preset historical corpus and labeling the training corpus to obtain a training sample set specifically includes:
acquiring a training corpus from a preset historical corpus, and preprocessing the training corpus;
labeling the preprocessed training corpora, randomly combining the labeled training corpora to generate a training sample set and a verification data set corresponding to the training sample set.
The training corpus is the historical call text, stored in the historical corpus, generated when users communicated with the call robot, and preprocessing of the training corpus at least includes text error correction, text de-duplication, removal of punctuation marks, removal of modal particles, and the like.
Specifically, a training corpus is obtained from a historical corpus, the training corpus is preprocessed, the preprocessed training corpus is labeled, the labeled training corpus is randomly combined to obtain a training sample set and a verification data set, and the training sample set and the verification data set are stored in the preset historical corpus.
For example, the labeled corpus may be randomly divided into 10 equal sample subsets, of which 9 randomly chosen subsets are combined into the training sample set and the remaining 1 subset serves as the verification data set. The training sample set is imported into the initial recognition model for model training to obtain a trained user intention recognition model; the trained model is then verified against the verification data set and the verified user intention recognition model is output. In the above embodiment, the user intention recognition model can be obtained quickly by constructing a training sample set and a verification data set and using them, respectively, to train and verify the initial recognition model.
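A minimal sketch of this 10-subset split is shown below; the random seed and the data representation are assumptions.

```python
import random

def split_corpus(labeled_samples, n_folds=10, seed=42):
    """Shuffle the labeled corpus, cut it into n_folds equal subsets,
    combine 9 subsets into the training sample set and keep 1 as the verification data set."""
    samples = list(labeled_samples)
    random.Random(seed).shuffle(samples)
    folds = [samples[i::n_folds] for i in range(n_folds)]
    validation_set = folds[0]
    training_set = [s for fold in folds[1:] for s in fold]
    return training_set, validation_set
```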
Further, training a preset initial recognition model through a training sample set to obtain a trained user intention recognition model, specifically comprising:
performing word segmentation processing on the training corpus, and performing vector feature conversion processing on all obtained segmented words to obtain corresponding word vectors;
performing convolution calculation on all the word vectors, and extracting characteristic data corresponding to each word vector;
calculating the similarity between the feature data corresponding to each word vector and a preset intention label, sequencing all the calculated similarities, and outputting the recognition result with the maximum similarity as the intention recognition result corresponding to the training corpus;
and iteratively updating the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model.
The preset initial recognition model comprises an embedding layer, a convolutional layer and a fully connected layer. After the training sample set is imported into the CNN model, the training corpus first undergoes word segmentation and vector feature conversion in the embedding layer of the CNN to obtain the word vector corresponding to each segmented word; the word vector corresponding to each segmented word is then fed into the convolutional layer of the CNN for feature extraction; finally, the similarity between the feature data and the preset intention labels is calculated in the fully connected layer of the CNN, the recognition result with the greatest similarity is output as the intention recognition result corresponding to the training corpus, and the intention classification is realized through a softmax function. In the above embodiment, the embedding layer, convolutional layer and fully connected layer of the initial recognition model are trained through the training sample set to obtain the user intention recognition model.
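A minimal Keras sketch of the embedding-convolution-fully-connected structure described above is given below; the layer widths, kernel size and pooling choice are assumptions, not values from the patent.

```python
import tensorflow as tf

def build_intent_cnn(vocab_size, embed_dim, seq_len, num_intents):
    """Embedding layer -> convolutional layer -> fully connected layer with softmax."""
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)           # embedding layer
    x = tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu")(x)   # convolutional feature extraction
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    outputs = tf.keras.layers.Dense(num_intents, activation="softmax")(x)  # intent classification via softmax
    return tf.keras.Model(inputs, outputs)
```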
Further, the step of performing word segmentation processing on the training corpus and performing vector feature conversion processing on all the obtained segmented words to obtain corresponding word vectors specifically includes:
performing word segmentation processing on the training corpus to obtain a plurality of word segmentation vocabularies;
carrying out duplication removal operation on a plurality of word segmentation vocabularies obtained by word segmentation;
and performing vector characteristic conversion processing on all word segmentation vocabularies left after the duplication removing operation to obtain corresponding word vectors.
A Word2Vec model can be trained to perform the vector feature conversion on the segmented words and obtain the corresponding word vectors. Word2Vec is a family of related models used to generate word vectors; these models are shallow, two-layer neural networks trained to reconstruct linguistic contexts of words, and a trained Word2Vec model maps each word to a vector. Word2Vec word embedding is one of the input modes of the embedding layer of the user intention recognition model; the embedding layer is the general term for the corpus-learning and feature-learning techniques in the CNN model, i.e. mapping words or phrases in the vocabulary to vectors of real numbers. In the specific embodiment of the application, all the obtained segmented words are converted into vector form through Word2Vec. Word2Vec can express a word in vector form quickly and efficiently, and is not described further here.
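A short sketch of training Word2Vec on the segmented corpus is given below; the corpus content and the hyper-parameters are assumptions, and the gensim 4.x API is used.

```python
from gensim.models import Word2Vec

# Segmented, de-duplicated words per corpus sentence (illustrative content only)
tokenized_corpus = [["我", "不", "还", "XX", "贷款"],
                    ["我", "想", "查询", "还款", "日期"]]

w2v = Word2Vec(sentences=tokenized_corpus, vector_size=100,
               window=5, min_count=1, sg=1)   # assumed hyper-parameters
loan_vector = w2v.wv["贷款"]                   # 100-dimensional word vector for "贷款" (loan)
```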
Further, the step of iteratively updating the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model specifically includes:
fitting by using a back propagation algorithm based on the intention recognition result corresponding to the training corpus and a preset standard result to obtain a recognition error;
comparing the identification error with a preset threshold, if the identification error is larger than the preset threshold, iteratively updating the initial identification model until the identification error is smaller than or equal to the preset threshold, and obtaining a user intention identification model;
and outputting the user intention recognition model with the recognition error smaller than or equal to a preset threshold value.
The back-propagation algorithm (BP algorithm) is a learning algorithm suitable for multi-layer neural networks; it is built on the gradient descent method and is used for the error calculation of a deep learning network. The input-output relationship of a BP network is essentially a mapping: an n-input, m-output BP neural network performs a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space, and this mapping is highly non-linear. The learning process of the BP algorithm consists of a forward propagation pass and a backward propagation pass. In the forward pass, the input information passes from the input layer through the hidden layers, is processed layer by layer and is transmitted to the output layer; in the backward pass, the partial derivatives of the objective function with respect to the weights of each neuron are calculated layer by layer, forming the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights.
Specifically, a loss function of the user intention recognition model is established, fitting calculation is carried out by using a back propagation algorithm based on an intention recognition result and a preset standard result, a recognition error is obtained, the recognition error is compared with a preset error threshold, if the recognition error is larger than the preset error threshold, iterative updating is carried out on the trained user intention recognition model based on the loss function of the user intention recognition model until the recognition error is smaller than or equal to the preset error threshold, and the verified user intention recognition model is obtained. The preset standard result and the preset error threshold value can be set in advance. In the above embodiment, the trained user intention recognition model is verified and iterated through a back propagation algorithm, so as to obtain a user intention recognition model meeting the requirements.
When the initial recognition model is constructed, a corresponding loss function is set, in this case the cross entropy. When the user intention recognition model is trained, the trained model is iteratively updated through the back-propagation algorithm; both the construction and the training of the user intention recognition model can be completed with the TensorFlow library in Python.
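A minimal training-loop sketch matching the threshold-based iteration described above is given below, reusing the model sketched earlier; the optimizer, batch size and error threshold are assumptions.

```python
def train_until_threshold(model, x_train, y_train, x_val, y_val,
                          error_threshold=0.05, max_epochs=50):
    """Iteratively update the model by back-propagation until the recognition error
    (cross-entropy on the verification data set) is at or below the preset threshold."""
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    for _ in range(max_epochs):
        model.fit(x_train, y_train, batch_size=32, epochs=1, verbose=0)
        recognition_error = model.evaluate(x_val, y_val, verbose=0)  # validation loss only
        if recognition_error <= error_threshold:
            break
    return model
```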
Further, before the steps of receiving the intention identification instruction, obtaining a call record of the current call corresponding to the intention identification instruction, analyzing the call record of the current call in real time, and obtaining a current call text, the method further comprises:
collecting the call record of each human agent, and extracting the voice information of each human agent from the collected call records;
and constructing a voice information set for the call robot based on the voice information of each human agent.
Further, after the voice information set of the call robot is constructed based on the voice information of each human agent, the method further comprises the following steps:
receiving a call instruction initiated by a user, and determining the target human agent corresponding to the call instruction;
acquiring the voice information of the target human agent from the voice information set, and adjusting the model parameters of a pre-trained TTS model based on the voice information of the target human agent;
and establishing a call with the user based on the TTS model after the model parameters are adjusted.
TTS, i.e. Text To Speech technology, can convert arbitrary text information into standard, fluent speech and read it aloud in real time, which is equivalent to fitting the machine with an artificial mouth. It involves multiple disciplines such as acoustics, linguistics, digital signal processing and computer science, and is a cutting-edge technology in Chinese information processing; the main problem it solves is how to convert text information into audible sound information, i.e. how to make the machine speak like a person. "Making the machine speak like a person" is essentially different from a conventional sound playback device (system). Conventional sound playback devices, such as tape recorders, "make the machine speak" by pre-recording sound and then playing it back; this approach has significant limitations in content, storage, transmission, convenience and timeliness. With computer speech synthesis, any text can be converted into highly natural speech at any time, truly making the machine speak like a person.
In the specific implementation of the application, this embodiment is described in combination with a debt-collection scenario. In the traditional human-machine debt-collection approach, the collection robot always contacts the customer first, and a human agent only intervenes when the communication goes poorly; this lengthens the call, and customers who are unwilling to wait may simply hang up. Moreover, the change in voice timbre between the earlier and later parts of the call makes the customer clearly aware that a call robot was involved, which hurts the call experience and reduces the user's willingness to answer such calls.
Specifically, the call record of each human agent is collected, the voice information of each human agent is extracted from the collected call records, and a voice information set for the call robot is constructed based on that voice information. When a user initiates a call, the call instruction is received, the target human agent corresponding to the call instruction is determined, the voice information of the target human agent is acquired from the voice information set, the model parameters of the pre-trained TTS model are adjusted based on that voice information, and the call with the user is established using the TTS model with the adjusted parameters.
In the above embodiment, the TTS model of the call robot is adjusted using the collected voice information of the human agents, so that during the call the timbre used by the call robot is consistent with the timbre of the human agent; the user therefore does not notice the man-machine switch during the call, which improves the user's call experience.
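A structural sketch of the per-agent TTS adjustment is given below. The patent does not specify a TTS framework, so VoiceProfile, set_speaker_embedding and set_speaking_rate are hypothetical names used only to show how the voice information set could be looked up and applied.

```python
from dataclasses import dataclass, field

@dataclass
class VoiceProfile:
    agent_id: str
    timbre_embedding: list = field(default_factory=list)  # timbre features from the agent's call records
    speaking_rate: float = 1.0

# The "voice information set", keyed by human-agent id
voice_profiles: dict[str, VoiceProfile] = {}

def configure_tts_for_agent(tts_model, agent_id):
    """Adjust the pre-trained TTS model so the call robot speaks with the target agent's timbre."""
    profile = voice_profiles[agent_id]
    tts_model.set_speaker_embedding(profile.timbre_embedding)  # hypothetical parameter update
    tts_model.set_speaking_rate(profile.speaking_rate)         # hypothetical parameter update
    return tts_model
```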
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the man-machine call optimization method runs may receive the call instruction initiated by the user through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
It is emphasized that, in order to further ensure the privacy and security of the call record of the current call, the call record of the current call may also be stored in a node of a block chain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turns or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an optimization apparatus for man-machine conversation, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the device for optimizing a man-machine call according to this embodiment includes:
a labeling module 301, configured to obtain a training corpus from a preset historical corpus and label the corpus to obtain a training sample set, where the corpus is the historical call text, stored in the historical corpus, generated when users communicated with the call robot;
a training module 302, configured to train a preset initial recognition model through a training sample set, so as to obtain a trained user intention recognition model;
the analysis module 303 is used for receiving the intention identification instruction, acquiring the call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain the current call text;
the recognition module 304 is configured to import the current call text into the user intention recognition model to obtain a user intention recognition result;
a determining module 305, configured to determine whether the current call needs intervention of a human agent based on the user intention recognition result, and if so, forward the current call to the human agent.
Further, the labeling module 301 specifically includes:
the corpus acquiring unit is used for acquiring training corpuses from a preset historical corpus and preprocessing the training corpuses;
and the corpus labeling unit is used for labeling the preprocessed training corpus, randomly combining the labeled training corpus, and generating a training sample set and a verification data set corresponding to the training sample set.
Further, the training module 302 specifically includes:
the vector feature conversion unit is used for performing word segmentation processing on the training corpus and performing vector feature conversion processing on all the obtained segmented words to obtain corresponding word vectors;
the convolution calculation unit is used for performing convolution calculation on all the word vectors and extracting the characteristic data corresponding to each word vector;
the similarity calculation unit is used for calculating the similarity between the feature data corresponding to each word vector and a preset intention label, sequencing all the calculated similarities, and outputting the recognition result with the maximum similarity as the intention recognition result corresponding to the training corpus;
and the iteration updating unit is used for performing iteration updating on the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model.
Further, the vector feature conversion unit specifically includes:
the word segmentation subunit is used for carrying out word segmentation processing on the training corpus to obtain a plurality of word segmentation vocabularies;
the de-duplication subunit is used for performing de-duplication operation on a plurality of word segmentation vocabularies obtained by word segmentation;
and the vector characteristic conversion subunit is used for performing vector characteristic conversion processing on all the word segmentation vocabularies which are left after the duplication elimination operation to obtain corresponding word vectors.
Further, the iteration updating unit specifically includes:
the fitting subunit is used for fitting by using a back propagation algorithm based on the intention recognition result corresponding to the training corpus and a preset standard result to obtain a recognition error;
the iterative updating subunit is used for comparing the identification error with a preset threshold, and if the identification error is greater than the preset threshold, iteratively updating the initial identification model until the identification error is less than or equal to the preset threshold to obtain a user intention identification model;
and the model output subunit is used for outputting the user intention recognition model with the recognition error smaller than or equal to a preset threshold value.
Further, the optimization device for man-machine conversation further comprises:
the voice information acquisition module is used for collecting the call record of each human agent and extracting the voice information of each human agent from the collected call records;
and the voice information set construction module is used for constructing a voice information set for the call robot based on the voice information of each human agent.
Further, the optimization device for man-machine conversation further comprises:
the call instruction receiving module is used for receiving a call instruction initiated by a user and determining the target human agent corresponding to the call instruction;
the TTS model parameter adjusting module is used for acquiring the voice information of the target human agent from the voice information set and adjusting the model parameters of the pre-trained TTS model based on the voice information of the target human agent;
and the call establishing module is used for establishing a call with the user based on the TTS model after the model parameter adjustment.
This embodiment discloses a man-machine conversation optimization device, which belongs to the technical field of artificial intelligence. A historical corpus is acquired to train a user intention recognition model; the model judges the user's intention in the current call; based on that intention it is determined whether the current call needs the intervention of a human agent, and if so, the current call is transferred to a human agent. Addressing the shortcomings of existing man-machine conversation schemes, the scheme uses user intention recognition to achieve seamless switching between the call robot and the human agent, improving the interaction experience and working efficiency of man-machine calls. In addition, the scheme collects the voice information of the human agents and uses it to adjust the call robot's TTS model, so that during the call the timbre used by the call robot is consistent with the timbre of the human agent; the user therefore does not notice the man-machine switch during the call, which improves the user's call experience.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it should be understood that not all of the shown components are required to be implemented and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various application software, such as computer readable instructions of a man-machine call optimization method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, for example, computer readable instructions for executing the optimization method of the man-machine call.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The embodiment discloses a computer device, which belongs to the technical field of artificial intelligence. The optimization method of the man-machine conversation comprises: acquiring historical corpora to train a user intention recognition model, judging the intention of the user in the current call through the user intention recognition model, determining, based on the intention of the user, whether the current call needs the intervention of a human agent, and switching the current call to the human agent if such intervention is needed. Aiming at the defects of existing man-machine conversation schemes, the application provides a scheme for realizing seamless switching between the call robot and the human agent through user intention recognition, thereby improving the interaction experience and working efficiency of man-machine conversation. In addition, the scheme also collects the voice information of the human agent and adjusts the TTS model of the call robot according to that voice information, so that the timbre used by the call robot is consistent with the timbre of the human agent during the call; the user therefore does not perceive any man-machine switching during the call, and the user's call experience is improved.
The present application further provides another embodiment, namely a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the optimization method of man-machine conversation as described above.
The embodiment discloses a storage medium, which belongs to the technical field of artificial intelligence. The optimization method of the man-machine conversation comprises: acquiring historical corpora to train a user intention recognition model, judging the intention of the user in the current call through the user intention recognition model, determining, based on the intention of the user, whether the current call needs the intervention of a human agent, and switching the current call to the human agent if such intervention is needed. Aiming at the defects of existing man-machine conversation schemes, the application provides a scheme for realizing seamless switching between the call robot and the human agent through user intention recognition, thereby improving the interaction experience and working efficiency of man-machine conversation. In addition, the scheme also collects the voice information of the human agent and adjusts the TTS model of the call robot according to that voice information, so that the timbre used by the call robot is consistent with the timbre of the human agent during the call; the user therefore does not perceive any man-machine switching during the call, and the user's call experience is improved.
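The timbre matching described above can be pictured with the following minimal sketch; the VoiceProfile fields and the AdjustableTTS class are illustrative assumptions rather than the TTS model actually used by the embodiment.

```python
# Illustrative sketch of matching the call robot's TTS timbre to the human agent's voice.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    pitch: float          # e.g. average fundamental frequency from agent recordings
    speed: float          # speaking rate
    speaker_vector: list  # e.g. a speaker embedding

class AdjustableTTS:
    def __init__(self):
        self.params = {}

    def adapt_to(self, profile: VoiceProfile) -> None:
        # adjust model parameters so synthesized speech matches the agent's timbre
        self.params = {"pitch": profile.pitch, "speed": profile.speed,
                       "speaker_vector": profile.speaker_vector}

    def synthesize(self, text: str) -> str:
        return f"<audio pitch={self.params['pitch']} speed={self.params['speed']}: {text}>"

agent_profile = VoiceProfile(pitch=180.0, speed=1.05, speaker_vector=[0.1, 0.3, 0.7])
tts = AdjustableTTS()
tts.adapt_to(agent_profile)   # the robot now speaks with the agent's timbre
print(tts.synthesize("Hello, how can I help you today?"))
```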
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely illustrative of some, but not all, embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their features may be replaced by equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method for optimizing a man-machine call, comprising:
acquiring a training corpus from a preset historical corpus, and labeling the training corpus to obtain a training sample set, wherein the training corpus is a historical call text generated when a user stored in the historical corpus communicates with a call robot;
training a preset initial recognition model through the training sample set to obtain a trained user intention recognition model;
receiving an intention identification instruction, acquiring a call record of a current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain a current call text;
importing the current call text into the user intention recognition model to obtain a user intention recognition result;
and determining whether the current call needs the intervention of a human agent or not based on the user intention recognition result, and if so, switching the current call to the human agent.
2. The method for optimizing a man-machine call according to claim 1, wherein the step of acquiring a training corpus from a preset historical corpus and labeling the training corpus to obtain a training sample set specifically comprises:
acquiring a training corpus from the preset historical corpus, and preprocessing the training corpus;
labeling the preprocessed training corpora, and randomly combining the labeled training corpora to generate a training sample set and a verification data set corresponding to the training sample set.
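(Illustrative only, not part of the claims.) A minimal sketch of claim 2 under assumed details: a whitespace-normalization preprocessing step and an 8:2 random split into training and verification sets.

```python
# Preprocess and label historical call texts, then randomly split them.
import random
import re

def preprocess(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace (assumed rule)

corpus = [
    ("I want to speak with a person", "ask_for_human"),
    ("what is my current balance", "query_balance"),
    ("stop calling me", "complain"),
    ("how do I repay early", "query_repayment"),
]

labeled = [(preprocess(text), label) for text, label in corpus]
random.shuffle(labeled)                         # random combination of labeled corpora
split = int(len(labeled) * 0.8)                 # 8:2 split ratio is an assumption
train_set, val_set = labeled[:split], labeled[split:]
print(len(train_set), "training samples,", len(val_set), "verification samples")
```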
3. The method for optimizing a man-machine call according to claim 1, wherein the step of training a preset initial recognition model through the training sample set to obtain a trained user intention recognition model specifically comprises:
performing word segmentation processing on the training corpus, and performing vector feature conversion processing on all obtained segmented words to obtain corresponding word vectors;
performing convolution calculation on all the word vectors, and extracting characteristic data corresponding to each word vector;
calculating the similarity between the feature data corresponding to each word vector and a preset intention label, sorting all the calculated similarities, and outputting the recognition result with the maximum similarity as the intention recognition result corresponding to the training corpus;
and iteratively updating the initial recognition model based on the intention recognition result corresponding to the training corpus to obtain a trained user intention recognition model.
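(Illustrative only, not part of the claims.) The forward pass of claim 3 could look like the following numpy sketch; the embedding size, the single convolution filter, and the intention-label vectors are all made-up placeholders.

```python
# Word vectors -> convolution -> pooled features -> similarity with intention labels.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"i": 0, "want": 1, "a": 2, "human": 3, "agent": 4}
emb = rng.normal(size=(len(vocab), 8))            # word vectors, dimension 8
conv_w = rng.normal(size=(3, 8))                  # one convolution filter, window 3
intent_protos = {"ask_for_human": rng.normal(size=3),
                 "other": rng.normal(size=3)}     # preset intention-label vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def forward(tokens):
    x = emb[[vocab[t] for t in tokens]]                                  # (seq_len, 8)
    feats = [np.sum(conv_w * x[i:i + 3]) for i in range(len(x) - 2)]     # convolution
    feat_vec = np.array([max(feats), np.mean(feats), min(feats)])        # pooled features
    sims = {label: cosine(feat_vec, proto) for label, proto in intent_protos.items()}
    return max(sims, key=sims.get), sims          # output the label with maximum similarity

print(forward(["i", "want", "a", "human", "agent"]))
```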
4. The method according to claim 3, wherein the step of performing word segmentation processing on the corpus and performing vector feature transformation processing on all the obtained segmented words to obtain corresponding word vectors specifically comprises:
performing word segmentation processing on the training corpus to obtain a plurality of word segmentation vocabularies;
carrying out duplication removal operation on a plurality of word segmentation vocabularies obtained by word segmentation;
and performing vector characteristic conversion processing on all word segmentation vocabularies left after the duplication removing operation to obtain corresponding word vectors.
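(Illustrative only, not part of the claims.) A minimal sketch of claim 4 under assumptions: a whitespace tokenizer stands in for real Chinese word segmentation (e.g. jieba), and hash-seeded random vectors stand in for trained word embeddings.

```python
# Segment the corpus, de-duplicate the vocabulary, convert remaining words to vectors.
import hashlib
import numpy as np

def segment(text: str) -> list:
    return text.lower().split()                  # stand-in for real word segmentation

def word_vector(word: str, dim: int = 8) -> np.ndarray:
    seed = int(hashlib.md5(word.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)   # deterministic toy vector

corpus = "please transfer me to a human please"
words = segment(corpus)
unique_words = list(dict.fromkeys(words))        # de-duplication, order preserved
vectors = {w: word_vector(w) for w in unique_words}
print(unique_words)
print(vectors["human"].shape)
```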
5. The method according to claim 3, wherein the step of iteratively updating the initial recognition model based on the intention recognition result corresponding to the corpus to obtain the trained user intention recognition model specifically comprises:
fitting by using a back propagation algorithm based on the intention recognition result corresponding to the training corpus and a preset standard result to obtain a recognition error;
comparing the recognition error with a preset threshold, and if the recognition error is larger than the preset threshold, iteratively updating the initial recognition model until the recognition error is smaller than or equal to the preset threshold, so as to obtain the user intention recognition model;
and outputting the user intention recognition model with the recognition error smaller than or equal to a preset threshold value.
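(Illustrative only, not part of the claims.) The threshold-controlled iterative update of claim 5, sketched with a one-parameter toy model; the learning rate, threshold, and loss are arbitrary choices for the example.

```python
# Iterate until the recognition error drops to or below the preset threshold.
threshold = 0.01
weight = 0.0                     # toy model parameter
target = 1.0                     # preset standard result
learning_rate = 0.1

for step in range(1000):
    prediction = weight * 1.0                    # forward pass on a dummy input
    error = (prediction - target) ** 2           # recognition error
    if error <= threshold:                       # stop once error <= preset threshold
        break
    gradient = 2 * (prediction - target)         # back propagation for this toy loss
    weight -= learning_rate * gradient           # iterative update of the model
print(f"stopped at step {step}, error {error:.4f}, weight {weight:.3f}")
```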
6. The method for optimizing a man-machine call according to any one of claims 1 to 5, wherein before the steps of receiving an intention identification instruction, acquiring the call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain the current call text, the method further comprises:
collecting a call record of each human agent, and extracting the voice information of each human agent from the collected call records;
and constructing a voice information set of the call robot based on the voice information of each human agent.
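(Illustrative only, not part of the claims.) A minimal sketch of claim 6; extract_voice_info is a placeholder for whatever signal processing the system actually uses, and the recording paths are hypothetical.

```python
# Collect a call record per human agent and build the call robot's voice information set.
def extract_voice_info(recording_path: str) -> dict:
    # placeholder: a real system would estimate pitch, speed and a speaker embedding
    return {"pitch": 180.0, "speed": 1.0, "embedding": [0.1, 0.2, 0.3]}

agent_call_records = {
    "agent_001": "recordings/agent_001_latest.wav",
    "agent_002": "recordings/agent_002_latest.wav",
}

voice_info_set = {agent_id: extract_voice_info(path)
                  for agent_id, path in agent_call_records.items()}
print(sorted(voice_info_set))
```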
7. The method for optimizing a man-machine call according to claim 6, wherein after the constructing of the voice information set of the call robot based on the voice information of each human agent, the method further comprises:
receiving a call instruction initiated by a user, and determining a target human agent corresponding to the call instruction;
acquiring the voice information of the target human agent from the voice information set, and adjusting the model parameters of a pre-trained TTS model based on the voice information of the target human agent;
and establishing a call with the user based on the TTS model after model parameter adjustment.
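(Illustrative only, not part of the claims.) A minimal sketch of claim 7 under assumed names (route_call, voice_info_set): determine the target human agent, look up that agent's voice information, and adjust the TTS parameters before the call is established.

```python
# Route an incoming call instruction and adjust the pre-trained TTS model parameters.
voice_info_set = {"agent_001": {"pitch": 180.0, "speed": 1.0}}

def route_call(call_instruction: dict) -> str:
    return call_instruction.get("preferred_agent", "agent_001")   # target human agent

def handle_call(call_instruction: dict, tts_params: dict) -> dict:
    target = route_call(call_instruction)
    voice = voice_info_set[target]
    tts_params.update(voice)          # adjust TTS parameters to the agent's voice
    return {"agent": target, "tts_params": tts_params}

print(handle_call({"user": "u-42", "preferred_agent": "agent_001"}, {"volume": 0.8}))
```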
8. An apparatus for optimizing a human-machine conversation, comprising:
a labeling module, wherein the labeling module is used for acquiring a training corpus from a preset historical corpus and labeling the training corpus to obtain a training sample set, and the training corpus is a historical call text generated when a user stored in the historical corpus communicates with a call robot;
the training module is used for training a preset initial recognition model through the training sample set to obtain a trained user intention recognition model;
the analysis module is used for receiving an intention identification instruction, acquiring the call record of the current call corresponding to the intention identification instruction, and analyzing the call record of the current call in real time to obtain a current call text;
the recognition module is used for importing the current call text into the user intention recognition model to obtain a user intention recognition result;
and the judging module is used for determining whether the current call needs the intervention of a human agent based on the user intention recognition result, and transferring the current call to the human agent if the current call needs the intervention of the human agent.
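(Illustrative only, not part of the claims.) The module structure of claim 8 can be pictured as a composition of small classes; the class and method names below are assumptions for the sketch.

```python
# Apparatus sketch: labeling, training, analysis, recognition and judging modules.
class LabelingModule:
    def build_training_set(self, historical_corpus):
        return [(text, "unlabeled") for text in historical_corpus]

class TrainingModule:
    def train(self, training_set):
        # toy "model": flags texts that mention a human
        return lambda text: "ask_for_human" if "human" in text else "other"

class AnalysisModule:
    def parse_call(self, call_record):
        return call_record.strip().lower()

class RecognitionModule:
    def recognize(self, model, call_text):
        return model(call_text)

class JudgingModule:
    def needs_agent(self, intent):
        return intent == "ask_for_human"

class CallOptimizationDevice:
    def __init__(self):
        self.labeling, self.training = LabelingModule(), TrainingModule()
        self.analysis, self.recognition, self.judging = (
            AnalysisModule(), RecognitionModule(), JudgingModule())

device = CallOptimizationDevice()
model = device.training.train(device.labeling.build_training_set(["hi", "human please"]))
text = device.analysis.parse_call("  I need a HUMAN agent ")
print(device.judging.needs_agent(device.recognition.recognize(model, text)))
```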
9. A computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions, and the processor, when executing the computer readable instructions, implements the steps of the method for optimizing a man-machine call according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method for optimizing a man-machine call according to any one of claims 1 to 7.
CN202011504875.7A 2020-12-18 2020-12-18 Man-machine conversation optimization method and device, computer equipment and storage medium Pending CN112632244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504875.7A CN112632244A (en) 2020-12-18 2020-12-18 Man-machine conversation optimization method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011504875.7A CN112632244A (en) 2020-12-18 2020-12-18 Man-machine conversation optimization method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112632244A true CN112632244A (en) 2021-04-09

Family

ID=75317209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011504875.7A Pending CN112632244A (en) 2020-12-18 2020-12-18 Man-machine conversation optimization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112632244A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108777751A (en) * 2018-06-07 2018-11-09 上海航动科技有限公司 A kind of call center system and its voice interactive method, device and equipment
CN110035187A (en) * 2019-04-16 2019-07-19 浙江百应科技有限公司 A method of realizing AI and operator attendance seamless switching in the phone
CN110888968A (en) * 2019-10-15 2020-03-17 浙江省北大信息技术高等研究院 Customer service dialogue intention classification method and device, electronic equipment and medium
CN111541821A (en) * 2020-07-10 2020-08-14 北京灵伴即时智能科技有限公司 Telephone customer service system and telephone customer service flexible switching method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282737A (en) * 2021-07-21 2021-08-20 中信建投证券股份有限公司 Man-machine cooperation intelligent customer service dialogue method and device
CN113282737B (en) * 2021-07-21 2021-11-12 中信建投证券股份有限公司 Man-machine cooperation intelligent customer service dialogue method and device
CN114025049A (en) * 2021-12-09 2022-02-08 北京声智科技有限公司 Call processing method and device, electronic equipment and storage medium
CN114528851A (en) * 2022-02-17 2022-05-24 平安科技(深圳)有限公司 Reply statement determination method and device, electronic equipment and storage medium
CN114528851B (en) * 2022-02-17 2023-07-25 平安科技(深圳)有限公司 Reply sentence determination method, reply sentence determination device, electronic equipment and storage medium
CN114615378A (en) * 2022-03-10 2022-06-10 平安普惠企业管理有限公司 Call connection method and device, intelligent voice platform and storage medium
CN114757186A (en) * 2022-04-25 2022-07-15 中国电信股份有限公司 User intention analysis method and device, computer storage medium and electronic equipment
CN114757186B (en) * 2022-04-25 2023-11-14 中国电信股份有限公司 User intention analysis method and device, computer storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN107492379B (en) Voiceprint creating and registering method and device
CN112732911B (en) Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN111444340A (en) Text classification and recommendation method, device, equipment and storage medium
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN109543030A (en) Customer service machine conference file classification method and device, equipment, storage medium
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN113205817A (en) Speech semantic recognition method, system, device and medium
CN111914076B (en) User image construction method, system, terminal and storage medium based on man-machine conversation
CN112233698A (en) Character emotion recognition method and device, terminal device and storage medium
WO2022178969A1 (en) Voice conversation data processing method and apparatus, and computer device and storage medium
WO2021135457A1 (en) Recurrent neural network-based emotion recognition method, apparatus, and storage medium
CN112395979A (en) Image-based health state identification method, device, equipment and storage medium
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN112418059A (en) Emotion recognition method and device, computer equipment and storage medium
CN113590078A (en) Virtual image synthesis method and device, computing equipment and storage medium
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN113705315A (en) Video processing method, device, equipment and storage medium
CN115510186A (en) Instant question and answer method, device, equipment and storage medium based on intention recognition
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN114091452A (en) Adapter-based transfer learning method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination