CN114615378A - Call connection method and device, intelligent voice platform and storage medium - Google Patents


Info

Publication number
CN114615378A
CN114615378A (application number CN202210240276.1A)
Authority
CN
China
Prior art keywords
feature
text
historical
target
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210240276.1A
Other languages
Chinese (zh)
Inventor
赖咸立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd filed Critical Ping An Puhui Enterprise Management Co Ltd
Priority to CN202210240276.1A
Publication of CN114615378A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M 3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M 3/4936 Speech interaction details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M 3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M 3/5166 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of this application fall within the technical field of artificial intelligence and provide a call connection method, a call connection device, an intelligent voice platform, and a storage medium. The method is applied to the intelligent voice platform and comprises the following steps: extracting, with a neural network model, current text features of the current text information input by a user and historical text features of historical text information, the neural network model comprising an attention mechanism layer and a classification layer; fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features; inputting the target text features into the classification layer to obtain a target result; and, if the target result indicates that a call connection between the user and a human agent should be established, establishing that call connection. This method reduces the time a user spends in communication.

Description

Call connection method and device, intelligent voice platform and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a call connection method and device, an intelligent voice platform and a storage medium.
Background
With the development of intelligent voice technology and text semantic recognition, business return visits are now commonly handled by an intelligent voice robot that establishes a call connection with the customer.
The intelligent voice robot is preconfigured with a set of questions and corresponding answers for each dialogue scene, and replies intelligently once it identifies the question the user wants to ask. When the robot cannot answer the customer's questions comprehensively, the call is transferred to a human agent after the customer actively requests manual service.
In practice, however, the preset dialogue scenes are limited, and the intelligent voice robot performs semantic parsing only on the text the user inputs at the current moment when selecting its reply. During vectorization of that text it is difficult to identify accurately the question the user wants to ask, so the user spends a great deal of time communicating with the robot. Moreover, when the robot cannot answer the customer's questions comprehensively, the multiple rounds of communication it conducts with the user waste the user's time.
Disclosure of Invention
The embodiments of this application provide a call connection method and device, an intelligent voice platform, and a storage medium, which can solve the problem that existing intelligent voice robots waste a large amount of the user's time when handling user questions.
In a first aspect, an embodiment of the present application provides a call connection method, which is applied to an intelligent voice platform, and the method includes:
extracting, using a neural network model, current text features of the current text information input by a user and historical text features of historical text information; the neural network model comprises an attention mechanism layer and a classification layer;
fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features;
inputting the target text features into the classification layer to obtain a target result;
and, if the target result indicates that a call connection between the user and a human agent should be established, establishing the call connection between the user and the human agent.
In a second aspect, an embodiment of the present application provides a call connection device, which is applied to an intelligent voice platform, and the device includes:
the extraction module is used for extracting, using a neural network model, the current text features of the current text information input by the user and the historical text features of historical text information; the neural network model comprises an attention mechanism layer and a classification layer;
the processing module is used for fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features;
the input module is used for inputting the target text features into the classification layer to obtain a target result;
and the transfer module is used for establishing the call connection between the user and the human agent if the target result indicates that this call connection should be established.
In a third aspect, an embodiment of the present application provides an intelligent voice platform comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the method of the first aspect is implemented.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product which, when run on an intelligent voice platform, causes the intelligent voice platform to perform the method of the first aspect.
Compared with the prior art, the embodiments of this application have the following advantage: the intelligent voice platform extracts the current text features of the text the user has just entered with an internal neural network model, and processes them together with the historical text features of each piece of historical text information through the attention mechanism layer, which determines the influence value each historical text feature has on the current text features. When processing the historical and current text features, the neural network model can then extract, according to these influence values, more of the detailed feature information related to the current text features from each historical text feature, improving the accuracy with which the model recognizes the semantics of the current text information. When the platform determines that the target result calls for a call connection between the user and a human agent, the call is transferred to the human agent intelligently, so that the user's problems are resolved in time, the user's communication time is reduced, the transfer is imperceptible to the user, and the user experience is improved.
Drawings
To illustrate the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an implementation of a call connection method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an implementation manner of S102 of a call connection method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an implementation manner of S1021 in a call connection method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an implementation manner of S1022 in a call connection method according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating an implementation manner of S103 of a call connection method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a call connection device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an intelligent voice platform according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a call connection method according to an embodiment of the present application, where the method includes the following steps:
s101, extracting current text characteristics of current text information input by a user and historical text characteristics of historical text information by an intelligent voice platform by adopting a neural network model; the neural network model includes an attention mechanism layer and a classification layer.
In an embodiment, the main body for executing the method is an intelligent voice platform. The intelligent voice platform is provided with a plurality of questions and corresponding answers under each conversation process in advance, so that the questions which the user wants to consult can be intelligently identified and then replied. Meanwhile, when the user questions cannot be comprehensively answered, the call can be switched to the artificial seat based on the artificial service selected by the user or when the user is intelligently judged to have the intention of switching the artificial service.
In an embodiment, the current text information is information input by the user at the current moment, and may be a problem input on the intelligent voice platform when the user consults a specific service. The intelligent voice platform is preset with a plurality of conversation processes, and each conversation process corresponds to one service. Specifically, the conversation process includes a plurality of preset consultation questions and answers corresponding to each consultation question. And then, the intelligent voice platform determines a corresponding conversation process according to the service scene selected by the user. Then, the current text information input by the user at the current moment is processed according to the neural network model, so that the answer which the user wants to consult is intelligently output from the conversation process.
It should be noted that the intelligent voice platform may store the text information input by the user in real time, so as to obtain the historical text information before the current time. It can be understood that the intelligent voice platform can also store the dialogue time of each text message at the same time, so that when a plurality of historical text messages exist, the historical text messages can be distinguished one by one according to the dialogue time of each historical text message.
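The timestamped storage just described can be sketched minimally as follows (the data structure and function names are illustrative assumptions, not part of the patent):

```python
from datetime import datetime

# Minimal sketch of timestamped dialogue storage: each utterance is kept
# together with its dialogue time, so several pieces of historical text
# information can be told apart and ordered. Names are illustrative only.
history = []

def store_message(text, when=None):
    """Record one piece of text information with its dialogue time."""
    history.append((when or datetime.now(), text))

def ordered_history():
    """Return the historical text information sorted by dialogue time."""
    return [text for _, text in sorted(history, key=lambda pair: pair[0])]

store_message("How do I repay early?", datetime(2022, 3, 1, 10, 5))
store_message("What is my interest rate?", datetime(2022, 3, 1, 10, 2))
```

Even when messages arrive or are stored out of order, sorting by the stored dialogue time recovers the conversational sequence.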
It should be noted that the user may also converse with the intelligent voice platform by voice; the platform then converts the user's speech into text for subsequent processing using existing speech recognition technology. When replying to the user's question, the answer may be returned as text or read out by voice broadcast; this is not limited here.
It should be added that when there are several pieces of historical text information, there are correspondingly several historical text features, which can be expressed as F(i) = (f1, f2, ..., f(i-1)), where i is the current moment, F(i) is the set of historical text features before moment i, and f(i-1) is the historical text feature at moment i-1.
In an embodiment, the neural network model includes, but is not limited to, a Transformer neural network model and a long short-term memory (LSTM) neural network model. The model comprises an attention mechanism layer for processing the current and historical text features, and a classification layer for performing classification prediction on them.
In an embodiment, the neural network model also includes a processing layer that performs feature processing on the current text information. The processing layer segments the current text information into words according to a preset word vector library and obtains the word vector of each word; these word vectors are then fused to obtain the current text features of the current text information.
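As a rough sketch of this processing layer, tokens can be looked up in a word-vector table and their vectors fused by averaging. The vocabulary and vectors below are invented for illustration; a real platform would use a trained Chinese word segmenter and a pretrained embedding table.

```python
import numpy as np

# Hypothetical word-vector library; a real system would load pretrained
# embeddings and segment Chinese text with a proper tokenizer.
WORD_VECTORS = {
    "loan": np.array([0.2, 0.7, 0.1]),
    "interest": np.array([0.6, 0.1, 0.3]),
    "rate": np.array([0.5, 0.2, 0.4]),
}

def text_feature(tokens):
    """Fuse the word vectors of the tokens into one text feature (mean pooling)."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    return np.mean(vectors, axis=0)

current_feature = text_feature(["loan", "interest", "rate"])
```

Mean pooling is only one possible fusion; the patent does not fix the fusion operator for the processing layer.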
S102, the intelligent voice platform fuses the historical text features and the current text features based on the attention mechanism layer to generate target text features.
In an embodiment, the attention mechanism layer may change the weight distribution corresponding to the historical text features and the current text features, respectively, so as to fuse the historical text features and the current text features according to the corresponding weights, thereby obtaining the target text features.
In a specific embodiment, when there are several pieces of historical text information, each corresponding to one historical text feature, and referring to fig. 2, the intelligent voice platform may generate the target text feature through the following sub-steps S1021 to S1022:
and S1021, the intelligent voice platform respectively calculates the influence value of each historical text characteristic on the current text characteristic based on the attention mechanism layer.
In one embodiment, the attention mechanism layer obtains the target text feature passed to the next layer by taking a weighted average of the hidden information in all the historical text features. That is, the attention mechanism layer computes the degree of influence each historical text feature has on the current text feature, so that the neural network model can extract more detailed feature information from each historical text feature according to that degree.
Specifically, referring to fig. 3, in S1021, calculating the influence value of each historical text feature on the current text feature based on the attention mechanism layer, the following sub-steps S1211 to S1214 may be specifically implemented:
s1211, the intelligent voice platform determines the maximum value of the feature dimension from feature dimensions corresponding to each historical text feature and the current text feature respectively.
And S1212, respectively unifying feature dimensions respectively corresponding to each historical text feature and the current text feature according to the feature dimension maximum value by the intelligent voice platform.
S1213, the intelligent voice platform respectively calculates the similarity of the historical text features and the current text features after each unified feature dimension.
S1214, aiming at the feature similarity corresponding to any one historical text feature, the intelligent voice platform calculates the influence value of the historical text feature on the current text feature according to the feature similarity and the sum of the feature similarities of all the historical text features.
In an embodiment, because the historical text information corresponding to the historical text features is generally different from the current text information, the number of tokens produced when the neural network model processes each piece of text differs, and so do the word vectors of those tokens. The feature dimensions of the current text feature and of each historical text feature therefore also differ.
For convenience of subsequent processing, the neural network model unifies the feature dimensions of each historical text feature and of the current text feature. Specifically, the word vector of every token can be stored in advance on the intelligent voice platform; after any piece of text information is segmented, a text feature composed of several word vectors is obtained. For example, a text feature may take the form [330, 467, 212, 5331, ..., 419], where each number represents the word vector of one token.
On this basis, when the number of tokens differs between texts, the word-vector dimensions of the corresponding text features differ, and similarity cannot be computed between features of different dimensions. The intelligent voice platform therefore unifies the feature dimensions. For example, it determines the maximum feature dimension among all historical text features and the current text feature; any text feature whose dimension is below this maximum is then padded at its tail with the value 0 or 1 to raise its word-vector dimension, so that every historical text feature and the current text feature share the same dimension.
When computing similarity, since the feature dimensions of the historical and current text features are now consistent, a hash-based method, the Euclidean distance, cosine similarity, or another similarity measure can be used; this is not limited here.
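Dimension unification by tail padding followed by similarity computation, as described above, might look like this (a sketch with toy vectors; zero padding and cosine similarity are one of the allowed choices):

```python
import numpy as np

def pad_to(vec, dim):
    """Pad a feature vector at the tail with zeros up to dimension `dim`."""
    return np.concatenate([vec, np.zeros(dim - len(vec))])

def cosine_sim(a, b):
    """Cosine similarity between two equal-dimension feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

current = np.array([1.0, 2.0, 3.0])
historical = [np.array([1.0, 2.0]), np.array([2.0, 4.0, 6.0, 0.0])]

# Unify every feature to the maximum dimension, then compare each
# historical text feature with the current one.
max_dim = max(len(v) for v in historical + [current])
current_u = pad_to(current, max_dim)
similarities = [cosine_sim(pad_to(h, max_dim), current_u) for h in historical]
```

The second historical feature is a scaled copy of the current one, so its cosine similarity is exactly 1 after padding.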
In an embodiment, after obtaining the similarities, the intelligent voice platform may compute the influence value of each historical text feature on the current text feature with the following formula:
Wij = Nij / (sum over j = 1..n of Nij)
where Wij is the influence value of the jth historical text feature on the current text feature; Nij is the feature similarity between the jth historical text feature and the current text feature; n is the number of historical text features; and i denotes the current text feature.
It should be noted that when the above formula is used to compute the influence value of each historical text feature on the current text feature, a higher similarity value yields a higher influence value, indicating that the historical text feature has a greater degree of influence on the current text feature. The neural network can therefore focus more attention on the historical text features with the largest influence values, so as to extract more of the detailed feature information related to the current text feature.
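The influence-value computation described above amounts to dividing each feature similarity by the sum of the similarities over all historical text features, so the values form a weight distribution:

```python
import numpy as np

def influence_values(similarities):
    """Wij = Nij / sum_j Nij: turn feature similarities into influence values."""
    sims = np.asarray(similarities, dtype=float)
    return sims / sims.sum()

# Toy similarities for three historical text features.
weights = influence_values([0.6, 1.0, 0.4])
```

The weights sum to one, and the historical text feature with the highest similarity (here the second) receives the largest influence value.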
And S1022, the intelligent voice platform performs feature fusion on each historical text feature and the current text feature according to the influence value to obtain a target text feature.
In an embodiment, after obtaining the influence value of each historical text feature on the current text feature, each historical text feature may be weighted, and the weighted historical text features fused with the current text feature to obtain the target text feature. The target text feature then contains not only the detailed feature information extracted from the several historical text features but also the main feature information of the current text feature.
In a specific embodiment, referring to fig. 4, the intelligent voice platform may perform feature fusion on each historical text feature and the current text feature through the following sub-steps S1221 to S1222 to obtain the target text feature:
s1221, the intelligent voice platform weights the historical text features according to the influence values to obtain target historical text features.
S1222, the intelligent voice platform fuses the target historical text features and the current text features to generate target text features.
In an embodiment, the weighting of the several historical text features by their influence values has been explained above and is not repeated. It can be understood that, in the dialogue scene of a consultation service, the questions the user wants to ask generally progress step by step; that is, the text information input at the current moment usually bears a certain relationship to the historical text information input before it.
Because the neural network model weights each historical text feature by its influence value, the target historical text feature ends up containing the detailed feature information of all historical text features. Fusing the current text feature with the target historical text feature then lets the target text feature reinforce the relationship between the questions the user has asked over a continuous period, improving the accuracy with which the neural network model identifies the user's intention.
In an embodiment, fusing the target historical text feature with the current text feature may mean splicing the target historical text feature after the current text feature and performing vector concatenation to obtain the target text feature; the target text feature may equally be obtained by splicing the target historical text feature before the current text feature. This is not limited here.
In this embodiment, to better capture the user's intention at the current moment, the target historical text feature and the current text feature may be fused as follows:
fz = Wi * tanh(fi) + (1 - Wx) * tanh(fx)
where fz is the target text feature, fi is the current text feature, fx is the target historical text feature, Wi is the preset weight of the current text feature (Wx being the corresponding weight of the target historical text feature), tanh is the activation function, and * denotes the Hadamard product.
It should be added that the purpose of fusing the current text feature and the target historical text feature with this formula is to let the two features act against each other inside the neural network model, which reasonably strengthens their interaction during fusion and improves the accuracy with which the classification layer identifies the user's intention from the target text feature.
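A minimal numeric sketch of this fusion formula, assuming scalar weights Wi and Wx (the patent does not fix their values, and broadcasting a scalar over the vector is equivalent to a constant Hadamard factor):

```python
import numpy as np

def fuse(f_i, f_x, w_i, w_x):
    """fz = Wi * tanh(fi) + (1 - Wx) * tanh(fx), with * as the Hadamard product."""
    return w_i * np.tanh(f_i) + (1.0 - w_x) * np.tanh(f_x)

f_current = np.array([1.0, -0.5])   # current text feature fi
f_history = np.array([0.0, 2.0])    # target historical text feature fx
target = fuse(f_current, f_history, w_i=0.5, w_x=0.3)
```

The tanh squashing keeps both contributions in (-1, 1) before they are combined, so neither feature can dominate the target text feature purely by magnitude.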
S103, the intelligent voice platform inputs the target text features into the classification layer to obtain a target result.
S104, if the target result indicates that a call connection between the user and a human agent should be established, the intelligent voice platform establishes that call connection.
In an embodiment, the target result includes a result of establishing a call connection between the user and the human agent and a result of not establishing a call connection between the user and the human agent. It can be appreciated that the intelligent voice platform can transfer the intelligent voice call to the human agent when the target result is a result of establishing a call connection of the user with the human agent.
Specifically, the intelligent voice platform may generate a transfer work order and assign it to an idle human agent. When the human agent joins the user's call, the current and historical text information can be pushed to the terminal interface the agent uses, so that the staff member understands the problems the user has encountered.
It should be noted that, when the target result is a result of not establishing a call connection between the user and the human agent, the intelligent voice platform may further determine, from a preset conversation process, the preset question with the highest matching degree with the current text information. That is, the intelligent voice platform contains a plurality of preset questions, and the classification layer can also identify the matching similarity between the current text information and each preset question based on the target text feature. The answer corresponding to the preset question with the highest matching similarity is then sent to the user's terminal interface. It can be appreciated that, since the call does not need to be transferred to a human agent, the intelligent voice platform still responds intelligently to the user's questions based on the preset conversation process.
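This fallback branch can be sketched as follows (the similarity scores would come from the classification layer in practice; the question keys and answer strings here are invented for illustration):

```python
def answer_from_dialog_flow(similarities, answers):
    """Return the answer for the preset question whose matching
    similarity with the current text information is highest."""
    best_question = max(similarities, key=similarities.get)
    return answers[best_question]

# Illustrative scores for three hypothetical preset questions.
similarities = {"q_reset_password": 0.81, "q_check_balance": 0.12, "q_open_account": 0.07}
answers = {
    "q_reset_password": "You can reset your password in the Settings page.",
    "q_check_balance": "Your balance is shown on the home screen.",
    "q_open_account": "Please visit a branch to open an account.",
}
reply = answer_from_dialog_flow(similarities, answers)
```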
In another embodiment, when the intelligent voice platform establishes a call connection between the human agent and the user, a preset voice-changing module may further be used to alter the worker's voice so that its timbre and tone are consistent with those of the preset voice in the intelligent voice platform. In this way, the problems encountered by the user can be solved in time while the transfer remains imperceptible to the user, improving the user experience.
In this embodiment, the intelligent voice platform extracts, with an internal neural network model, the current text feature of the current text information input by the user, and processes the current text feature together with the historical text feature of each piece of historical text information in an attention mechanism layer to determine the influence value of each historical text feature on the current text feature. With these influence values, the neural network model can extract from each historical text feature the finer-grained information that is relevant to the current text feature, which improves the accuracy with which the model recognizes the semantics of the current text information. When the intelligent voice platform determines that the target result is a result of establishing a call connection between the user and a human agent, the call can be transferred to the human agent intelligently, so that the user's problems are solved in time, the user's communication time cost is reduced, and the transfer is imperceptible, improving the user experience.
In a specific embodiment, the classification layer comprises a binary classifier for identifying whether the call should be transferred to a human agent, and a multi-classifier for matching the text information against the preset conversation process. Referring to fig. 5, the intelligent voice platform may obtain the target result through the following sub-steps S1031 to S1033, specifically:
S1031, the intelligent voice platform inputs the target text feature into the binary classifier to obtain a probability value for establishing a call connection between the user and the human agent; and,
S1032, the intelligent voice platform inputs the target text feature into the multi-classifier to obtain the matching similarity between the current text information and each preset question in the conversation process.
S1033, if the probability value is greater than a first preset value and the maximum value among the matching similarities is smaller than a second preset value, the intelligent voice platform determines that the target result is a result of establishing a call connection between the user and the human agent.
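The decision rule of sub-steps S1031 to S1033 can be sketched as follows (the threshold values are illustrative; the patent leaves them to be set by a worker according to the actual situation):

```python
def target_result(transfer_prob, similarities,
                  first_preset=0.5, second_preset=0.6):
    """Establish the call connection with a human agent only when the
    binary classifier's probability exceeds the first preset value AND
    the best preset-question match stays below the second preset value."""
    return transfer_prob > first_preset and max(similarities) < second_preset

# Confident transfer signal, no good match among the preset questions.
decision = target_result(0.9, [0.1, 0.2, 0.05])
```

Requiring both conditions means a user whose question matches a preset question well is still served by the conversation process, even if the binary classifier alone would have transferred the call.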
In an embodiment, both classifiers may be provided in the classification layer of the neural network model. When the neural network model is trained, the same neural network structure processes the training data until the target text feature is obtained. The target text feature is then input into the binary classifier to obtain a first training result; the first training result and the actual result of the training data are input into a loss function to calculate the model's first training loss; and the model parameters in the binary classifier are iteratively updated according to the first training loss. Similarly, the model parameters in the multi-classifier may be updated by inputting the target text feature into the preset multi-classifier to obtain a second training result, inputting the second training result and the actual result of the training data into a loss function to calculate the second training loss, and iteratively updating the model parameters in the multi-classifier according to the second training loss.
It will be appreciated that the binary classifier has only two possible results: transferring to a human agent, and not transferring. Therefore, if the actual result of the training data is a transfer to a human agent, the value of the actual result may be set to 1, and the loss function is calculated from the predicted probability of the transfer result. Similarly, since the multi-classifier predicts the matching similarity between the current text information and each preset question, when the actual result of the training data corresponds to preset question A, the value for preset question A may be set to 1 and the values for the remaining preset questions to 0; the loss function is then calculated from the predicted probability for preset question A.
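The target encoding and loss calculation described above can be sketched as follows (a hand-rolled numpy version for illustration; a real implementation would use a deep-learning framework's built-in loss functions):

```python
import numpy as np

def bce_loss(p, y):
    """First training loss (binary classifier): y = 1 when the actual
    result of the training data is 'transfer to a human agent'."""
    eps = 1e-12  # avoid log(0)
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def ce_loss(probs, true_idx):
    """Second training loss (multi-classifier): the true preset question
    gets target value 1 and all others 0, so the one-hot cross-entropy
    reduces to -log of the probability predicted for the true question."""
    eps = 1e-12
    return -np.log(probs[true_idx] + eps)

first_loss = bce_loss(0.8, 1)                         # predicted 0.8 for "transfer"
second_loss = ce_loss(np.array([0.1, 0.7, 0.2]), 1)   # question at index 1 is correct
```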
In an embodiment, since the neural network model is provided with two classifiers, the final output may include both a result of whether a call connection between the user and the human agent needs to be established and the matching similarity with each preset question in the preset conversation process. Based on this, to further improve the accuracy of identifying the target result, the target result may be deemed a result of establishing a call connection only when the probability value of establishing a call connection between the user and the human agent is greater than the first preset value and the maximum matching similarity is smaller than the second preset value. Otherwise, the preset conversation process is still used to communicate intelligently with the user.
In an embodiment, the first preset value and the second preset value may be set by a worker according to an actual situation, which is not limited herein.
In an embodiment, the binary classifier may be a sigmoid classifier and the multi-classifier a softmax classifier. The sigmoid classifier is better suited to two-class classification, and is therefore more suitable for deciding whether to establish a call connection between the user and the human agent; the softmax classifier is better suited to multi-class classification, and is therefore more suitable for predicting the matching similarity between the current text information and each preset question.
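For concreteness, the two activations differ as follows (standard textbook definitions, not code from the patent):

```python
import numpy as np

def sigmoid(z):
    """Squashes one logit into (0, 1) — fits the binary
    'transfer or do not transfer' decision."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Turns a logit vector into a probability distribution over the
    preset questions — fits the multi-class matching task."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

p_transfer = sigmoid(0.0)                        # a single logit
p_questions = softmax(np.array([1.0, 2.0, 3.0]))  # one logit per preset question
```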
Referring to fig. 6, fig. 6 is a block diagram of a call connection device according to an embodiment of the present disclosure. Each module included in the call connection device in this embodiment is used to execute the steps in the embodiments corresponding to fig. 1 to fig. 5; please refer to the related descriptions of those embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 6, the call connection apparatus 600 may include: an extraction module 610, a processing module 620, an input module 630, and a transfer module 640, wherein:
an extracting module 610, configured to extract, by using a neural network model, a current text feature of current text information input by a user and a historical text feature of historical text information; the neural network model includes an attention mechanism layer and a classification layer.
And the processing module 620 is configured to fuse the historical text features and the current text features based on the attention mechanism layer to generate target text features.
And an input module 630, configured to input the target text feature into the classification layer, so as to obtain a target result.
The switching module 640 is configured to establish a call connection between the user and the human agent if the target result is a result of establishing the call connection between the user and the human agent.
In one embodiment, there are a plurality of pieces of historical text information, and each piece of historical text information corresponds to one historical text feature. The processing module 620 is further configured to:
respectively calculating the influence value of each historical text characteristic on the current text characteristic based on the attention mechanism layer; and according to the influence value, performing feature fusion on each historical text feature and the current text feature to obtain a target text feature.
In an embodiment, the processing module 620 is further configured to:
determining the feature dimension maximum from the feature dimensions corresponding to each historical text feature and to the current text feature; unifying the feature dimensions of each historical text feature and the current text feature according to that maximum; calculating the similarity between each dimension-unified historical text feature and the current text feature; and, for the feature similarity corresponding to any one historical text feature, calculating the influence value of that historical text feature on the current text feature from its feature similarity and the sum of the feature similarities of all the historical text features.
In one embodiment, the processing module 620 is further configured to calculate an influence value of the historical text feature on the current text feature by the following formula:
Wij = Nij / (Ni1 + Ni2 + ... + Nin)
wherein Wij is the influence value of the jth historical text feature on the current text feature; Nij is the feature similarity between the jth historical text feature and the current text feature; n is the number of historical text features; and i denotes the current text feature.
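A sketch of this influence-value computation (zero-padding to the maximum feature dimension and a dot-product similarity are assumptions made for illustration; the patent does not fix the similarity measure):

```python
import numpy as np

def influence_values(current, histories):
    """Compute Wij = Nij / (Ni1 + ... + Nin): each historical feature's
    influence on the current feature is its similarity Nij normalized by
    the sum of all historical similarities."""
    dim = max(len(current), *(len(h) for h in histories))
    pad = lambda v: np.pad(v, (0, dim - len(v)))  # unify feature dimensions
    cur = pad(current)
    sims = np.array([float(np.dot(pad(h), cur)) for h in histories])
    return sims / sims.sum()

w = influence_values(np.array([1.0, 0.5]),
                     [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0])])
```

The returned weights sum to 1, so they can directly weight the historical text features before fusion with the current text feature.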
In an embodiment, the processing module 620 is further configured to:
weighting the plurality of historical text features according to the influence values to obtain target historical text features; and fusing the target historical text features and the current text features to generate target text features.
In one embodiment, the processing module 620 is further configured to fuse the target historical textual feature and the current textual feature by:
fz=Wi*tanh(fi)+(1-Wi)*tanh(fx)
wherein fz is the target text feature, fi is the current text feature, fx is the target historical text feature, Wi is the preset weight value of the current text feature, tanh is the activation function, and * is the Hadamard product.
In one embodiment, the classification layer comprises a binary classifier for identifying whether the call should be transferred to a human agent, and a multi-classifier for matching the text information against a preset conversation process; the input module 630 is further configured to:
inputting the target text feature into the binary classifier to obtain a probability value for establishing a call connection between the user and the human agent; inputting the target text feature into the multi-classifier to obtain the matching similarity between the current text information and each preset question in the conversation process; and if the probability value is greater than the first preset value and the maximum value among the matching similarities is smaller than the second preset value, determining that the target result is a result of establishing a call connection between the user and the human agent.
It should be understood that, in the structural block diagram of the call connection device shown in fig. 6, each module is used to execute the steps in the embodiments corresponding to fig. 1 to fig. 5, and each of those steps has been explained in detail above; please refer to the relevant descriptions in the embodiments corresponding to fig. 1 to fig. 5, which are not repeated herein.
Fig. 7 is a block diagram illustrating an intelligent speech platform according to an embodiment of the present application. As shown in fig. 7, the intelligent speech platform 700 of this embodiment includes: a processor 710, a memory 720, and computer programs 730, such as programs for a call connection method, stored in the memory 720 and executable on the processor 710. The processor 710 executes the computer program 730 to implement the steps in the embodiments of the call connection methods, such as S101 to S104 shown in fig. 1. Alternatively, the processor 710, when executing the computer program 730, implements the functions of the modules in the embodiment corresponding to fig. 6, for example, the functions of the modules 610 to 640 shown in fig. 6, and refer to the related description in the embodiment corresponding to fig. 6 specifically.
Illustratively, the computer program 730 may be divided into one or more modules, and the one or more modules are stored in the memory 720 and executed by the processor 710 to implement the call connection method provided by the embodiment of the present application. One or more of the modules may be a series of computer program instruction segments capable of performing specific functions that describe the execution of the computer program 730 in the intelligent speech platform 700. For example, the computer program 730 may implement the call connection method provided by the embodiment of the present application.
Intelligent speech platform 700 may include, but is not limited to, a processor 710, a memory 720. Those skilled in the art will appreciate that fig. 7 is merely an example of the intelligent speech platform 700 and does not constitute a limitation of the intelligent speech platform 700 and may include more or fewer components than shown, or combine certain components, or different components, e.g., the intelligent speech platform may also include input-output devices, network access devices, buses, etc.
The processor 710 may be a central processing unit, but may also be other general purpose processors, digital signal processors, application specific integrated circuits, off-the-shelf programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 720 may be an internal storage unit of the intelligent voice platform 700, such as a hard disk or memory of the intelligent voice platform 700. The memory 720 may also be an external storage device of the intelligent voice platform 700, such as a plug-in hard disk, a smart memory card, or a flash memory card provided on the intelligent voice platform 700. Further, the memory 720 may also include both an internal storage unit and an external storage device of the intelligent voice platform 700.
The embodiment of the application provides an intelligent voice platform, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein when the processor executes the computer program, the call connection method in the above embodiments is realized.
The embodiment of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the call connection method in the above embodiments.
The embodiment of the present application provides a computer program product, which when running on an intelligent voice platform, enables the intelligent voice platform to execute the call connection method in the above embodiments.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A call connection method is characterized by being applied to an intelligent voice platform and comprising the following steps:
extracting current text features of current text information input by a user and historical text features of historical text information by adopting a neural network model; the neural network model comprises an attention mechanism layer and a classification layer;
fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features;
inputting the target text characteristics into the classification layer to obtain a target result;
and if the target result is a result of establishing a call connection between the user and the human agent, establishing the call connection between the user and the human agent.
2. The call connection method according to claim 1, wherein there are a plurality of pieces of historical text information, and each piece of historical text information corresponds to one historical text feature:
fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features, wherein the target text features comprise:
respectively calculating the influence value of each historical text feature on the current text feature based on the attention mechanism layer;
and according to the influence value, performing feature fusion on each historical text feature and the current text feature to obtain the target text feature.
3. The call connection method according to claim 2, wherein the calculating the influence value of each of the historical text features on the current text feature based on the attention mechanism layer comprises:
determining a feature dimension maximum value from feature dimensions respectively corresponding to each historical text feature and the current text feature;
respectively unifying the feature dimensions respectively corresponding to each historical text feature and the current text feature according to the feature dimension maximum value;
respectively calculating the similarity of the historical text features and the current text features after each unified feature dimension;
and aiming at the feature similarity corresponding to any one of the historical text features, calculating the influence value of the historical text feature on the current text feature according to the feature similarity and the sum of the feature similarities of all the historical text features.
4. The call connection method according to claim 3, wherein the influence value of the historical text feature on the current text feature is calculated by the following formula:
Wij = Nij / (Ni1 + Ni2 + ... + Nin)
wherein Wij is the influence value of the jth historical text feature on the current text feature; Nij is the feature similarity between the jth historical text feature and the current text feature; n is the number of historical text features; and i denotes the current text feature.
5. The call connection method according to any one of claims 2 to 4, wherein the obtaining the target text feature by performing feature fusion on each of the historical text features and the current text feature according to the influence value comprises:
weighting a plurality of historical text features according to the influence value to obtain target historical text features;
and fusing the target historical text features and the current text features to generate the target text features.
6. The call connection method according to claim 5, wherein the target historical text feature and the current text feature are fused by the following formula:
fz=Wi*tanh(fi)+(1-Wi)*tanh(fx)
wherein f iszFor the target text feature, fiAs a current text feature, fxFor target historical text features, WiAnd for the preset weight value of the current text feature, tanh is an activation function, and tan is a Hadamard product.
7. The call connection method according to any one of claims 1 to 4 or 6, wherein the classification layer comprises a binary classifier for identifying whether the call should be transferred to the human agent, and a multi-classifier for matching the text information against a preset dialog flow;
inputting the target text features into the classification layer to obtain a target result, wherein the target result comprises:
inputting the target text feature into the binary classifier to obtain a probability value for establishing the call connection between the user and the human agent; and,
inputting the target text feature into the multi-classifier to obtain the matching similarity between the current text information and each preset question in the dialog flow;
and if the probability value is greater than a first preset value and the maximum value among the matching similarities is smaller than a second preset value, determining that the target result is a result of establishing the call connection between the user and the human agent.
8. A call connection device, characterized in that it is applied to an intelligent voice platform, the device comprising:
the extraction module is used for extracting the current text characteristics of the current text information input by the user and the historical text characteristics of the historical text information by adopting a neural network model; the neural network model comprises an attention mechanism layer and a classification layer;
the processing module is used for fusing the historical text features and the current text features based on the attention mechanism layer to generate target text features;
the input module is used for inputting the target text characteristics into the classification layer to obtain a target result;
and the switching module is used for establishing the call connection between the user and the human agent if the target result is a result of establishing the call connection between the user and the human agent.
9. An intelligent speech platform comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210240276.1A 2022-03-10 2022-03-10 Call connection method and device, intelligent voice platform and storage medium Pending CN114615378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210240276.1A CN114615378A (en) 2022-03-10 2022-03-10 Call connection method and device, intelligent voice platform and storage medium


Publications (1)

Publication Number Publication Date
CN114615378A true CN114615378A (en) 2022-06-10

Family

ID=81863200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210240276.1A Pending CN114615378A (en) 2022-03-10 2022-03-10 Call connection method and device, intelligent voice platform and storage medium

Country Status (1)

Country Link
CN (1) CN114615378A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190064314A (en) * 2017-11-30 2019-06-10 삼성에스디에스 주식회사 Method for processing a dialog task for an intelligent dialog agent and apparatus thereof
CN112613308A (en) * 2020-12-17 2021-04-06 中国平安人寿保险股份有限公司 User intention identification method and device, terminal equipment and storage medium
CN112632244A (en) * 2020-12-18 2021-04-09 平安普惠企业管理有限公司 Man-machine conversation optimization method and device, computer equipment and storage medium
WO2021218086A1 (en) * 2020-04-28 2021-11-04 平安科技(深圳)有限公司 Call control method and apparatus, computer device, and storage medium
CN113873088A (en) * 2021-10-29 2021-12-31 平安科技(深圳)有限公司 Voice call interaction method and device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination