CN117240964B - Voice recognition method in call process

Voice recognition method in call process

Info

Publication number: CN117240964B
Application number: CN202311529582.8A
Authority: CN (China)
Prior art keywords: current, call, content, voice, previous
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN117240964A
Inventor: 兰俊毅
Current and original assignee: Fujian Boshicom Information Co ltd
Application filed by Fujian Boshicom Information Co ltd; priority to CN202311529582.8A
Published as CN117240964A; granted and published as CN117240964B

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice recognition method for use during a call, comprising the following steps: during the call, the current emotional tendency and current purchase intention carried in the outbound customer's current voice content at the current moment are recognized in real time, and it is judged whether the current emotional tendency is consistent with the previous emotional tendency at the previous moment, or the current purchase intention with the previous purchase intention at the previous moment. If so, the current response content is generated and played according to the current emotional tendency, the current purchase intention and the current voice content. Otherwise, the outbound customer's previous voice content at the previous moment and the intelligent customer service's previous response content to that voice content are acquired, and the current influencing factor behind the inconsistency is determined according to the current voice content, the previous response content and the previous voice content; the current response content is then generated and played according to the current emotional tendency, the current purchase intention, the current voice content and the current influencing factor. The invention can improve the accuracy of intelligent voice recognition.

Description

Voice recognition method in call process
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice recognition method in a call process.
Background
Speech recognition technology allows a machine to convert speech signals into corresponding text or commands through recognition and understanding. It mainly involves three aspects: feature extraction, pattern matching criteria and model training. Speech recognition is widely applied across industry, home appliances, communications, automotive electronics, medical care, home services and consumer electronics, for example in voice assistants on mobile phones and voice control of home appliances.
In an intelligent customer service system, voice recognition technology serves as the foundation: the outbound customer's communication content is understood through voice recognition so that a targeted reply can be made. In the prior art, a number of response scripts are usually preset, and the most suitable script is matched according to the outbound customer's previous reply. This matching is usually fixed; for example, when the outbound customer mentions "I feel it is too expensive", the matched script is: explain the product's tariff and the corresponding discounts. However, this script-matching approach is prone to bias, sometimes producing mismatched replies, so that the outbound customer loses the intent to communicate further.
That is, the accuracy of voice recognition during a call in the prior art still needs further improvement.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a voice recognition method in the call process that can improve the accuracy of intelligent voice recognition.
In order to achieve the above purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a voice recognition method in a call process, including:
s1, in the conversation process, identifying the current emotion tendency and the current purchase intention of an outbound customer in the current voice content at the current moment in real time, judging whether the current emotion tendency and the previous emotion tendency at the previous moment or the current purchase intention and the previous purchase intention at the previous moment are consistent, if so, generating and playing a current response content according to the current emotion tendency, the current purchase intention and the current voice content, otherwise, executing the step S2;
s2, acquiring the last voice content of the outbound customer at the last moment and the last response content of the intelligent customer service aiming at the last voice content, and determining the current influence factors inconsistent before and after according to the current voice content, the last response content and the last voice content;
and S3, generating and playing the current response content according to the current emotional tendency, the current purchase intention, the current voice content and the current influencing factor.
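The three claimed steps can be sketched as a per-turn control flow. This is an illustrative sketch only: `recognize`, `find_factor` and `generate_reply` are hypothetical stand-ins for the recognition and reply-generation components the claims leave unspecified, and treating "consistent" as "either emotion or intention unchanged" is one plausible reading of step S1.

```python
def handle_turn(state, current_voice, recognize, find_factor, generate_reply):
    """One dialogue turn: compare the current emotion/intent with the
    previous turn and pick the reply-generation inputs accordingly."""
    emotion, intent = recognize(current_voice)
    prev = state.get("prev")  # None on the first turn
    consistent = prev is None or (
        emotion == prev["emotion"] or intent == prev["intent"]
    )
    if consistent:
        # S1: emotion/intent unchanged, reply from the current turn alone
        reply = generate_reply(emotion, intent, current_voice)
    else:
        # S2: find the influencing factor behind the inconsistency
        factor = find_factor(current_voice, prev["reply"], prev["voice"])
        # S3: reply also conditioned on the influencing factor
        reply = generate_reply(emotion, intent, current_voice, factor)
    state["prev"] = {"emotion": emotion, "intent": intent,
                     "voice": current_voice, "reply": reply}
    return reply
```

A turn whose emotion and intention both change routes through the factor-finding branch; all other turns reply directly.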
The invention has the following beneficial effects: compared with the prior art, emotion recognition and purchase-intention recognition of the outbound customer are added to the call process. When the emotion or purchase intention changes, the factor influencing the change is identified from the continuous voice dialogue, so that the outbound customer's expressed intention is better understood; the response content is then adjusted according to that influencing factor, so that the intelligent customer service's replies better meet the customer's needs. The accuracy of intelligent voice recognition can therefore be improved.
Optionally, determining in step S2 the current influencing factor behind the inconsistency according to the current voice content, the previous response content and the previous voice content includes the following steps:
performing voice recognition on the previous voice content and the previous response content taken together as one dialogue, to obtain an intention understanding effect;
judging whether the intention understanding effect is negative; if negative, extracting keywords from the previous voice content and the current voice content respectively and judging whether they share a common keyword: if so, taking that shared keyword as the current keyword, and if not, taking the keyword that appears in the previous voice content but not in the previous response content as the current keyword; if the intention understanding effect is positive, judging whether the previous response content and the current voice content share a common keyword: if so, taking that shared keyword as the current keyword, and if not, taking the keyword with the highest weight in the previous response content as the current keyword;
and taking the current keyword as the current influencing factor behind the inconsistency.
According to the above description, the invention first determines, from the previous voice content and the previous response content, whether the intelligent customer service's reply matched the outbound customer's expressed intention; depending on whether the intention understanding effect is positive or negative, different keyword-determination methods are used to identify the current influencing factor behind the inconsistency, ensuring the accuracy of the influencing factor.
Optionally, the previous moment and the current moment are delimited by the dialogue turns between the outbound customer and the intelligent customer service.
Optionally, the method further comprises the steps of:
and when the outbound customer's purchase intention is positive at least twice in a row while the emotional tendency turns negative at least twice in a row after having been positive, switching the outbound customer to human customer service in real time, and displaying the dialogue between the outbound customer and the intelligent customer service before the human intervention to the human customer service in text form.
According to the above description, when the outbound customer intends to purchase but their emotion is not optimistic, human customer service intervention is needed to adjust the outbound customer's emotion in time, thereby helping close the deal.
Optionally, while the dialogue is displayed in text form for the human customer service, the outbound customer's emotional tendency and purchase intention are marked on the customer's corresponding text.
Optionally, the method further comprises the steps of:
when the outbound customer hangs up before communication is complete, acquiring the global call quality, global emotional tendency and global purchase intention of the outbound customer during the voice call, and judging whether the global call quality is lower than a normal call quality threshold; if so, judging whether at least one of the global emotional tendency and the global purchase intention is positive; and if so, marking the outbound customer as a user to be redialed, with an attached redial time.
According to the above description, the overall call quality, emotional tendency and purchase intention of the call are judged comprehensively, so that potential customers who would otherwise be lost due to poor call quality are recovered in time.
Optionally, determining the redial time includes:
taking global call quality under different call scenes as a data set, labeling the data set with the call scenes, dividing the data set into a training set and a test set, inputting the training set into a neural network model for training, testing the trained model on the test set, and outputting the trained neural network model as the call scene recognition model when the test result meets the expected training effect;
inputting the outbound customer's global call quality during the voice call into the call scene recognition model to obtain the outbound customer's current call scene;
and determining the redial time according to the outbound customer's current call scene.
According to the above description, the outbound customer's call scene is predicted so as to better determine the redial time, improving the success rate of follow-up promotion calls.
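A minimal sketch of how a recognized call scene might map to a redial time. The scene labels and delay values are invented for illustration; in the method above the scene label would come from the trained call scene recognition model, and the delay table stands in for whatever scene-to-time policy is configured.

```python
from datetime import datetime, timedelta

# Hypothetical scene -> redial-delay table (hours); the scene label is
# assumed to come from the trained call scene recognition model.
REDIAL_DELAY_HOURS = {
    "commuting": 2,       # e.g. poor signal in transit: retry shortly after
    "in_meeting": 3,
    "weak_coverage": 24,  # persistently poor coverage: try another day/time
}

def redial_time(scene, now, default_hours=4):
    """Map the recognized call scene to a suggested redial timestamp."""
    return now + timedelta(hours=REDIAL_DELAY_HOURS.get(scene, default_hours))
```

For example, a customer recognized as commuting at 09:00 would be scheduled for redial at 11:00.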
Optionally, the method further comprises the steps of:
during the call, acquiring the outbound customer's real-time call quality; if within a preset time interval the real-time call quality is lower than a minimum call quality threshold, hanging up after a voice prompt about the network speed problem, and sending a text message explaining the reason for hanging up.
According to the above description, when the call quality is below the minimum call quality threshold, that is, when the user is judged to be in a scene where normal communication is impossible, the call is hung up promptly and a text message is sent to explain the reason, thereby improving the outbound customer's communication experience.
Optionally, "the real-time call quality is lower than the minimum call quality threshold" means:
that the real-time call quality is lower than the minimum call quality threshold multiple times within the preset time interval.
Optionally, the number of times that counts as "multiple times" within the preset time interval is associated with the number of real-time call quality samples taken in that interval.
Drawings
Fig. 1 is a main flow diagram of a voice recognition method in a call process according to an embodiment of the present invention;
fig. 2 is a schematic overall flow chart of a voice recognition method in a call process according to an embodiment of the invention;
fig. 3 is a schematic flowchart of step S5 according to an embodiment of the present invention.
Detailed Description
In order that the above-described aspects may be better understood, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example 1
This embodiment is applied to a product promotion scene: the intelligent customer service automatically dials outbound customers, conducts automatic voice communication with them, and finally confirms a list of interested customers for the next stage of product promotion. In practice, when the interested-customer list obtained by the intelligent customer service was fed back to sellers, the sellers' feedback on secondary promotion showed the lists were not reliable. Therefore, the customer's emotion changes and purchase-intention changes are monitored in real time during the call, so that the scripts are adjusted in time, the understanding accuracy of the customer's voice content is improved, the accuracy of intelligent voice recognition is improved, and the accuracy of the interested-customer list is ensured.
Referring to fig. 1 to 2, a voice recognition method in a call process includes the steps of:
s1, in the conversation process, identifying the current emotion tendency and the current purchase intention of the calling-out client in the current voice content at the current moment in real time, judging whether the current emotion tendency and the previous emotion tendency at the previous moment or the current purchase intention and the previous purchase intention at the previous moment are consistent or not, if so, executing the step S2, otherwise;
in this embodiment, the previous moment and the current moment are determined according to the dialogue turns between the outbound customer and the intelligent customer service; that is, each dialogue turn occupies one moment, so the previous moment corresponds to the previous turn and the current moment to the current turn.
In this embodiment, both the emotional tendency and the purchase intention of the outbound customer are recognized during voice communication. Compared with prior art methods that recognize emotional tendency or purchase intention separately, analysis of past voice communication cases shows that some outbound customers have purchase intention for the product but are dissatisfied with the intelligent customer service's answers and thus develop negative emotion, while other outbound customers are in a very good mood yet politely decline the product promotion. Therefore, neither emotional tendency nor purchase intention alone can truly reflect the outbound customer's real demand, and the two cannot be equated: emotional tendency leans more toward characterizing the user's communication experience, while purchase intention leans more toward the user's degree of interest in the product. This embodiment considers the two together, and can therefore more accurately determine the outbound customer's communication experience and interest in the product.
It should be noted that the invention adopts the prior art for recognizing emotional tendency and purchase intention. After obtaining the user's emotional tendency and purchase intention with the prior art, the invention identifies the factors influencing the outbound customer's change by combining the emotional tendency, the purchase intention and the change of context, so that the outbound customer's expressed intention can be better understood; that is, the meaning the outbound customer's voice content is meant to express during the call can be recognized more accurately. To aid understanding of the present invention, the prior art for emotional tendency and purchase intention is described as follows:
the identification of emotional tendency may be accomplished using emotional characteristic data. Wherein the emotional characteristics are divided into local characteristics and global characteristics. The local feature is a feature extracted from one voice frame or a part of voice frames of the voice data, and the global feature is a statistical result of features extracted from all voice frames of the voice data, reflecting the global characteristics of the whole voice data. Emotional characteristics may include, but are not limited to, prosodic characteristics, spectrum-based related characteristics, timbre characteristics, and i-vetor characteristics, among others. Therefore, at least one characteristic of prosody, timbre, sound spectrum or the like of the voice data can be analyzed by using the characteristic extraction algorithm, so that emotion characteristic data corresponding to the voice data can be obtained. For example, features are extracted from a spectrogram of speech data using a CNN algorithm. And of course, the voice data can be directly marked and then a machine learning model can be trained, so that the extraction of the characteristic data is realized by the trained machine learning model.
Recognition of purchase intention may be done similarly to recognition of emotional tendency, only with different training data. There is also a simpler method, namely a parameter evaluation method, which is theoretically comparable to a machine learning model, since training a machine learning model is itself a matter of feature extraction and weight assignment over the input data; the difference is that here the weights are set by hand. In the parameter evaluation method, a positive vocabulary database and a negative vocabulary database are first stored. Then four indexes of the voice data are identified: the number of dialogue turns, the occurrence frequency of positive vocabulary, the occurrence frequency of negative vocabulary, and whether the price was inquired about. Finally, each index is normalized and multiplied by its weight to obtain an intention value, where a higher occurrence frequency of negative vocabulary yields a lower normalized value, and higher values of the other indexes yield higher normalized values. The weights of the four indexes are 0.3, 0.3, 0.3 and 0.1 respectively. The intention value lies between 0 and 1, and different intervals represent different purchase intention: in this embodiment, 0 to 0.36 is negative, 0.64 to 1 is positive, and the middle is neutral; of course, four, five or more intervals can be used for finer-grained recognition as the situation requires. For example, if the number of dialogue turns, the occurrence frequency of positive vocabulary, the occurrence frequency of negative vocabulary and the price inquiry are 3 times, 2 times, 0 times and 0 times respectively, the normalized values are 0.6, 0.8, 1 and 0, the final intention value is 0.72, and the purchase intention is at the positive level.
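The parameter evaluation method above can be reproduced in a few lines. This sketch assumes the index values have already been normalized to [0, 1] upstream; the weights and level intervals are those given in the embodiment, and the example input reproduces the worked figures (0.6, 0.8, 1, 0 yielding 0.72).

```python
def intention_value(normed, weights=(0.3, 0.3, 0.3, 0.1)):
    """normed: normalized scores in [0, 1] for the four indexes
    (dialogue-turn count, positive-vocabulary frequency,
    negative-vocabulary frequency already inverted so higher is better,
    and price inquiry). Weights follow the embodiment."""
    return sum(w * x for w, x in zip(weights, normed))

def intention_level(value):
    """Map an intention value to the three levels of the embodiment."""
    if value <= 0.36:
        return "negative"
    if value >= 0.64:
        return "positive"
    return "neutral"
```

With the worked example's normalized values (0.6, 0.8, 1, 0), the intention value is 0.72, which falls in the positive interval.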
In this embodiment, emotional tendency and purchase intention each have at least three levels: positive, neutral and negative, where positive is denoted by 1, neutral by 0 and negative by -1. All combinations of emotional tendency and purchase intention are thus: 1&1, 1&0, 1&-1, 0&1, 0&0, 0&-1, -1&1, -1&0 and -1&-1. Under a simple judgment rule, when at least one of the emotional tendency and the purchase intention is 1, the customer is an intended client; otherwise, not.
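The simple intended-client rule can be stated directly, using the 1/0/-1 coding from the embodiment:

```python
def is_intended_client(emotion, intent):
    """Simple rule from the embodiment: the outbound customer is an
    intended client when at least one of emotional tendency and
    purchase intention is positive (coded 1; 0 neutral, -1 negative)."""
    return emotion == 1 or intent == 1
```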
At this point, it is judged whether the current emotional tendency is the same as the previous emotional tendency, or the current purchase intention the same as the previous purchase intention. If so, the outbound customer has remained consistent throughout the communication, whether negative from beginning to end or positive from beginning to end: the former indicates that the outbound customer is averse to telephone sales itself and is therefore not an intended client, while the latter indicates an intended client. In either case, the current response content is generated and played directly according to the current emotional tendency, the current purchase intention and the current voice content.
For example, the previous voice content is "What about the fees?", the previous response content is "The original price of this product is 199, and currently it is 139 when purchased through our channel", and the current voice content is "OK, I see, thank you". Here the current emotional tendency, current purchase intention, previous emotional tendency and previous purchase intention are all neutral, so emotion and intention are consistent, and the current response content is generated and played directly according to the current emotional tendency, current purchase intention and current voice content. For example, if the current speech strategy is to arouse the user's interest, the generated response content is "Sir, you may want to learn about this product's function ……", that is, a description of the product's characteristic functions.
In another scene, the previous voice content is "What about the fees?", the previous response content is "The original price of this product is 199, and currently it is 139 when purchased through our channel", and the current voice content is "That is a bit expensive; if it cannot go any lower, fine then". Here the previous emotional tendency and previous purchase intention are neutral, while the current emotional tendency is biased negative and the current purchase intention is biased positive; emotion and intention have therefore changed from before, so step S2 is executed.
S2, acquiring the outbound customer's previous voice content at the previous moment and the intelligent customer service's previous response content to that voice content, and determining the current influencing factor behind the inconsistency according to the current voice content, the previous response content and the previous voice content;
in this embodiment, determining in step S2 the current influencing factor behind the inconsistency according to the current voice content, the previous response content and the previous voice content includes the following steps:
s21, performing voice recognition by taking the last voice content and the last response content as a dialogue to obtain an intention understanding effect;
the method comprises the steps of taking the last voice content and the last response content as a dialogue to conduct voice recognition so as to judge dialogue fluency between the last response content and the last voice content, and judging whether the response content of the intelligent customer service is responded based on the intention expressed in understanding the voice content of the outbound customer. For example, the last voice content is "this cost condition", the last answer content is "our price of this product is 199, and the current purchase through our channel is 139", the explanation is intended to understand the effect positive. If the last answer content is "your good", then we have a good price. At this time, although the last answer content is also a question of the fee, the voice content of the outbound client inquires about the specific fee, the last answer content should be a description of the specific fee, and the intention is considered to be negative when the above-mentioned question occurs.
S22, judging whether the intention understanding effect is negative; if negative, extracting keywords from the previous voice content and the current voice content respectively and judging whether they share a common keyword: if so, taking that shared keyword as the current keyword, and if not, taking the keyword that appears in the previous voice content but not in the previous response content as the current keyword; if the intention understanding effect is positive, judging whether the previous response content and the current voice content share a common keyword: if so, taking that shared keyword as the current keyword, and if not, taking the keyword with the highest weight in the previous response content as the current keyword;
For example, when the intention understanding effect is negative: if the previous voice content is "What about the fees?", the previous response content is "Hello sir, the price of our product is very favorable" and the current voice content is "I am asking how much the product costs", then the previous voice content and the current voice content share the keyword "fee". If the previous voice content is "What about the fees?", the previous response content is "Hello sir, this product's function is ……" and the current voice content is "You did not understand what I said", then the previous voice content and the current voice content share no keyword, and the keyword "fee", which appears in the previous voice content but not in the previous response content, is obtained.
When the intention understanding effect is positive and the current voice content is "Too expensive", the keyword "fee" is extracted as the one shared by the current voice content and the previous response content.
Taking the highest-weight keyword in the previous response content as the current keyword means taking the term that best expresses the meaning of the sentence; for example, in "The original price of this product is 199, and currently it is 139 when purchased through our channel", the keyword is "fee".
S23, taking the current keyword as the current influencing factor behind the inconsistency.
Thus, the current keyword "fee" is used as the current influencing factor behind the inconsistency.
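Steps S21-S23 can be sketched as a keyword-selection function. Keyword extraction itself (e.g. TF-IDF weighting) is assumed to be done upstream and is not part of this sketch; each keyword list is assumed sorted by descending weight.

```python
def current_influencing_factor(prev_voice_kws, prev_reply_kws,
                               curr_voice_kws, understanding_negative):
    """Select the current keyword (influencing factor) per steps S21-S23.
    Each *_kws argument is a list of extracted keywords, assumed sorted
    by descending weight; understanding_negative is the S21 result."""
    if understanding_negative:
        # customer had to repeat themselves: look for the keyword shared
        # by the previous and current customer utterances
        shared = [k for k in prev_voice_kws if k in curr_voice_kws]
        if shared:
            return shared[0]
        # otherwise: the keyword the customer raised but the reply missed
        missed = [k for k in prev_voice_kws if k not in prev_reply_kws]
        return missed[0] if missed else None
    # understanding was positive: keyword shared by the previous reply
    # and the current customer utterance, else the reply's top keyword
    shared = [k for k in prev_reply_kws if k in curr_voice_kws]
    if shared:
        return shared[0]
    return prev_reply_kws[0] if prev_reply_kws else None
```

Both worked examples above ("fee" shared between utterances, and "fee" raised but missed by the reply) resolve to "fee".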
And S3, generating and playing the current response content according to the current emotional tendency, the current purchase intention, the current voice content and the current influencing factor.
At this point, a further explanation of the fee is added to the original response, depending on the preset fee configuration, for example offering a further discount to 109, or introducing other products with the same function, and so on.
And S4, when the outbound customer's purchase intention is positive at least twice in a row while the emotional tendency turns negative at least twice in a row after having been positive, switching the outbound customer to human customer service in real time, and displaying the dialogue between the outbound customer and the intelligent customer service before the human intervention to the human customer service in text form.
When the outbound customer intends to purchase but their emotion during the communication is not optimistic, the human customer service intervenes to adjust the outbound customer's emotion in time, thereby helping close the deal.
In this embodiment, while the dialogue is displayed to the human customer service in text form, the outbound customer's emotional tendency and purchase intention are marked on the customer's corresponding text. The human customer service can thus grasp the outbound customer's demands at first sight and quickly soothe the customer.
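The handover trigger of step S4 can be checked over the per-turn recognition results. The reading below, that the last two turns must each show positive purchase intention together with negative emotion, is one plausible interpretation of "positive at least twice in a row while the emotion turns negative at least twice in a row":

```python
def should_handover(turns):
    """turns: list of (purchase_intent, emotion) per dialogue turn,
    each coded 1 (positive), 0 (neutral) or -1 (negative).
    Returns True when the last two turns both show positive purchase
    intention with negative emotion, triggering the switch to a
    human agent (one plausible reading of the S4 condition)."""
    if len(turns) < 2:
        return False
    return all(intent == 1 and emotion == -1 for intent, emotion in turns[-2:])
```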
In this embodiment, in order to improve the communication efficiency of the intelligent customer service, the method further includes the following steps:
acquiring historical call processes marked as communication failures, analyzing all of them to obtain the communication failure factor of each historical call process, and marking the communication failure factor with the largest proportion as the first factor;
and adding a product description addressing the first factor to the intelligent customer service's first product introduction.
The determination of the first factor also requires the following condition to be met: the proportion of the largest communication failure factor must exceed that of the second-largest by more than 50% in relative terms; otherwise the first factor is rejected. For example, if the top three communication failure factors account for 28.6%, 25.4% and 19.1%, the largest exceeds the second by only 12.6%, so there is no first factor in this case and the first product introduction needs no modification. If the top three account for 41.3%, 22.5% and 17.9%, the largest exceeds the second by 83.6%, so a product description addressing the largest communication failure factor is added to the intelligent customer service's first product introduction, for example a description of the fee when the 41.3% factor is "fee".
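The first-factor condition, with its relative 50% margin over the runner-up, can be sketched as follows; the worked percentages from the embodiment serve as a check (28.6% vs 25.4% gives no first factor; 41.3% vs 22.5% does):

```python
def first_factor(factor_ratios, margin=0.50):
    """factor_ratios: {failure_factor: proportion}. The dominant factor
    qualifies as the first factor only when it exceeds the runner-up
    by more than `margin` in relative terms, per the embodiment."""
    ranked = sorted(factor_ratios.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]
    (top, p1), (_, p2) = ranked[0], ranked[1]
    return top if (p1 - p2) / p2 > margin else None
```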
The first product introduction is the intelligent customer service's first product description after the opening remarks in its call with the outbound customer. If a fee-sensitive outbound customer is reached, that customer may then hang up directly, saving the intermediate communication process.
Therefore, by adding the largest communication failure factor to the intelligent customer service's first product introduction, the information users are sensitive to is described up front, so that one round of effective communication can be completed promptly, intermediate ineffective communication is saved, and the communication efficiency of the intelligent customer service is improved.
In summary, this embodiment adds emotion recognition and purchase intention recognition of the outbound customer during the call and, when the emotion or purchase intention changes, identifies the influencing factor behind the change from the continuous voice dialogue so as to better understand the outbound customer's intended meaning. The response content is then adjusted according to that factor so that the intelligent customer service's replies better meet the customer's needs; the invention can therefore improve the accuracy of intelligent voice recognition.
Example two
Building on embodiment 1 and referring to fig. 3, the present embodiment further includes the following steps:
S5, when the outbound customer hangs up before communication is complete, acquire the global call quality, global emotional tendency and global purchase intention of the outbound customer over the voice call; judge whether the global call quality is lower than the normal call-quality threshold; if so, judge whether either the global emotional tendency or the global purchase intention is positive; and if so, mark the outbound customer as a user to be redialed and attach a time to be redialed.
In this embodiment, referring to fig. 3, step S5 specifically includes:
S51, when the outbound customer hangs up before communication is complete, or when, during the call, the real-time call quality acquired for the outbound customer falls below the minimum call-quality threshold within a preset time interval, play a voice prompt about the network-speed problem, hang up, and send a short message explaining the reason for hanging up.
This step covers both the outbound customer hanging up and the intelligent customer service hanging up automatically. When the call quality is below the minimum call-quality threshold, the user is judged to be in a scenario where communication is impossible, so the system hangs up promptly and sends a short message explaining the reason, which improves the outbound customer's communication experience. Note that if the customer explains the situation before hanging up — for example, says "I am in a meeting" — the application scenario is determined from that explanation and no subsequent judgment is made.
In this embodiment, if the signal strength of the outbound client can be obtained, call quality is based on that signal strength, with a normal call-quality threshold of -100 dBm and a minimum call-quality threshold of -120 dBm; otherwise, the stability of the signal received from the outbound client is used instead.
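The two thresholds can be sketched as a small classifier. The threshold values (-100 dBm, -120 dBm) come from the text; the function name and category labels are illustrative assumptions.

```python
NORMAL_THRESHOLD_DBM = -100   # below this: candidate for redial marking on hang-up
MINIMUM_THRESHOLD_DBM = -120  # below this: prompt, hang up, send explanatory SMS

def classify_call_quality(signal_dbm):
    """Map a signal-strength reading (dBm) to a call-quality band."""
    if signal_dbm < MINIMUM_THRESHOLD_DBM:
        return "below_minimum"
    if signal_dbm < NORMAL_THRESHOLD_DBM:
        return "below_normal"
    return "normal"

print(classify_call_quality(-95))    # normal
print(classify_call_quality(-110))   # below_normal
print(classify_call_quality(-125))   # below_minimum
```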
In this embodiment, "the real-time call quality is less than the minimum call-quality threshold" means that the real-time call quality falls below the minimum threshold several times within the preset time interval.

The number of times required is tied to how many real-time quality samples fall within the preset interval. For example, with a preset interval of 5 seconds and a sampling period of 1 second, there are 5 samples in the interval, and "several times" here means 3; in general, in this embodiment "several times" means more than half of the samples in the interval.
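The majority-of-samples rule described above can be sketched as follows; the function name is an assumption for illustration.

```python
def quality_persistently_low(samples_dbm, threshold_dbm=-120):
    """Return True if more than half of the samples in the preset
    interval fall below the minimum call-quality threshold."""
    low = sum(1 for s in samples_dbm if s < threshold_dbm)
    return low > len(samples_dbm) / 2

# 5-second interval sampled once per second -> 5 samples; 3 of 5 below threshold.
print(quality_persistently_low([-125, -118, -123, -126, -119]))  # True
# Only 2 of 5 below threshold -> not persistently low.
print(quality_persistently_low([-125, -118, -110, -126, -119]))  # False
```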
And S52, acquiring global call quality, global emotion tendency and global purchase intention of the outbound client in the voice call process, judging whether the global call quality is lower than a normal call quality threshold, if so, judging whether one of the global emotion tendency and the global purchase intention is positive, and if so, executing step S53.
In step S52, the outbound customer is judged to have shown some receptiveness to the promoted product before hanging up: if either the global emotional tendency or the global purchase intention is positive, an opportunity remains, so a redial is needed to continue communicating with the outbound customer.
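The redial decision of step S5 can be sketched as below. This is a minimal sketch under the assumption that global call quality is expressed in dBm against the -100 dBm normal threshold from this embodiment; the function name is illustrative.

```python
def should_mark_for_redial(global_quality_dbm, emotion_positive,
                           intention_positive, normal_threshold_dbm=-100):
    """Mark for redial only if quality was below normal AND either the
    global emotional tendency or the global purchase intention is positive."""
    if global_quality_dbm >= normal_threshold_dbm:
        return False  # quality was fine; the hang-up was not quality-related
    return emotion_positive or intention_positive

print(should_mark_for_redial(-112, emotion_positive=False, intention_positive=True))   # True
print(should_mark_for_redial(-112, emotion_positive=False, intention_positive=False))  # False
print(should_mark_for_redial(-90, emotion_positive=True, intention_positive=True))     # False
```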
S53, taking global call quality under different call scenes as a data set, marking the call scenes of the data set, dividing the data set into a training set and a testing set, inputting the training set into a neural network model for training, testing the trained neural network model by the testing set, and outputting the trained neural network model when the test result accords with the expected training effect to obtain a call scene identification model.
Call quality in different scenes can be simulated by collecting data in places with poor signal, and historical data in which the user's scene is clearly known can also be used. This yields global call quality for both smooth-communication scenes and poor-communication scenes such as elevators, basements and tunnels, forming a data set in which each global call quality is labeled with its corresponding scene. The data set is then divided into a training set and a test set at a ratio between 7:3 and 9:1; in this embodiment the ratio is 8:2, so 80% of the data set is used as the training set and 20% as the test set.
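The 8:2 split can be sketched as below. The shuffle is an assumption added so the split is not biased by collection order; the text itself only specifies the ratio.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle and split labeled samples into train/test at train_ratio."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy data: (global call quality in dBm, scene label) pairs.
data = [(-100 - i % 30, "elevator" if i % 3 == 0 else "tunnel") for i in range(100)]
train, test = split_dataset(data)
print(len(train), len(test))  # 80 20
```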
In this embodiment, the neural network model is a convolutional neural network consisting, from input to output, of an input layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer. The convolutional layer extracts local features from the input parameters, the pooling layer reduces the feature dimensionality, the fully connected layer aggregates global features, and the SoftMax in the output layer classifies the input according to these features to obtain its category.
During training, each sample in the training set comprises a global call quality as the input and a call-scene label as the target category, so the convolutional neural network can adjust the parameters in each layer until the category output for an input global call quality matches the labeled call scene. After training, the initial parameters of each layer have been replaced by the adjusted parameters, and the network can then identify call scenes from global call quality.
At this point, the global call quality of each sample in the test set is fed into the network after a round of training, the call scene it outputs is compared with the labeled scene, and the per-sample results are aggregated into an accuracy for that round. For example, if the expected training effect is 98% accuracy but the network after one round reaches only 96%, it is trained again until the test accuracy reaches or exceeds 98%; the network from the final round is then taken as the trained neural network model, i.e. the call scene recognition model.
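The retrain-until-target loop described above can be sketched as follows. `train_once` and `evaluate` stand in for the CNN training pass and the test-set evaluation; they and the accuracy progression below are illustrative assumptions, not the patent's implementation.

```python
def train_until_target(train_once, evaluate, target_accuracy=0.98, max_rounds=50):
    """Repeat training rounds until test accuracy meets the expected effect."""
    model = None
    for _ in range(max_rounds):
        model = train_once(model)
        if evaluate(model) >= target_accuracy:
            return model
    raise RuntimeError("accuracy target not reached within max_rounds")

# Toy stand-ins: each round improves accuracy by 2 points, starting from 90%.
state = {"acc": 0.90}
def train_once(model):
    state["acc"] = min(1.0, state["acc"] + 0.02)
    return dict(state)
def evaluate(model):
    return model["acc"]

model = train_until_target(train_once, evaluate)
print(round(model["acc"], 2))  # 0.98
```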
Training on this data set yields a model that can identify the call scene from call quality.
S54, inputting the global call quality of the outbound client in the voice call process into a call scene recognition model to obtain the current call scene of the outbound client.
S55, determining the time to be redialed according to the current call scene of the calling client.
In this embodiment, the call scenes include but are not limited to elevators, basements and tunnels. The time to be redialed for an elevator is short, generally set between 1 and 3 minutes, while basements and tunnels are set between 8 and 15 minutes.
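The scene-to-delay mapping can be sketched as below. The minute ranges come from the text; picking the midpoint of each range and the default for unknown scenes are assumptions for illustration.

```python
REDIAL_DELAY_MINUTES = {
    "elevator": (1, 3),    # short scene: redial quickly
    "basement": (8, 15),
    "tunnel":   (8, 15),
}

def redial_delay(scene, default=(10, 20)):
    """Return a redial delay (minutes) for the recognized call scene,
    scheduled at the midpoint of the scene's configured range."""
    low, high = REDIAL_DELAY_MINUTES.get(scene, default)
    return (low + high) / 2

print(redial_delay("elevator"))  # 2.0
print(redial_delay("tunnel"))    # 11.5
```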
S56, marking the calling-out client as the user to be redialed and attaching the time to be redialed.
In this way, the redial takes place after the outbound customer has left the scene with poor call quality, promptly recovering potential customers who would otherwise be lost to poor call quality.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. is for convenience of description only and does not denote any order. These terms may be understood as part of the component name.
Furthermore, it should be noted that in the description of the present specification, the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to a specific feature, structure, material, or characteristic described in connection with the embodiment or example being included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art upon learning the basic inventive concepts. Therefore, the appended claims should be construed to include preferred embodiments and all such variations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, the present invention should also include such modifications and variations provided that they come within the scope of the following claims and their equivalents.

Claims (9)

1. A voice recognition method in the conversation process is characterized by comprising the following steps:
s1, in the conversation process, identifying the current emotion tendency and the current purchase intention of an outbound customer in the current voice content at the current moment in real time, judging whether the current emotion tendency is consistent with the previous emotion tendency at the previous moment and judging whether the current purchase intention is consistent with the previous purchase intention at the previous moment, if so, generating and playing a current response content according to the current emotion tendency, the current purchase intention and the current voice content, otherwise, executing the step S2, wherein the consistency is in the same level;
s2, acquiring the last voice content of an outbound customer at the last moment and the last response content of an intelligent customer service aiming at the last voice content, and determining the current influence factors of the inconsistency before and after according to the current voice content, the last response content and the last voice content, wherein the inconsistency is in different levels;
s3, generating and playing current response content according to the current emotion tendencies, the current purchase intentions, the current voice content and the current influencing factors;
in the step S2, determining the current influencing factor inconsistent with the current voice content, the last response content and the last voice content includes the following steps:
performing voice recognition by taking the last voice content and the last response content as a dialogue to obtain an intention understanding effect;
judging whether the intention understanding effect is negative; if so, extracting keywords from the previous voice content and the current voice content respectively and judging whether they yield the same keyword; if so, taking that same keyword as the current keyword; if not, judging whether the previous response content and the current voice content yield the same keyword; if so, taking that same keyword as the current keyword, and if not, taking the keyword with the highest weight in the previous response content as the current keyword;
and taking the current keyword as a current influencing factor inconsistent in the front and back.
2. A method of speech recognition during a call according to claim 1, wherein the time of the last time and the current time is determined from a dialogue between the calling-out client and the intelligent customer service.
3. The method for speech recognition during a call of claim 1, further comprising the steps of:
and when the purchase intention of the calling-out client is positive at least twice continuously and the emotion tendency is negative at least twice continuously after the positive, switching the calling-out client to the artificial customer service in real time, and displaying the dialogue between the calling-out client and the intelligent customer service before the artificial intervention in a text form for the artificial customer service.
4. A method of speech recognition during a call according to claim 3, wherein the emotional tendency and the purchase intention of the outbound customer are respectively marked on the corresponding text of the outbound customer while being displayed in the form of text for human customer service.
5. A method of speech recognition during a call according to any one of claims 1 to 4, further comprising the steps of:
when the call of the outbound client is hung up in a state that communication is not completed, global call quality, global emotion tendency and global purchase intention of the outbound client in the voice call process are obtained, whether the global call quality is lower than a normal call quality threshold value is judged, if yes, whether one of the global emotion tendency and the global purchase intention is positive is judged, and if yes, the outbound client is marked as a user to be redialed and the time to be redialed is attached.
6. The method for voice recognition during a call according to claim 5, wherein determining the time to redial comprises:
taking global call quality under different call scenes as a data set, marking the call scenes of the data set, dividing the data set into a training set and a test set, inputting the training set into a neural network model for training, testing the trained neural network model by the test set, and outputting the trained neural network model when the test result accords with the expected training effect to obtain a call scene identification model;
inputting the global call quality of the outbound client in the voice call process into a call scene recognition model to obtain the current call scene of the outbound client;
and determining the time to be redialed according to the current call scene of the outbound client.
7. The method for speech recognition during a call of claim 5, further comprising the steps of:
in the call process, acquiring the real-time call quality of the outbound client, if the real-time call quality is smaller than the lowest call quality threshold value in a preset time interval, hanging up after the call network speed problem is prompted by voice, and sending a short message to explain the hanging-up reason.
8. The method for voice recognition during a call according to claim 7, wherein the condition that the real-time call quality is less than the minimum call quality threshold is:
and if the real-time call quality is less than the lowest call quality threshold value for a plurality of times in the preset time interval.
9. The method according to claim 7, wherein the number of times the real-time call quality is less than a minimum call quality threshold is associated with the number of times the real-time call quality is present in the predetermined time interval.
CN202311529582.8A 2023-11-16 2023-11-16 Voice recognition method in call process Active CN117240964B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311529582.8A CN117240964B (en) 2023-11-16 2023-11-16 Voice recognition method in call process

Publications (2)

Publication Number Publication Date
CN117240964A CN117240964A (en) 2023-12-15
CN117240964B true CN117240964B (en) 2024-02-27

Family

ID=89084809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311529582.8A Active CN117240964B (en) 2023-11-16 2023-11-16 Voice recognition method in call process

Country Status (1)

Country Link
CN (1) CN117240964B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002197252A (en) * 2000-12-25 2002-07-12 Casio Comput Co Ltd Information collection system, information collection service method and computer readable storage medium
CA2988282A1 (en) * 2016-12-22 2018-06-22 Capital One Services, Llc Systems and methods for customer sentiment prediction and depiction
CN109767791A (en) * 2019-03-21 2019-05-17 中国—东盟信息港股份有限公司 A kind of voice mood identification and application system conversed for call center
KR20200025532A (en) * 2018-08-30 2020-03-10 주민성 An system for emotion recognition based voice data and method for applications thereof
CN111026843A (en) * 2019-12-02 2020-04-17 北京智乐瑟维科技有限公司 Artificial intelligent voice outbound method, system and storage medium
CN113688221A (en) * 2021-09-08 2021-11-23 中国平安人寿保险股份有限公司 Model-based dialect recommendation method and device, computer equipment and storage medium
CN115171690A (en) * 2022-06-28 2022-10-11 Oppo广东移动通信有限公司 Control method, device and equipment of voice recognition equipment and storage medium
CN116030788A (en) * 2023-02-23 2023-04-28 福建博士通信息股份有限公司 Intelligent voice interaction method and device
CN116303978A (en) * 2023-05-17 2023-06-23 福建博士通信息股份有限公司 Potential user mining method based on voice analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Service Quality Improvement Strategies for the Bank of Lanzhou Customer Service Center; Zhang Yanping; China Master's Theses Full-text Database, Economics and Management Sciences; 20230415; J152-287 *

Also Published As

Publication number Publication date
CN117240964A (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
US10522144B2 (en) Method of and system for providing adaptive respondent training in a speech recognition application
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
CN109451188B (en) Method and device for differential self-help response, computer equipment and storage medium
US8538755B2 (en) Customizable method and system for emotional recognition
US10789943B1 (en) Proxy for selective use of human and artificial intelligence in a natural language understanding system
CN107886949A (en) A kind of content recommendation method and device
CN106611604B (en) Automatic voice superposition detection method based on deep neural network
CN110379441B (en) Voice service method and system based on countermeasure type artificial intelligence network
US7222074B2 (en) Psycho-physical state sensitive voice dialogue system
CN105869626A (en) Automatic speech rate adjusting method and terminal
CN106486120A (en) Interactive voice response method and answering system
CN113194210B (en) Voice call access method and device
CN116631412A (en) Method for judging voice robot through voiceprint matching
CN117240964B (en) Voice recognition method in call process
CN116303978B (en) Potential user mining method based on voice analysis
CN114267340A (en) Method, device, storage medium and equipment for evaluating service quality of 4S shop
CN116030788B (en) Intelligent voice interaction method and device
CN101460994A (en) Speech differentiation
CN113099043A (en) Customer service control method, apparatus and computer-readable storage medium
WO2020068808A1 (en) System and method for optimizing operation of a conversation management system
CN117153151B (en) Emotion recognition method based on user intonation
CN116308735A (en) Financial data prediction method, device, electronic equipment and storage medium
CN117354421A (en) Intelligent voice analysis method and system
CN117594062A (en) Voice response method, device and medium applied to real-time session and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant