CN112885379A - Customer service voice evaluation method, system, device and storage medium - Google Patents
- Publication number: CN112885379A (application CN202110116652.1A)
- Authority
- CN
- China
- Prior art keywords
- voice
- customer service
- speech
- intonation
- detection model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5175—Call or contact centers supervision arrangements
Abstract
The invention provides a customer service voice evaluation method, system, device and storage medium. The method comprises the following steps: collecting customer service voice to be evaluated; segmenting the customer service voice into a plurality of speech paragraphs; inputting each speech paragraph into a trained intonation detection model to detect whether a specific negative intonation exists in the paragraph; and obtaining an evaluation result of the customer service voice according to the intonation detection model's detection result for each paragraph. The invention provides an intonation detection model for detecting the speaking tone of customer service voice, which can automatically identify whether negative intonations exist in the voice and automatically evaluate it according to the detection results, without manual evaluation, so that the evaluation is more accurate and quality inspection efficiency can be greatly improved.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a customer service voice evaluation method, system, device and storage medium.
Background
For an OTA (Online Travel Agency) platform, the call center is an important hub between customers and businesses, so the service quality of its customer service is important. Quality inspection is a key link in controlling the quality of service; previously it was generally carried out manually by an evaluation department. However, a call center generates massive numbers of calls every day, while the human resources available for quality inspection are limited: if few samples are drawn, the sample is highly random and can hardly represent the actual service quality, and drawing more samples increases cost. In short, manual quality inspection can only proceed by spot checks, making it difficult to track and specifically analyze an agent's performance. Manual quality inspection is also strongly subjective, with the possibility of inconsistent standards or errors.
Quality management is an important part of the operation of a customer service center, and quality inspection sets the standard for its quality of service. The assessment of service quality is generally tied to customer satisfaction, but satisfaction is sometimes related to whether the customer's request was met and may have little to do with the agent's actual service quality. In such cases it is difficult to measure service quality through customer satisfaction, or to learn from it where the service needs improvement. Relatively objective quality inspection standards are therefore needed to control employees' quality of service and point out their problems.
Speaking tone is very important for call center agents: during communication they should sound warm, enthusiastic and mild, and must not appear cold or impatient. However, agents face guests and merchants all year round and often handle similar problems, so they may grow weary and lack enthusiasm in communication; the typical manifestation is a flat intonation without fluctuation, which gives guests a perfunctory impression. Customers are also diverse, and when a guest shows negative emotion it is not easy for the agent to maintain a positive tone.
The assessment of agents' speaking tone is completed manually, and manual quality inspection often has several problems: 1. It is strongly subjective, and inspectors may not interpret the assessment standard uniformly. 2. It is strongly random: a call center generates massive numbers of calls every day, and every call of every agent cannot be checked manually, so inspection can only be completed by random sampling. Consequently, such quality control neither gives a comprehensive view of agents' performance nor helps them find and improve problems through long-term tracking.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a customer service voice evaluation method, system, device and storage medium that can automatically analyze customer service voice and improve its quality inspection efficiency.
The embodiment of the invention provides a customer service voice evaluation method, which comprises the following steps:
collecting customer service voice to be evaluated;
dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
and obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
In some embodiments, segmenting the customer service voice to be evaluated into a plurality of voice paragraphs comprises:
recognizing the customer service voice based on an automatic voice recognition technology to obtain a voice text corresponding to the customer service voice;
and segmenting the customer service voice based on the voice text, and removing the voice paragraphs which do not contain the voice text.
In some embodiments, the inputting each speech segment into a trained intonation detection model respectively to detect whether a specific negative intonation exists in the speech segment includes the following steps:
extracting emotional features of the voice paragraphs;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
In some embodiments, extracting the emotional features of the speech passage comprises the following steps:
extracting LLD (Low-Level Descriptor) features from the audio data of the speech paragraph;
and extracting HSF (High-level Statistics Function) features based on the LLD features.
In some embodiments, the intonation detection model includes a plurality of classifiers corresponding one-to-one to a plurality of specific negative intonations, and each classifier is configured to output a probability value that its corresponding negative intonation is included in the speech paragraph.
In some embodiments, the intonation detection model includes a feature extraction layer, a target task classification layer, and a speaker recognition layer;
the feature extraction layer extracts features from the emotional features of the speech paragraphs and feeds them to the target task classification layer and the speaker recognition layer respectively; the target task classification layer outputs probability values that the speech paragraphs include specific negative intonations, and the speaker recognition layer outputs probability values that the speech paragraphs correspond to each speaker.
In some embodiments, the intonation detection model is trained using adversarial learning, and the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer, which acts as the identity during forward propagation and reverses the gradient during backpropagation.
In some embodiments, obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model for each of the voice paragraphs includes the following steps:
and for a specific negative intonation, if the detection result of at least one of the speech paragraphs includes that negative intonation, the customer service voice is judged to be problem voice.
In some embodiments, obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model for each of the voice paragraphs includes the following steps:
for a specific negative intonation, calculating the proportion of the speech paragraphs that include it within the customer service voice;
and if the calculated proportion is larger than a preset threshold, the customer service voice is judged to be problem voice.
In some embodiments, calculating the proportion of the speech paragraphs that include the specific negative intonation comprises:
calculating the ratio of the duration of the speech paragraphs that include the specific negative intonation to the overall duration of the customer service voice; or
calculating the ratio of the number of sentences in those speech paragraphs to the total number of sentences in the customer service voice.
The embodiment of the invention also provides a customer service voice evaluation system, which is used for realizing the customer service voice evaluation method, and the system comprises:
the voice acquisition module is used for acquiring customer service voice to be evaluated;
the voice segmentation module is used for segmenting the customer service voice to be evaluated into a plurality of voice paragraphs;
the intonation detection module is used for respectively inputting each voice paragraph into the trained intonation detection model and detecting whether a specific negative intonation exists in the voice paragraph;
and the voice evaluation module is used for obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
An embodiment of the present invention further provides a customer service voice evaluation device, including:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the customer service voice assessment method via execution of the executable instructions.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the customer service voice evaluation method when being executed by a processor.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The customer service voice evaluation method, the customer service voice evaluation system, the customer service voice evaluation equipment and the storage medium have the following beneficial effects:
the invention provides a tone detection model for detecting the tone of a speech cavity of customer service speech, which can automatically identify whether negative tones exist in the customer service speech and automatically evaluate the customer service speech according to a detection result without manual evaluation, so that the evaluation accuracy is higher, and the quality inspection efficiency can be greatly improved.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow chart of a customer service voice assessment method according to an embodiment of the present invention;
FIG. 2 is a process diagram of a customer service voice assessment method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an intonation detection model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a customer service voice evaluation system according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a customer service voice evaluation device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present invention provides a customer service voice evaluation method, including the following steps:
s100: collecting customer service voice to be evaluated;
s200: dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
s300: inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
s400: and obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
With the above customer service voice evaluation method, the invention provides an intonation detection model for detecting the speaking tone of customer service voice: after the voice is collected in step S100 and segmented in step S200, step S300 automatically identifies whether negative intonations exist in its speech paragraphs, and step S400 automatically evaluates the voice according to the detection results. No manual evaluation is needed, the evaluation is more accurate, and quality inspection efficiency can be greatly improved, which helps guarantee service quality and helps agents find the problems in their service.
As shown in fig. 2, in step S100 the customer service voice to be evaluated is collected; specifically, the audio data of the customer service voice may be retrieved according to the agent's identification information (eid) and the call's identification information (callid).
As shown in fig. 2, in this embodiment, the step S200: the method for dividing the customer service voice to be evaluated into a plurality of voice paragraphs comprises the following steps:
the customer service voice is recognized with an automatic speech recognition (ASR) technology to obtain the speech text corresponding to the customer service voice; any existing speech recognition technology can be adopted to recognize the content of the audio data;
and the customer service voice is segmented based on the speech text. In particular, each speech paragraph may correspond to one phrase.
Because some short audio segments are mostly silence or noise, VAD (Voice Activity Detection) is applied to all segments after cutting to perform endpoint detection, filter out noise, and remove speech paragraphs that contain no speech text.
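As a rough illustration of the endpoint detection step, the sketch below implements a minimal energy-threshold VAD in Python. It is a simplified stand-in for a production VAD (the patent does not specify a particular implementation), and the frame length, hop size, and threshold ratio are illustrative assumptions.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_ratio=0.1):
    """Mark a frame as speech when its RMS energy exceeds a fixed
    fraction of the loudest frame's energy (a crude VAD stand-in)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energies = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    threshold = threshold_ratio * energies.max()
    return energies > threshold  # boolean mask, one entry per frame

# 0.1 s of silence followed by 0.1 s of a louder tone, at 16 kHz:
sig = np.concatenate([np.zeros(1600), 0.5 * np.sin(np.linspace(0, 100, 1600))])
mask = energy_vad(sig)  # silent frames are False, voiced frames are True
```

Segments whose frames are almost entirely marked non-speech would then be dropped before intonation detection.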
In this embodiment, the step S300: inputting each speech paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the speech paragraph, wherein the method comprises the following steps:
extracting the emotional features of the speech paragraphs, where a deep learning model, such as a convolutional neural network with a plurality of convolutional layers, may also be adopted for the extraction;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
When the intonation detection model is trained, customer service voice is likewise collected as samples, segmented into a plurality of speech paragraphs, and the paragraphs containing no speech text are removed. The speech paragraphs must then be labeled manually. For example, when three negative intonations are to be detected (an impatient tone, flat speech without fluctuation, and a sunken intonation), the audio of each speech paragraph is given three labels indicating whether each of the three is present.
As shown in fig. 2, in this embodiment, extracting the emotional features of the speech segment includes the following steps:
extracting LLD (Low-Level Descriptor) features from the audio data of the speech paragraphs; LLDs are hand-designed low-level features, generally computed on a single frame of speech to characterize that frame;
extracting HSF (High-level Statistics Function) features based on the LLDs; HSFs are obtained by computing statistics over the LLDs, such as means and maxima. Here, 6373-dimensional HSF features are extracted as the speech emotion features and input into the intonation detection model.
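The LLD-to-HSF pipeline can be sketched as follows. This is not the actual ComParE extraction (which is produced by a dedicated toolkit and yields 6373 dimensions); it only illustrates the principle of applying statistical functionals to frame-level LLD contours, with toy feature names and sizes assumed.

```python
import numpy as np

def apply_functionals(lld_matrix):
    """Collapse a (num_frames, num_llds) matrix of frame-level LLDs into a
    single fixed-length HSF vector by applying statistical functionals
    (mean, max, min, std, range) to each LLD contour."""
    stats = [np.mean, np.max, np.min, np.std, np.ptp]
    return np.concatenate([f(lld_matrix, axis=0) for f in stats])

# Toy input: 100 frames of 3 hypothetical LLDs (say F0, energy, ZCR).
rng = np.random.default_rng(0)
llds = rng.normal(size=(100, 3))
hsfs = apply_functionals(llds)  # 3 LLDs x 5 functionals -> 15-dim vector
```

The key property is that the HSF vector has a fixed length regardless of how many frames the paragraph contains, which is what lets variable-length audio be fed to a fixed-input classifier.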
As shown in fig. 2, in this embodiment the intonation detection model includes a plurality of classifiers corresponding one-to-one to a plurality of specific negative intonations, and each classifier outputs a probability value that its corresponding negative intonation is included in the speech paragraph. For example, in this embodiment three specific negative intonations are set: an impatient tone, a sunken intonation, and flat speech without fluctuation. Correspondingly, three classifiers are set to detect whether a speech paragraph contains each of them.
As shown in fig. 3, in this embodiment the intonation detection model includes a feature extraction layer, a target task classification layer, and a speaker recognition layer. The feature extraction layer may include a plurality of convolutional layers, a plurality of batch normalization layers, and a dense output layer. It extracts features from the emotional features of the speech paragraphs and feeds them to the target task classification layer and the speaker recognition layer respectively. The target task classification layer outputs probability values that the speech paragraph includes specific negative intonations; when multiple specific negative intonations are to be detected, it comprises a corresponding number of classifiers. The speaker recognition layer outputs probability values that the speech paragraph corresponds to each speaker, i.e., it performs speaker detection. In this embodiment, the target task classification layer and the speaker recognition layer each end in a dense layer with a softmax classifier, producing the speaking-tone classification result and the speaker classification result respectively.
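A minimal forward-pass sketch of this two-head structure, with illustrative layer sizes and random weights (the actual model's convolutional and batch-normalization layers are omitted): a shared feature layer feeds three independent binary intonation heads and a softmax speaker head.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)

# Illustrative sizes, not the patent's: 6373-dim HSF input, 64-dim shared
# features, 3 negative intonations, 10 training-set speakers.
W_shared = rng.normal(scale=0.01, size=(6373, 64))
W_intonation = rng.normal(scale=0.1, size=(64, 3))
W_speaker = rng.normal(scale=0.1, size=(64, 10))

hsf = rng.normal(size=6373)                 # one paragraph's emotion features
shared = np.maximum(W_shared.T @ hsf, 0.0)  # shared feature layer (ReLU)

p_intonation = sigmoid(shared @ W_intonation)  # three independent binary heads
p_speaker = softmax(shared @ W_speaker)        # speaker posteriors (sum to 1)
```

During training both heads receive gradients through the shared layer; the adversarial connection to the speaker head is described below.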
The invention aims to recognize agents' speaking tone, detecting whether an impatient tone, a sunken intonation, or flat unmodulated speech occurs during service, which is a task in the domain of speech emotion recognition. Existing speech emotion corpora are very diverse and differ considerably, and because different emotions show different characteristics in the speech signal, the feature types and recognition methods used in prior research are also very varied. In this embodiment, the ComParE feature set, defined by the INTERSPEECH 2013 Computational Paralinguistics Challenge, is used to extract the emotion features of the speech paragraphs. These emotion features contain speaker-related information; traditionally, each speaker's neutral speech is used to normalize the emotional speech and eliminate the influence of the speaker information.
Emotion features are generally based on prosodic features, mainly prosody related to pitch, energy and duration. The fundamental tone is produced by the vibration of the vocal cords; its frequency is roughly 100-200 Hz for male voices and 200-350 Hz for female voices, so emotion features inevitably contain some speaker-related information. To eliminate the speaker's influence, one approach is to normalize the emotion features using the speaker's neutral emotion features, for example normalizing with each speaker's F0 mean over neutral speech and the F0 mean over all neutral emotional speech. While this eliminates the speaker's influence to some extent, it also requires neutral emotional speech from the person being predicted at prediction time.
Based on the above, in order to eliminate the influence of speaker information, the invention trains the intonation detection model with adversarial learning. As shown in fig. 3, the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer (GRL), which acts as the identity during forward propagation and reverses the gradient during backpropagation.
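The gradient reversal layer can be sketched framework-independently: it is the identity in the forward pass and multiplies the gradient by -lam in the backward pass. This is a conceptual sketch, not the patent's implementation; in practice it would be written as a custom autograd function in a deep learning framework.

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies the incoming gradient by
    -lam in the backward pass, so that minimizing the speaker loss trains
    the feature extractor to *remove* speaker information (adversarial)."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # activations pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # gradient is reversed

grl = GradientReversal(lam=1.0)
x = np.array([1.0, -2.0, 3.0])
y = grl.forward(x)                           # identical to x
g = grl.backward(np.array([0.5, 0.5, 0.5]))  # negated gradient
```

Because the speaker head's gradient arrives at the shared feature extractor negated, a training step that improves the speaker classifier simultaneously pushes the shared features to become less speaker-discriminative.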
As shown in fig. 2, in this embodiment, step S400 obtains the evaluation result of the customer service voice according to the intonation detection model's result for each speech paragraph: whether the customer service voice is problem voice is judged based on the per-paragraph detection results and a decision rule over the whole call.
For example, for the negative intonation of impatience, even a single occurrence leaves the guest with a poor impression, so if the detection result of at least one of the speech paragraphs includes this negative intonation, the customer service voice is judged to be problem voice.
Other negative intonations can be judged according to their proportion within the whole call. For example, for negative intonations such as flat speech without fluctuation or a sunken intonation, step S400 comprises the following steps:
for a specific negative intonation, calculating the proportion of the speech paragraphs that include it within the customer service voice;
and if the calculated proportion is larger than a preset threshold, the customer service voice is judged to be problem voice.
In this embodiment, calculating the proportion of the speech paragraphs that include the specific negative intonation comprises:
calculating the ratio of the duration of the speech paragraphs that include the specific negative intonation to the overall duration of the customer service voice, i.e., the duration ratio; or
calculating the ratio of the number of sentences in those speech paragraphs to the total number of sentences in the customer service voice, i.e., the sentence-count ratio.
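The duration-ratio decision rule above can be sketched as follows; the data layout and the 0.3 threshold are illustrative assumptions, not values from the patent.

```python
def is_problem_voice(paragraphs, negative_intonation, ratio_threshold=0.3):
    """Apply the duration-ratio rule: flag the call as problem voice when
    the share of time spent in paragraphs carrying the given negative
    intonation exceeds the threshold."""
    total = sum(duration for duration, _ in paragraphs)
    flagged = sum(duration for duration, tags in paragraphs
                  if negative_intonation in tags)
    return total > 0 and flagged / total > ratio_threshold

# Each paragraph: (duration in seconds, set of detected negative intonations).
call = [(10.0, set()), (5.0, {"flat"}), (15.0, set()), (20.0, {"flat"})]
# "flat" covers 25 s of the 50 s call (ratio 0.5 > 0.3), so it is flagged.
```

The sentence-count variant is identical in shape, with sentence counts substituted for durations.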
In summary, for the speaking-tone problems emphasized in manual quality inspection (an impatient tone, flat speech without fluctuation, and a sunken intonation), the invention labels data and trains corresponding models. Furthermore, in the adversarial learning method adopted by the invention, the speaker classification module is connected through a gradient reversal layer, so that the gradient of the speaker recognition module is multiplied by -1 during backpropagation; this adversarial design makes the information extracted by the feature extraction layer less related to the speaker. The method does not need to collect the target speaker's neutral speech to normalize the emotional speech at prediction time, and is therefore convenient to use. It can replace manual quality control in detecting agents' speaking-tone problems and has strong practicability.
As shown in fig. 4, an embodiment of the present invention further provides a customer service voice evaluation system, which is configured to implement the customer service voice evaluation method, and the system includes:
the voice acquisition module M100 is used for acquiring customer service voice to be evaluated;
a voice segmentation module M200, configured to segment the customer service voice to be evaluated into a plurality of voice paragraphs;
an intonation detection module M300, configured to input each speech paragraph into a trained intonation detection model and detect whether a specific negative intonation exists in the speech paragraph;
and the voice evaluation module M400 is configured to obtain an evaluation result of the customer service voice according to the detection result of the intonation detection model for each voice paragraph.
With the above customer service speech evaluation system, the invention detects intonation problems in customer service speech using an intonation detection model. After the speech acquisition module M100 collects the customer service speech, the speech segmentation module M200 segments it, the intonation detection module M300 automatically identifies whether negative intonations exist in the speech paragraphs, and the speech evaluation module M400 then automatically evaluates the customer service speech according to the detection results. No manual evaluation is required, the evaluation accuracy is higher, quality inspection efficiency can be greatly improved, the service quality of customer service is ensured, and agents are helped to find problems in their service.
In this embodiment, the speech segmentation module M200 segmenting the customer service speech to be evaluated into a plurality of speech paragraphs comprises: recognizing the customer service speech based on automatic speech recognition (ASR) to obtain the speech text corresponding to the customer service speech — the speech recognition technology adopted by the invention may be any existing technology capable of recognizing the content of the customer service audio data; and segmenting the customer service speech based on the speech text. In particular, each speech paragraph may correspond to one sentence.
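The text-driven segmentation step can be sketched as follows. The ASR result format used here (a list of sentences with start/end timestamps) is an assumption for illustration; real ASR engines expose such timing information in engine-specific formats.

```python
# Hypothetical sketch: cut the call audio into one paragraph per sentence,
# using sentence-level timestamps from an ASR result.

def segment_by_asr(audio, sample_rate, asr_sentences):
    """audio: sequence of samples; asr_sentences: list of dicts like
    {"text": ..., "start": seconds, "end": seconds} (assumed format)."""
    paragraphs = []
    for sent in asr_sentences:
        start = int(sent["start"] * sample_rate)
        end = int(sent["end"] * sample_rate)
        # Each paragraph keeps both the recognized text and its audio slice.
        paragraphs.append({"text": sent["text"], "samples": audio[start:end]})
    return paragraphs
```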
Since a small segment of audio may be mostly silence or noise, endpoint detection is performed on all audio after cutting, using voice activity detection (VAD), to filter out noise and remove speech segments that contain no speech text.
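The filtering step can be illustrated with a simple energy-based VAD. This is a simplified stand-in for the endpoint detection described above; production systems typically use a trained VAD model rather than a fixed energy threshold.

```python
import numpy as np

# Illustrative energy-based voice activity check: a segment is kept only
# if at least one frame's mean energy exceeds a threshold.

def frame_energies(samples, frame_len=160):
    # Split into non-overlapping frames and compute mean squared amplitude.
    n = len(samples) // frame_len
    frames = np.reshape(samples[: n * frame_len], (n, frame_len))
    return np.mean(frames ** 2, axis=1)

def has_speech(samples, frame_len=160, energy_threshold=0.01):
    if len(samples) < frame_len:
        return False  # too short to contain a full frame
    return bool(np.any(frame_energies(samples, frame_len) > energy_threshold))
```

Segments for which `has_speech` returns `False` would be dropped before intonation detection.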
In this embodiment, the customer service speech evaluation system further includes an emotional feature extraction module, configured to extract the emotional features of the speech paragraphs; the intonation detection module M300 then inputs the emotional features of each speech paragraph into the trained intonation detection model to detect whether a specific negative intonation exists in the speech paragraph.
Further, the emotional feature extraction module extracting the emotional features of the speech paragraphs comprises: extracting low-level descriptor (LLD) features from the audio data of the speech paragraph — LLDs are hand-designed low-level features, generally computed on a single frame of speech to characterize that frame; and, based on the LLD features, extracting high-level statistical function (HSF) features — HSFs are statistics computed over the LLDs, such as the mean and the maximum. In this embodiment, 6373-dimensional HSFs are extracted from the LLD features as the speech emotion features.
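The LLD-to-HSF pipeline can be sketched in miniature as follows. The two LLDs (log-energy and zero-crossing rate) and four functionals below are illustrative only; the 6373-dimensional feature set referenced by the patent corresponds to a much larger standard set of LLDs and statistical functionals, typically extracted with a dedicated toolkit.

```python
import numpy as np

# Toy sketch of frame-level LLD extraction followed by statistical
# functionals (HSFs) that summarize a variable-length paragraph into a
# fixed-length vector.

def lld_features(frames):
    """frames: (n_frames, frame_len) array. Returns per-frame LLDs:
    log-energy and zero-crossing rate."""
    energy = np.log(np.mean(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([energy, zcr], axis=1)  # shape (n_frames, 2)

def hsf_features(llds):
    """Apply statistical functionals over time; the output length is
    fixed (n_llds * n_functionals) regardless of paragraph duration."""
    functionals = [np.mean, np.max, np.min, np.std]
    return np.concatenate([f(llds, axis=0) for f in functionals])
```

With 2 LLDs and 4 functionals the output is 8-dimensional; scaling the same scheme to dozens of LLDs and many functionals yields vectors on the order of the 6373 dimensions mentioned above.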
In one implementation of this embodiment, the speech evaluation module M400 obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech paragraph comprises: for a specific negative intonation, if the detection result of at least one of the speech paragraphs includes that negative intonation, determining the customer service speech to be problem speech.
In another implementation of this embodiment, the speech evaluation module M400 obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech paragraph comprises: for a specific negative intonation, calculating the proportion of the speech paragraphs containing that negative intonation in the customer service speech; and, if the calculated proportion is greater than a preset proportion threshold, determining the customer service speech to be problem speech.
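The two evaluation strategies can be combined as in the following sketch. The split into "strict" intonations (one occurrence flags the call, e.g. impatience) and "proportional" intonations (judged by paragraph ratio, e.g. flat speech) and all names are illustrative assumptions, not the patent's configuration.

```python
# Hypothetical sketch combining the two strategies of the speech
# evaluation module M400.

def evaluate_call(detections, strict=("impatient",), ratio_threshold=0.3):
    """detections: list of sets, one per paragraph, each holding the
    negative intonations detected in that paragraph."""
    if not detections:
        return "ok"
    # Strategy 1: any occurrence of a strict intonation flags the call.
    if any(d & set(strict) for d in detections):
        return "problem"
    # Strategy 2: other intonations are judged by the share of flagged
    # paragraphs (here, the sentence-count ratio).
    proportional = set().union(*detections) - set(strict)
    for tone in proportional:
        ratio = sum(1 for d in detections if tone in d) / len(detections)
        if ratio > ratio_threshold:
            return "problem"
    return "ok"
```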
The embodiment of the invention also provides customer service voice evaluation equipment, which comprises a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the customer service voice assessment method via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 600 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the customer service voice assessment method section above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In the customer service voice evaluation device, the program in the memory, when executed by the processor, implements the steps of the customer service voice evaluation method, so the device can also obtain the technical effects of the customer service voice evaluation method.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the customer service voice evaluation method when being executed by a processor. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the section of the customer service voice assessment method above of this specification, when the program product is executed on the terminal device.
Referring to fig. 6, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The program in the computer storage medium, when executed by a processor, implements the steps of the customer service voice evaluation method, so the computer storage medium can also obtain the technical effects of the customer service voice evaluation method.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (13)
1. A customer service voice evaluation method is characterized by comprising the following steps:
collecting customer service voice to be evaluated;
dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
and obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
2. The customer service voice evaluation method according to claim 1, wherein the step of dividing the customer service voice to be evaluated into a plurality of voice paragraphs comprises the steps of:
recognizing the customer service voice based on an automatic voice recognition technology to obtain a voice text corresponding to the customer service voice;
and segmenting the customer service voice based on the voice text, and removing the voice paragraphs which do not contain the voice text.
3. The method according to claim 1, wherein said inputting each speech segment into a trained intonation detection model to detect whether there is a particular negative intonation in said speech segment comprises the steps of:
extracting emotional features of the voice paragraphs;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
4. The customer service speech assessment method according to claim 3, wherein extracting the emotional features of the speech passage comprises the following steps:
extracting low-level descriptor (LLD) features from the audio data of the speech passage;
based on the LLDs features, HSFs features are extracted.
5. The method according to claim 3, wherein the intonation detection model comprises a plurality of classifiers corresponding one-to-one to a plurality of specific negative intonations, and each classifier is configured to output a probability value that the speech passage includes the corresponding specific negative intonation.
6. The customer service speech assessment method according to claim 3, wherein the intonation detection model comprises a feature extraction layer, a target task classification layer and a speaker recognition layer;
the feature extraction layer is used for extracting features of emotional features of the voice paragraphs and then inputting the emotional features into the target task classification layer and the speaker recognition layer respectively, the target task classification layer outputs probability values that the voice paragraphs comprise specific negative tones, and the speaker recognition layer is used for outputting probability values that the voice paragraphs correspond to speakers.
7. The method as claimed in claim 6, wherein the intonation detection model is trained by adversarial learning, the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer, and the gradient reversal layer keeps the transmitted values unchanged during forward propagation and reverses the gradient during backward propagation.
8. The method according to claim 1, wherein obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech passage comprises the following steps:
and for a specific negative intonation, if the detection result of at least one of the speech paragraphs includes the specific negative intonation, determining the customer service speech to be problem speech.
9. The method according to claim 1, wherein obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech passage comprises the following steps:
for a specific negative intonation, calculating the proportion of the speech paragraphs containing the specific negative intonation in the customer service speech;
and if the calculated proportion is greater than a preset proportion threshold, determining the customer service speech to be problem speech.
10. The method of claim 9, wherein said calculating a percentage of speech segments in the customer service speech that include the particular negative intonation comprises:
calculating the ratio of the voice duration of the voice paragraph comprising the specific negative tone in the overall duration of the customer service voice; or
The ratio of the number of sentences including the speech passage of the particular negative intonation to the total number of sentences of the customer service speech is calculated.
11. A customer service voice evaluation system for implementing the customer service voice evaluation method according to any one of claims 1 to 10, the system comprising:
the voice acquisition module is used for acquiring customer service voice to be evaluated;
the voice segmentation module is used for segmenting the customer service voice to be evaluated into a plurality of voice paragraphs;
the intonation detection module is used for respectively inputting each voice paragraph into the trained intonation detection model and detecting whether a specific negative intonation exists in the voice paragraph;
and the voice evaluation module is used for obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
12. A customer service voice evaluation apparatus, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the customer service voice assessment method of any of claims 1 to 10 via execution of the executable instructions.
13. A computer-readable storage medium storing a program which, when executed by a processor, performs the steps of the customer service voice assessment method of any of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116652.1A CN112885379A (en) | 2021-01-28 | 2021-01-28 | Customer service voice evaluation method, system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110116652.1A CN112885379A (en) | 2021-01-28 | 2021-01-28 | Customer service voice evaluation method, system, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112885379A true CN112885379A (en) | 2021-06-01 |
Family
ID=76053595
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110116652.1A Pending CN112885379A (en) | 2021-01-28 | 2021-01-28 | Customer service voice evaluation method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885379A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113593529A (en) * | 2021-07-09 | 2021-11-02 | 北京字跳网络技术有限公司 | Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium |
WO2023100998A1 (en) * | 2021-12-03 | 2023-06-08 | パナソニックIpマネジメント株式会社 | Voice registration device and voice registration method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452405A (en) * | 2017-08-16 | 2017-12-08 | 北京易真学思教育科技有限公司 | A kind of method and device that data evaluation is carried out according to voice content |
CN109753566A (en) * | 2019-01-09 | 2019-05-14 | 大连民族大学 | The model training method of cross-cutting sentiment analysis based on convolutional neural networks |
2021-01-28: CN application CN202110116652.1A filed; published as CN112885379A (status: Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10878823B2 (en) | Voiceprint recognition method, device, terminal apparatus and storage medium | |
CN109859772B (en) | Emotion recognition method, emotion recognition device and computer-readable storage medium | |
CN109686383B (en) | Voice analysis method, device and storage medium | |
CN110648691B (en) | Emotion recognition method, device and system based on energy value of voice | |
CN112217947B (en) | Method, system, equipment and storage medium for transcribing text by customer service telephone voice | |
CN101930735A (en) | Speech emotion recognition equipment and speech emotion recognition method | |
EP4078579A1 (en) | Emotion detection in audio interactions | |
CN113205814B (en) | Voice data labeling method and device, electronic equipment and storage medium | |
CN113420556B (en) | Emotion recognition method, device, equipment and storage medium based on multi-mode signals | |
Swain et al. | Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition | |
CN111370030A (en) | Voice emotion detection method and device, storage medium and electronic equipment | |
CN113807103B (en) | Recruitment method, device, equipment and storage medium based on artificial intelligence | |
CN112885379A (en) | Customer service voice evaluation method, system, device and storage medium | |
CN110782902A (en) | Audio data determination method, apparatus, device and medium | |
CN112489623A (en) | Language identification model training method, language identification method and related equipment | |
CN114360557A (en) | Voice tone conversion method, model training method, device, equipment and medium | |
CN106157974A (en) | Text recites quality assessment device and method | |
CN112911072A (en) | Call center volume identification method and device, electronic equipment and storage medium | |
CN114373452A (en) | Voice abnormity identification and evaluation method and system based on deep learning | |
CN110797032A (en) | Voiceprint database establishing method and voiceprint identification method | |
CN114913859B (en) | Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium | |
Koolagudi et al. | Dravidian language classification from speech signal using spectral and prosodic features | |
Dave et al. | Speech recognition: A review | |
CN114627896A (en) | Voice evaluation method, device, equipment and storage medium | |
CN110782916B (en) | Multi-mode complaint identification method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210601 |