CN112885379A - Customer service voice evaluation method, system, device and storage medium - Google Patents


Info

Publication number
CN112885379A
Authority
CN
China
Prior art keywords
voice
customer service
speech
intonation
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110116652.1A
Other languages
Chinese (zh)
Inventor
杜诗宣
任君
罗超
邹宇
李巍
严丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Travel Network Technology Shanghai Co Ltd
Original Assignee
Ctrip Travel Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Travel Network Technology Shanghai Co Ltd filed Critical Ctrip Travel Network Technology Shanghai Co Ltd
Priority to CN202110116652.1A priority Critical patent/CN112885379A/en
Publication of CN112885379A publication Critical patent/CN112885379A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/50: Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/51: Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5175: Call or contact centers supervision arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a customer service voice evaluation method, system, device and storage medium, wherein the method comprises the following steps: collecting the customer service voice to be evaluated; segmenting the customer service voice to be evaluated into a plurality of speech paragraphs; inputting each speech paragraph into a trained intonation detection model and detecting whether a specific negative intonation exists in it; and obtaining an evaluation result for the customer service voice from the detection results of the intonation detection model on the speech paragraphs. The invention provides an intonation detection model for detecting the tone of voice in customer service speech, which automatically identifies whether negative intonations are present and automatically evaluates the customer service voice from the detection results, without manual evaluation, so that evaluation accuracy is higher and quality-inspection efficiency can be greatly improved.

Description

Customer service voice evaluation method, system, device and storage medium
Technical Field
The invention relates to the technical field of data processing, and in particular to a customer service voice evaluation method, system, device, and storage medium.
Background
As the hub between customers and merchants on an OTA (Online Travel Agency) platform, the call center's customer service quality is critically important. Quality inspection is an important link in controlling service quality, and customer service quality inspection has traditionally been performed manually by an evaluation department. A call center generates a massive number of calls every day, while human resources for quality inspection are limited: if the sample drawn is small, it is highly random and can hardly represent the actual service quality, while enlarging the sample increases cost. In short, manual quality inspection can only proceed by spot checks, and it is difficult to track and concretely analyze the performance of individual agents. At the same time, manual quality inspection is highly subjective, with the possibility of inconsistent standards or errors.
Quality management is an important part of the operation of a customer service center, and quality inspection provides the standard that defines its quality of service. Assessment of customer service quality is usually tied to customer satisfaction, but satisfaction sometimes depends on whether the customer's request was met, and then correlates only weakly with the agent's actual service quality. In such cases it is difficult to measure service quality through customer satisfaction, or to learn from it where the service needs improvement. A relatively objective quality-inspection standard is therefore needed to control employees' quality of service and point out problems to them.
Tone of voice is very important for call-center agents: they should sound warm, enthusiastic, gentle, and attentive during communication, and should not show coldness or impatience. However, agents face guests and merchants all year round and often handle similar problems, which can breed weariness, so that communication lacks enthusiasm; this typically manifests as flat intonation without fluctuation, leaving the guest feeling disregarded. Meanwhile, customers are diverse, and when a guest carries negative emotions it is not easy for the agent to maintain a positive tone.
Assessment of agents' tone of voice has been done manually, and manual quality inspection often has several problems: 1. It is highly subjective, and inspectors may not interpret the assessment standard uniformly. 2. It is highly random: a call center generates a massive number of calls every day, and every call of every agent cannot be checked manually. Inspection can only be done by random sampling, which carries great randomness. Such quality inspection therefore can neither give a comprehensive picture of agents' performance nor help them find and correct problems through long-term tracking.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a customer service voice evaluation method, system, device, and storage medium that can automatically analyze customer service voice and improve the efficiency of its quality inspection.
The embodiment of the invention provides a customer service voice evaluation method, which comprises the following steps:
collecting customer service voice to be evaluated;
dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
and obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
In some embodiments, segmenting the customer service voice to be evaluated into a plurality of voice paragraphs comprises:
recognizing the customer service voice based on an automatic voice recognition technology to obtain a voice text corresponding to the customer service voice;
and segmenting the customer service voice based on the voice text, and removing the voice paragraphs which do not contain the voice text.
In some embodiments, the inputting each speech segment into a trained intonation detection model respectively to detect whether a specific negative intonation exists in the speech segment includes the following steps:
extracting emotional features of the voice paragraphs;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
In some embodiments, extracting the emotional features of the speech passage comprises the following steps:
extracting LLDs (Low Level Descriptors) features from the audio data of the speech paragraph;
and extracting HSFs (High-level Statistics Functions) features based on the LLDs features.
In some embodiments, the intonation detection model includes a plurality of classifiers in one-to-one correspondence with a plurality of specific negative intonations, each classifier being configured to output the probability that its corresponding negative intonation is present in the speech paragraph.
In some embodiments, the intonation detection model includes a feature extraction layer, a target task classification layer, and a speaker recognition layer;
the feature extraction layer extracts features from the emotional features of the speech paragraphs and feeds them to the target task classification layer and the speaker recognition layer respectively; the target task classification layer outputs the probability that the speech paragraph includes a specific negative intonation, and the speaker recognition layer outputs the probability that the speech paragraph corresponds to each speaker.
In some embodiments, the intonation detection model is trained using adversarial learning, and the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer, which passes values through unchanged during forward propagation and reverses the gradient during backward propagation.
In some embodiments, obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model for each of the voice paragraphs includes the following steps:
for a specific negative intonation, if the detection result of at least one of the speech paragraphs includes that specific negative intonation, the customer service voice is judged to be a problem voice.
In some embodiments, obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model for each of the voice paragraphs includes the following steps:
for a specific negative intonation, calculating the proportion of speech paragraphs that include it within the customer service voice;
and if the calculated proportion exceeds a preset threshold, judging the customer service voice to be a problem voice.
In some embodiments, calculating the proportion of speech paragraphs that include the specific negative intonation within the customer service voice comprises:
calculating the ratio of the duration of the speech paragraphs that include the specific negative intonation to the overall duration of the customer service voice; or
calculating the ratio of the number of sentences in the speech paragraphs that include the specific negative intonation to the total number of sentences of the customer service voice.
The embodiment of the invention also provides a customer service voice evaluation system, which is used for realizing the customer service voice evaluation method, and the system comprises:
the voice acquisition module is used for acquiring customer service voice to be evaluated;
the voice segmentation module is used for segmenting the customer service voice to be evaluated into a plurality of voice paragraphs;
the intonation detection module is used for respectively inputting each voice paragraph into the trained intonation detection model and detecting whether a specific negative intonation exists in the voice paragraph;
and the voice evaluation module is used for obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
An embodiment of the present invention further provides a customer service voice evaluation device, including:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the customer service voice assessment method via execution of the executable instructions.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the customer service voice evaluation method when being executed by a processor.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The customer service voice evaluation method, the customer service voice evaluation system, the customer service voice evaluation equipment and the storage medium have the following beneficial effects:
the invention provides a tone detection model for detecting the tone of a speech cavity of customer service speech, which can automatically identify whether negative tones exist in the customer service speech and automatically evaluate the customer service speech according to a detection result without manual evaluation, so that the evaluation accuracy is higher, and the quality inspection efficiency can be greatly improved.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flow chart of a customer service voice assessment method according to an embodiment of the present invention;
FIG. 2 is a process diagram of a customer service voice assessment method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an intonation detection model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a customer service voice evaluation system according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a customer service voice evaluation device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present invention provides a customer service voice evaluation method, including the following steps:
S100: collecting customer service voice to be evaluated;
S200: dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
S300: inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
S400: obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
With the above customer service voice evaluation method, the invention provides an intonation detection model for detecting the tone of voice in customer service speech. After the customer service voice is collected in step S100, it is segmented in step S200; step S300 automatically identifies whether negative intonations exist in the speech paragraphs, and step S400 then automatically evaluates the customer service voice from the detection results. No manual evaluation is needed, evaluation accuracy is higher, quality-inspection efficiency can be greatly improved, the quality of customer service is safeguarded, and agents are helped to find problems in their service.
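As a rough sketch only, the four steps S100 to S400 might be wired together as below; the function names and the stubbed segmenter and detector are illustrative assumptions, not the patent's implementation, and the bare any() aggregation is the simplest possible whole-call rule.

```python
from typing import Callable, List

def evaluate_customer_service_voice(
    audio: bytes,
    segment: Callable[[bytes], List[bytes]],        # S200: split into speech paragraphs
    detect_negative_tone: Callable[[bytes], bool],  # S300: trained intonation detection model
) -> bool:
    """Returns True if the call should be flagged as a 'problem voice' (S400)."""
    paragraphs = segment(audio)                               # S200
    results = [detect_negative_tone(p) for p in paragraphs]   # S300
    return any(results)                                       # S400: simplest aggregation rule

# Toy usage with stand-in components (hypothetical):
fake_segment = lambda audio: [audio[i:i + 4] for i in range(0, len(audio), 4)]
fake_detector = lambda p: b"!" in p  # pretend '!' marks an impatient paragraph
flagged = evaluate_customer_service_voice(b"ok..ok..!!..", fake_segment, fake_detector)
```

A real deployment would aggregate per negative intonation, using either the any-occurrence rule or a proportion threshold as the description specifies.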
As shown in fig. 2, in step S100 the customer service voice to be evaluated is collected; specifically, the audio data of the customer service voice may be retrieved using the agent's identification information (eid) and the call identification information (callid).
As shown in fig. 2, in this embodiment, step S200, segmenting the customer service voice to be evaluated into a plurality of speech paragraphs, comprises the following steps:
recognizing the customer service voice using automatic speech recognition (ASR) to obtain the speech text corresponding to the customer service voice, where the speech recognition technology adopted by the invention may be any existing technique for recognizing the content of the audio data of the customer service voice;
and segmenting the customer service voice based on the speech text. In particular, each speech paragraph may correspond to a short sentence.
Since a short audio segment may consist largely of silence or noise, endpoint detection with VAD (Voice Activity Detection) is applied to all audio after cutting, to filter out noise and remove speech paragraphs that contain no speech text.
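For illustration only, endpoint detection in the spirit of the VAD step can be sketched with a naive energy threshold in pure Python; a production system would use a real VAD, and the frame length and threshold below are arbitrary assumptions.

```python
from typing import List, Tuple

def energy_vad(samples: List[float], frame_len: int = 160,
               threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Mark frames whose mean energy exceeds a threshold as speech.

    Returns (start_sample, end_sample) spans of contiguous speech frames.
    A real system would use a proper VAD (e.g. WebRTC VAD) here.
    """
    spans, start = [], None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy > threshold:
            if start is None:
                start = i          # speech begins
        elif start is not None:
            spans.append((start, i))  # speech ends at a quiet frame
            start = None
    if start is not None:
        spans.append((start, len(samples)))
    return spans

# Silence, then a loud burst, then silence again:
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 320
```

Paragraphs whose spans cover almost none of their duration would be the ones discarded as containing no speech text.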
In this embodiment, the step S300: inputting each speech paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the speech paragraph, wherein the method comprises the following steps:
extracting the emotional features of the speech paragraphs, where the feature extraction may also employ a deep learning model, such as a convolutional neural network comprising a plurality of convolutional layers;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
When training the intonation detection model, customer service voice serving as samples must likewise be collected, segmented into a plurality of speech paragraphs, and paragraphs containing no speech text removed. The speech paragraphs must then be manually labeled. For example, when three negative intonations are to be detected, namely impatient tone, monotone speech with no fluctuation, and sunken intonation, the audio of each speech paragraph is given three labels indicating whether each of the three is present.
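The labeling scheme amounts to three independent binary labels per paragraph; a minimal data-structure sketch follows, with field names invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ParagraphLabels:
    """Manual annotation for one speech paragraph (multi-label, not mutually exclusive)."""
    impatient: bool  # impatient / irritable tone present?
    monotone: bool   # flat delivery with no pitch fluctuation?
    sunken: bool     # low, sunken intonation?

# One annotated paragraph: flat delivery, but no impatience or sunken tone.
example = ParagraphLabels(impatient=False, monotone=True, sunken=False)
```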
As shown in fig. 2, in this embodiment, extracting the emotional features of the speech segment includes the following steps:
extracting LLDs (Low Level Descriptors) features from the audio data of the speech paragraph; LLDs are hand-designed low-level features, generally computed per frame of speech to characterize that frame;
extracting HSFs (High-level Statistics Functions) features based on the LLDs features; HSFs are statistics computed over the LLDs, such as the mean and maximum. Based on the LLDs features, 6373-dimensional HSFs are extracted and input into the intonation detection model as the speech emotion features.
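To make the LLD/HSF relationship concrete, here is a toy sketch with only two hand-designed frame-level LLDs (energy and zero-crossing rate) and three statistics per contour; the real ComParE set uses far more of both, which is how the 6373 dimensions arise.

```python
from statistics import mean, stdev
from typing import Dict, List

def frame_llds(samples: List[float], frame_len: int = 160) -> List[Dict[str, float]]:
    """Toy frame-level LLDs: mean energy and zero-crossing rate per frame."""
    llds = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        llds.append({"energy": energy, "zcr": zcr})
    return llds

def hsfs(llds: List[Dict[str, float]]) -> Dict[str, float]:
    """Utterance-level HSFs: statistics over each LLD contour."""
    out = {}
    for name in llds[0]:
        contour = [f[name] for f in llds]
        out[f"{name}_mean"] = mean(contour)
        out[f"{name}_max"] = max(contour)
        out[f"{name}_std"] = stdev(contour) if len(contour) > 1 else 0.0
    return out

# An alternating signal: constant energy, very high zero-crossing rate.
feats = hsfs(frame_llds([0.1, -0.1] * 320))
```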
As shown in fig. 2, in this embodiment, the intonation detection model includes a plurality of classifiers in one-to-one correspondence with a plurality of specific negative intonations, each classifier being configured to output the probability that its corresponding negative intonation is present in the speech paragraph. For example, in this embodiment three specific negative intonations are set: impatient (irritable) tone, sunken intonation, and monotone speech with no fluctuation. Correspondingly, three classifiers are set to detect whether a speech paragraph contains impatience, sunken intonation, or monotone speech.
As shown in fig. 3, in this embodiment, the intonation detection model includes a feature extraction layer, a target task classification layer, and a speaker recognition layer. The feature extraction layer may include a plurality of convolution layers (conv), a plurality of batch normalization layers (BatchNormalization), and a dense output layer. The feature extraction layer extracts features from the emotional features of the speech paragraph and feeds them to the target task classification layer and the speaker recognition layer respectively. The target task classification layer outputs the probability that the speech paragraph includes a specific negative intonation; when multiple specific negative intonations are to be detected, it comprises multiple corresponding classifiers. The speaker recognition layer outputs the probability that the speech paragraph corresponds to each speaker, i.e., it performs speaker detection. In this embodiment, the target task classification layer and the speaker recognition layer each comprise a dense output layer and a softmax classifier, yielding the intonation classification result and the speaker classification result respectively.
The invention aims to identify the tone of voice of customer service agents and detect whether impatience, sunken intonation, or monotone speech occurs during service, which is a task in the field of speech emotion recognition. Existing speech emotion corpora are very diverse and differ considerably. Because different emotions manifest differently in the speech signal, the feature types and emotion recognition methods used in existing research are also very varied. In this embodiment, the ComParE feature set, defined for the INTERSPEECH 2013 Computational Paralinguistics Challenge, is used to extract the emotional features of the speech paragraphs. These emotional features include speaker-related information, and traditionally each speaker's neutral speech is used to normalize the emotional speech and eliminate the influence of speaker information.
Emotional features are generally based on prosodic features, mainly pitch, energy, and duration-related prosody. The fundamental tone is produced by vibration of the vocal cords; the fundamental frequency of male voices lies roughly between 100 and 200 Hz, and that of female voices between 200 and 350 Hz. Emotional features therefore contain some speaker-related information. To eliminate the influence of speaker information, one approach is to normalize the emotional features with the speaker's neutral emotional features, for example normalizing F0 using each speaker's neutral-speech F0 mean and the overall neutral-speech F0 mean. While this eliminates the speaker's influence to some extent, it also requires neutral emotional speech from the person being predicted at prediction time.
Based on the above, in order to eliminate the influence of speaker information, the invention trains the intonation detection model with adversarial learning. As shown in fig. 3, the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer (GRL), which passes values through unchanged during forward propagation and reverses the gradient during backward propagation.
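The gradient reversal layer contract is simple: identity in the forward pass, gradient multiplied by a negative factor in the backward pass. Below is a framework-free sketch of that contract only; real implementations hook into an autograd engine (e.g. a custom autograd Function in PyTorch), and the lambda scaling factor is a common but assumed generalization.

```python
class GradientReversal:
    """Identity forward; flips (and optionally scales) the gradient backward.

    Placed between the feature extractor and the speaker-recognition head, the
    flipped gradient pushes the extractor to *remove* speaker information while
    the head still tries to recover it, producing the adversarial effect.
    """
    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: float) -> float:
        return x  # values pass through unchanged

    def backward(self, upstream_grad: float) -> float:
        return -self.lam * upstream_grad  # sign flip during back-propagation

grl = GradientReversal(lam=1.0)
```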
As shown in fig. 2, in this embodiment, in step S400 the evaluation result of the customer service voice is obtained from the detection results of the intonation detection model on the speech paragraphs: whether the customer service voice is a problem voice is judged from the per-paragraph detection results together with a whole-call judgment rule.
For example, for the negative intonation of impatience, even a single occurrence leaves the guest with a bad impression, so if the detection result of at least one speech paragraph includes this specific negative intonation, the customer service voice is judged to be a problem voice.
Other negative intonations can be judged from their proportion over the whole call. For example, for negative intonations such as monotone speech or sunken intonation, step S400 comprises the following steps:
for a specific negative intonation, calculating the proportion of speech paragraphs that include it within the customer service voice;
and if the calculated proportion exceeds a preset threshold, judging the customer service voice to be a problem voice.
In this embodiment, calculating the proportion of speech paragraphs that include the specific negative intonation within the customer service voice comprises:
calculating the ratio of the duration of the speech paragraphs that include the specific negative intonation to the overall duration of the customer service voice, i.e., the duration ratio; or
calculating the ratio of the number of sentences in the speech paragraphs that include the specific negative intonation to the total number of sentences of the customer service voice, i.e., the sentence-count ratio.
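The two aggregation rules described above (any-occurrence for impatience; a proportion threshold for monotone or sunken intonation) could be combined as follows. The function shape and the threshold value are illustrative assumptions, and passing unit durations turns the duration ratio into the sentence-count ratio.

```python
from typing import List

def is_problem_voice(durations: List[float], flags: List[bool],
                     any_occurrence: bool = False,
                     ratio_threshold: float = 0.3) -> bool:
    """durations[i]: length of paragraph i; flags[i]: negative intonation detected there.

    any_occurrence=True implements the 'at least one paragraph' rule (e.g. impatience);
    otherwise the duration-ratio rule is applied (e.g. monotone speech).
    """
    if any_occurrence:
        return any(flags)
    flagged = sum(d for d, f in zip(durations, flags) if f)
    total = sum(durations)
    return total > 0 and flagged / total > ratio_threshold

# Sentence-count ratio is the same rule with unit durations: 4 of 10 sentences flagged.
verdict = is_problem_voice([1.0] * 10, [True] * 4 + [False] * 6)
```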
In summary, the invention targets the tone-of-voice problems emphasized in manual quality inspection, namely impatient tone, monotone speech, and sunken intonation, performing data labeling and training corresponding models for them. Furthermore, in the adversarial learning method adopted by the invention, the speaker classification module is connected through a gradient reversal layer, so that the gradient of the speaker recognition module is multiplied by -1 during back-propagation. This design has an adversarial effect, making the information extracted by the feature extraction layer less speaker-related. The method does not require collecting neutral speech of the target speaker to normalize emotional speech at prediction time, and is therefore convenient to use. It can replace manual quality inspection in detecting tone-of-voice problems of customer service agents, and has strong practicality.
As shown in fig. 4, an embodiment of the present invention further provides a customer service voice evaluation system, which is configured to implement the customer service voice evaluation method, and the system includes:
the voice acquisition module M100 is used for acquiring customer service voice to be evaluated;
a voice segmentation module M200, configured to segment the customer service voice to be evaluated into a plurality of voice paragraphs;
a intonation detection module M300, configured to input each speech paragraph into a trained intonation detection model, respectively, and detect whether a specific negative intonation exists in the speech paragraph;
and the voice evaluation module M400 is configured to obtain an evaluation result of the customer service voice according to the detection result of the intonation detection model for each voice paragraph.
With the above customer service voice evaluation system, the invention provides an intonation detection model for detecting the tone of voice in customer service speech. After the customer service voice is collected by the voice collection module M100, it is segmented by the voice segmentation module M200; the intonation detection module M300 automatically identifies whether negative intonations exist in the speech paragraphs, and the voice evaluation module M400 then automatically evaluates the customer service voice from the detection results. No manual evaluation is needed, evaluation accuracy is higher, quality-inspection efficiency can be greatly improved, the quality of customer service is safeguarded, and agents are helped to find problems in their service.
In this embodiment, the speech segmentation module M200 segments the customer service speech to be evaluated into a plurality of speech paragraphs as follows: the customer service speech is first recognized with automatic speech recognition (ASR) to obtain the speech text corresponding to it; the ASR technology adopted by the invention may be any existing speech recognition technology for recognizing the content of the audio data of the customer service speech. The customer service speech is then segmented based on the speech text. In particular, each speech paragraph may correspond to a short sentence.
Since a small segment of audio may consist mostly of silence or noise, endpoint detection is performed on all of the cut audio with voice activity detection (VAD) to filter out noise and remove speech segments that contain no speech text.
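A simple energy-based endpoint check in the spirit of the VAD step above can be sketched as follows. This is only an illustration: the patent does not specify which VAD algorithm is used, and production systems typically use a trained VAD rather than this crude energy heuristic; all parameter values here are assumptions:

```python
import numpy as np

def is_voiced(segment, frame_len=400, energy_ratio=2.0, min_voiced_frames=5):
    """Return True if the waveform segment likely contains speech.

    Splits the signal into fixed-length frames, computes per-frame energy,
    estimates a noise floor from the quietest frames, and counts frames
    whose energy clearly exceeds that floor."""
    n_frames = len(segment) // frame_len
    if n_frames == 0:
        return False
    frames = np.asarray(segment[: n_frames * frame_len]).reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    # Crude noise-floor estimate: the quietest 10% of frames, plus a small offset.
    noise_floor = np.percentile(energy, 10) + 1e-4
    voiced = int(np.sum(energy > energy_ratio * noise_floor))
    return voiced >= min_voiced_frames
```

Segments for which `is_voiced` returns False would be dropped before intonation detection, matching the filtering described above.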
In this embodiment, the customer service speech evaluation system further includes an emotion feature extraction module configured to extract emotional features of the speech paragraphs; the intonation detection module M300 then inputs the emotional features of each speech paragraph into the trained intonation detection model and detects whether a specific negative intonation exists in the paragraph.
Further, the emotion feature extraction module extracts the emotional features of the speech paragraphs as follows: first, the low-level descriptors (LLDs) of the paragraph's audio data are extracted. LLDs are hand-designed low-level features, generally computed on a single frame of speech and used to characterize that frame. Then, based on the LLDs, HSFs (high-level statistics functions) are extracted; HSFs are features obtained by computing statistics over the LLDs, such as the mean and the maximum. In this embodiment, 6373-dimensional HSFs are extracted from the LLDs as the speech emotion features.
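The LLDs-to-HSFs pipeline described above can be illustrated with a minimal hand-rolled sketch. Only two example LLDs (log-energy and zero-crossing rate) and four statistics are computed here; the 6373-dimensional figure in the text corresponds to a much larger descriptor inventory, such as openSMILE's ComParE feature set:

```python
import numpy as np

def extract_llds(signal, frame_len=400, hop=160):
    """Frame-level low-level descriptors (LLDs): log-energy and
    zero-crossing rate, one row per frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        log_energy = np.log((frame ** 2).mean() + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        frames.append((log_energy, zcr))
    return np.array(frames)  # shape: (n_frames, n_llds)

def extract_hsfs(llds):
    """High-level statistics functions (HSFs): statistics of each LLD
    computed over the whole speech paragraph."""
    return np.concatenate([llds.mean(0), llds.max(0), llds.min(0), llds.std(0)])
```

The HSF vector is fixed-length regardless of paragraph duration, which is what allows a single classifier to consume paragraphs of varying length.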
In one implementation of this embodiment, the speech evaluation module M400 obtains the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech paragraph as follows: for a specific negative intonation, if the detection result of at least one of the speech paragraphs includes that specific negative intonation, the customer service speech is judged to be problem speech.
In another implementation of this embodiment, the speech evaluation module M400 obtains the evaluation result as follows: for a specific negative intonation, the proportion of the speech paragraphs that include the specific negative intonation within the customer service speech is calculated; if the calculated proportion is larger than a preset threshold, the customer service speech is judged to be problem speech.
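Both aggregation strategies can be sketched as follows. The function names are hypothetical; `detections` is assumed to be a per-paragraph list of booleans for one specific negative intonation, and `durations` the corresponding paragraph lengths in seconds:

```python
def any_hit_rule(detections):
    """Strategy 1: flag the call if any paragraph shows the negative intonation."""
    return any(detections)

def ratio_rule(detections, durations, threshold=0.2):
    """Strategy 2: flag the call if the flagged paragraphs' share of the
    total speech duration exceeds `threshold`."""
    total = sum(durations)
    if total == 0:
        return False
    flagged = sum(d for d, hit in zip(durations, detections) if hit)
    return flagged / total > threshold
```

A duration-weighted ratio is shown here; claim 10 also allows a sentence-count ratio, which is the same computation with every duration set to 1.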
The embodiment of the invention also provides customer service voice evaluation equipment, which comprises a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the customer service voice assessment method via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 600 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the customer service voice assessment method section above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In the customer service voice evaluation device, the program in the memory is executed by the processor to realize the steps of the customer service voice evaluation method, so the computer storage medium can also obtain the technical effect of the customer service voice evaluation method.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program realizes the steps of the customer service voice evaluation method when being executed by a processor. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the section of the customer service voice assessment method above of this specification, when the program product is executed on the terminal device.
Referring to fig. 6, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The program in the computer storage medium realizes the steps of the customer service voice evaluation method when being executed by the processor, so the computer storage medium can also obtain the technical effect of the customer service voice evaluation method.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (13)

1. A customer service voice evaluation method is characterized by comprising the following steps:
collecting customer service voice to be evaluated;
dividing the customer service voice to be evaluated into a plurality of voice paragraphs;
inputting each voice paragraph into a trained intonation detection model respectively, and detecting whether a specific negative intonation exists in the voice paragraph;
and obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
2. The customer service voice evaluation method according to claim 1, wherein the step of dividing the customer service voice to be evaluated into a plurality of voice paragraphs comprises the steps of:
recognizing the customer service voice based on an automatic voice recognition technology to obtain a voice text corresponding to the customer service voice;
and segmenting the customer service voice based on the voice text, and removing the voice paragraphs which do not contain the voice text.
3. The method according to claim 1, wherein said inputting each speech segment into a trained intonation detection model to detect whether there is a particular negative intonation in said speech segment comprises the steps of:
extracting emotional features of the voice paragraphs;
and inputting the emotional characteristics of the voice paragraphs into a trained intonation detection model to obtain a negative intonation detection result output by the intonation detection model.
4. The customer service speech assessment method according to claim 3, wherein extracting the emotional features of the speech passage comprises the following steps:
extracting LLDs (low-level descriptor) features of the audio data of the speech passage;
based on the LLDs features, HSFs features are extracted.
5. The method according to claim 3, wherein the intonation detection model comprises a plurality of classifiers, the plurality of classifiers respectively corresponding one-to-one to a plurality of specific negative intonations, and each classifier is configured to output a probability value that the speech passage includes the corresponding specific negative intonation.
6. The customer service speech assessment method according to claim 3, wherein the intonation detection model comprises a feature extraction layer, a target task classification layer and a speaker recognition layer;
the feature extraction layer is used for extracting features of emotional features of the voice paragraphs and then inputting the emotional features into the target task classification layer and the speaker recognition layer respectively, the target task classification layer outputs probability values that the voice paragraphs comprise specific negative tones, and the speaker recognition layer is used for outputting probability values that the voice paragraphs correspond to speakers.
7. The method as claimed in claim 6, wherein the intonation detection model is trained by adversarial learning, the feature extraction layer is connected to the speaker recognition layer through a gradient reversal layer, and the gradient reversal layer passes values through unchanged during forward propagation and reverses the sign of the gradient during backward propagation.
8. The method according to claim 1, wherein obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech passage comprises the following steps:
and for a specific negative tone, if the detection result of at least one voice paragraph in the voice paragraphs includes the specific negative tone, the customer service voice is determined as the problem voice.
9. The method according to claim 1, wherein obtaining the evaluation result of the customer service speech according to the detection result of the intonation detection model for each speech passage comprises the following steps:
for a specific negative tone, calculating the proportion of the voice segment comprising the specific negative tone in the customer service voice;
and if the calculated occupation ratio is larger than a preset occupation ratio threshold value, the customer service voice is determined as the problem voice.
10. The method of claim 9, wherein said calculating a percentage of speech segments in the customer service speech that include the particular negative intonation comprises:
calculating the ratio of the voice duration of the voice paragraph comprising the specific negative tone in the overall duration of the customer service voice; or
The ratio of the number of sentences including the speech passage of the particular negative intonation to the total number of sentences of the customer service speech is calculated.
11. A customer service voice evaluation system for implementing the customer service voice evaluation method according to any one of claims 1 to 10, the system comprising:
the voice acquisition module is used for acquiring customer service voice to be evaluated;
the voice segmentation module is used for segmenting the customer service voice to be evaluated into a plurality of voice paragraphs;
the intonation detection module is used for respectively inputting each voice paragraph into the trained intonation detection model and detecting whether a specific negative intonation exists in the voice paragraph;
and the voice evaluation module is used for obtaining the evaluation result of the customer service voice according to the detection result of the intonation detection model on each voice paragraph.
12. A customer service voice evaluation apparatus, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the customer service voice assessment method of any of claims 1 to 10 via execution of the executable instructions.
13. A computer-readable storage medium storing a program which, when executed by a processor, performs the steps of the customer service voice assessment method of any of claims 1 to 10.
CN202110116652.1A 2021-01-28 2021-01-28 Customer service voice evaluation method, system, device and storage medium Pending CN112885379A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110116652.1A CN112885379A (en) 2021-01-28 2021-01-28 Customer service voice evaluation method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN112885379A true CN112885379A (en) 2021-06-01

Family

ID=76053595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110116652.1A Pending CN112885379A (en) 2021-01-28 2021-01-28 Customer service voice evaluation method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN112885379A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113593529A (en) * 2021-07-09 2021-11-02 北京字跳网络技术有限公司 Evaluation method and device for speaker separation algorithm, electronic equipment and storage medium
WO2023100998A1 (en) * 2021-12-03 2023-06-08 パナソニックIpマネジメント株式会社 Voice registration device and voice registration method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452405A (en) * 2017-08-16 2017-12-08 北京易真学思教育科技有限公司 A kind of method and device that data evaluation is carried out according to voice content
CN109753566A (en) * 2019-01-09 2019-05-14 大连民族大学 The model training method of cross-cutting sentiment analysis based on convolutional neural networks


Similar Documents

Publication Publication Date Title
US10878823B2 (en) Voiceprint recognition method, device, terminal apparatus and storage medium
CN109859772B (en) Emotion recognition method, emotion recognition device and computer-readable storage medium
CN109686383B (en) Voice analysis method, device and storage medium
CN110648691B (en) Emotion recognition method, device and system based on energy value of voice
CN112217947B (en) Method, system, equipment and storage medium for transcribing text by customer service telephone voice
CN101930735A (en) Speech emotion recognition equipment and speech emotion recognition method
EP4078579A1 (en) Emotion detection in audio interactions
CN113205814B (en) Voice data labeling method and device, electronic equipment and storage medium
CN113420556B (en) Emotion recognition method, device, equipment and storage medium based on multi-mode signals
Swain et al. Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition
CN111370030A (en) Voice emotion detection method and device, storage medium and electronic equipment
CN113807103B (en) Recruitment method, device, equipment and storage medium based on artificial intelligence
CN112885379A (en) Customer service voice evaluation method, system, device and storage medium
CN110782902A (en) Audio data determination method, apparatus, device and medium
CN112489623A (en) Language identification model training method, language identification method and related equipment
CN114360557A (en) Voice tone conversion method, model training method, device, equipment and medium
CN106157974A (en) Text recites quality assessment device and method
CN112911072A (en) Call center volume identification method and device, electronic equipment and storage medium
CN114373452A (en) Voice abnormity identification and evaluation method and system based on deep learning
CN110797032A (en) Voiceprint database establishing method and voiceprint identification method
CN114913859B (en) Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium
Koolagudi et al. Dravidian language classification from speech signal using spectral and prosodic features
Dave et al. Speech recognition: A review
CN114627896A (en) Voice evaluation method, device, equipment and storage medium
CN110782916B (en) Multi-mode complaint identification method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601