CN109767791B - Voice emotion recognition and application system for call center calls - Google Patents


Info

Publication number
CN109767791B
Authority
CN
China
Prior art keywords
emotion, module, voice, data, submodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910217722.5A
Other languages
Chinese (zh)
Other versions
CN109767791A (en)
Inventor
林僚
梁冬明
张超婧
韦建福
蒋莉芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Asean Information Harbor Co ltd
Original Assignee
China Asean Information Harbor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Asean Information Harbor Co ltd
Priority to CN201910217722.5A
Publication of CN109767791A
Application granted
Publication of CN109767791B
Legal status: Active
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A speech emotion recognition and application system for call center calls comprises: a speech processing module for extracting and preprocessing speech information; a voice keyword detection module for identifying emotion keywords and topic keywords in the speech data of the voice feature analysis submodule and obtaining the customer's emotion information and information on the problems raised; an emotion model set module for dynamically capturing and tracking the emotional state of the caller; an emotion classification module for judging the emotion category of the speech in the call under analysis; and a business application module for providing response assistance to customer service staff and management assistance to managers. The system enables customer service staff to accurately understand a customer's emotion, provides effective response schemes, and supports accurate assessment of customer service staff.

Description

Voice emotion recognition and application system for call center calls
Technical Field
The invention relates to the technical field of audio data processing, and in particular to a speech emotion recognition and application system for call center calls.
Background
In modern enterprises the call center plays an important role in maintaining customer relationships and in marketing, so monitoring the quality of call center voice service is of great significance. Emotion analysis of call audio can identify the emotional states of both agent and customer in a service call, allowing quality of service to be tracked and monitored effectively. Most existing emotion recognition schemes for call center calls first convert speech to text with a speech recognition engine and then perform sentiment analysis on the text. On the one hand, this approach depends on the accuracy and robustness of the speech recognition model, and the conversion to text introduces a certain error rate; on the other hand, once speech is converted to text, the emotional information carried by the audio itself is lost, even though changes in intensity, pitch, speaking rate and the like effectively reflect a speaker's emotion. Both factors reduce the accuracy of speech emotion recognition.
Schemes that classify emotion directly from audio also exist for call center calls, but they generally rely on a single classification model. Given the diversity of callers, scenes and call content, a single model rarely achieves stable recognition performance; moreover, such schemes only recognize emotion and cannot promptly surface the key information a customer conveys in a problematic call. In addition, the service quality of call center agents is usually scored manually by customers, and such scoring is often inaccurate: it depends entirely on the customer's subjectivity or on the agent's prompting, and customers sometimes skip scoring to save time. This neither benefits the enterprise nor yields real insight into customers. Because agents are not technical specialists, they sometimes cannot handle well the product defects customers report; and faced with customers in a wide range of emotional states, agents, especially new ones, struggle to find an effective way to respond.
Disclosure of Invention
To solve these problems, the invention provides a speech emotion recognition and application system for call center calls that enables customer service staff to accurately understand a customer's emotion, provides effective response schemes, and supports accurate assessment of customer service staff.
To achieve this, the invention adopts the following technical scheme:
a speech emotion recognition and application system for call center calls comprises a speech processing module, a voice keyword detection module, an emotion model set module, an emotion classification module, a business application module and a database module;
the speech processing module comprises a voice extraction submodule and a voice feature analysis submodule; the voice extraction submodule is used for acquiring the speech of the call to be analyzed; the voice feature analysis submodule is used for receiving the speech data of the voice extraction submodule and processing it by pre-emphasis, windowing, framing and endpoint detection, so as to obtain its prosodic, voice-quality and spectral features;
the voice keyword detection module is used for receiving the speech data of the voice feature analysis submodule and identifying emotion keywords and topic keywords against an established keyword library, so as to obtain the customer's emotion information and information on the problems the customer raises in the speech acquired by the voice extraction submodule;
the emotion model set module is used for storing a set of different emotion classification models and receiving the data of the voice feature analysis submodule, so as to dynamically capture and track the emotional state of the caller;
the emotion classification module is used for acquiring the data of the voice keyword detection module and the emotion model set module and judging the emotion category of the speech in the call under analysis by means of a classification model;
the business application module comprises a customer information display submodule, an emotion display submodule, a response prompt submodule, an assessment data submodule and an enterprise problem analysis submodule; the customer information display submodule is communicatively connected with a product sales record platform and is used for displaying, according to the customer's telephone number, the customer's purchase information on that platform; the emotion display submodule is used for receiving the data of the voice keyword detection module and the emotion classification module and displaying the customer's emotional tendency in the current call in real time; the response prompt submodule comprises a response scheme database and a response prompt box, the response scheme database storing product-related information, handling schemes for different emotion types, response phrasing and problem-handling workflows, and the response prompt box automatically generating and displaying a response scheme through a machine learning algorithm that combines the data of the voice keyword detection module, the emotion classification module and the response scheme database; the assessment data submodule is used for assessing the quality of customer service according to the data of the emotion classification module; the enterprise problem analysis submodule is used for analyzing the state of the products according to the data of the voice keyword detection module;
and the database module is used for storing and sending the data of the voice keyword detection module, the emotion model set module, the emotion classification module and the business application module.
Further, the voice feature analysis submodule obtains, in the time or frequency domain, the short-time energy, average amplitude, short-time average zero-crossing rate, pitch frequency, formants, Mel spectrum features, linear prediction cepstral coefficients, Mel-frequency cepstral coefficients and spectrogram from the prosodic, voice-quality and spectral features, and computes for each of these features the following statistics: maximum, minimum, range, frame of the maximum, frame of the minimum, mean, linear-approximation slope, linear-approximation offset, linear-approximation quadratic error, standard deviation, skewness, kurtosis, first-order difference and second-order difference.
Furthermore, the voice keyword detection module comprises an emotion weight database and a keyword extraction submodule; the emotion weight database is used for building and storing emotion weights for keywords; the keyword extraction submodule is used for matching the data of the voice feature analysis submodule against the emotion weight database through an acoustic model, a language model, a pronunciation dictionary and a decoder so as to identify emotion keywords and topic keywords and count how often they appear in the speech; the keyword extraction submodule can also assign a weight to the emotional tendency of each emotion keyword according to the emotion weight database, and score each emotion of the speech by combining that weight with the keyword's frequency. The emotion model set module scores the emotional tendency of the speech features from the speech processing module under each model by training and combining a hidden Markov model, a Gaussian mixture model, a support vector machine model, an artificial neural network, a convolutional neural network model and a long short-term memory network model.
Further, the emotion classification module judges the emotion category of the speech obtained by the voice feature analysis submodule by a voting method, a scoring method or a combination method, according to the different model outputs provided by the voice keyword detection module and the emotion model set module;
the voting method obtains the emotion classification result of each model in the keyword extraction submodule and the emotion model set module, counts how many models judge the current speech to be each emotion, and takes the emotion category with the most votes as the recognition result;
the scoring method obtains the score values from the keyword extraction submodule and the emotion model set module, assembles them into a new feature vector, feeds that vector into a trained decision tree, SVM or neural network classification model, and outputs the emotion recognition result;
the combination method concatenates the score values from the keyword extraction submodule and the emotion model set module with the speech feature data of the voice feature analysis submodule into a new speech feature, which is trained and classified by a decision tree, SVM or neural network classification model to obtain the emotion recognition result.
Furthermore, the database module establishes a connection channel between the data storage side and the Web side using WebSocket, providing an instant data service for data transmission among the voice keyword detection module, the emotion classification module and the business application module.
Furthermore, the assessment data submodule analyzes and computes, from the data of the emotion classification module, the emotions of all calls handled by each customer service agent over a given period, and can automatically generate statistical tables and charts from these figures.
Further, the enterprise problem analysis submodule obtains the topic keywords from the voice keyword detection module and, by analyzing them, collects and counts the key problems raised by customers.
Furthermore, the business application module obtains the corresponding data through a search and computation engine, which is backed for data access by a multi-source heterogeneous data access component and a metadata management and access module.
The invention has the beneficial effects that:
1. Compared with continuous speech recognition, the voice keyword detection module extracts keywords from the data of the speech processing module without recognizing the whole speech stream: only a list of keywords of interest needs to be constructed. This gives better flexibility, places lower demands on grammar and tolerance of environmental noise, and is therefore better suited to complex conversation scenes. The module recognizes both emotion keywords and topic keywords, so that subsequent stages can analyze the customer's emotional tendency and the problems to be solved. The emotion model set module dynamically captures and tracks the customer's emotional state through several trained models. The emotion classification module integrates the voice keyword detection module and the emotion model set module, training several machine learning and deep learning models so that each contributes its strengths; this improves the accuracy and robustness of recognition and ensures accurate emotion recognition in speech. The business application module acquires the data of the voice keyword detection module, the emotion model set module and the emotion classification module, and can connect to a product sales record platform, associating the products sold with the customer's telephone number; customer service staff thus know which products the customer has purchased, grasp the customer's situation and communicate more easily. The system also provides real-time response schemes, customer service performance assessment and enterprise problem analysis, showing agents the customer's emotion in real time and giving timely, accurate and standardized response prompts, so that agents can handle customers' problems correctly.
2. The voice feature analysis submodule obtains, in the time or frequency domain, the short-time energy, average amplitude, short-time average zero-crossing rate, pitch frequency, formants, Mel spectrum features, linear prediction cepstral coefficients, Mel-frequency cepstral coefficients and spectrogram from the prosodic, voice-quality and spectral features, and computes the related statistics. Rich and comprehensive speech features are thereby extracted, avoiding the limits that features of a single type or dimension place on the expression of emotional information, and providing the necessary basis for emotion recognition.
3. The keyword extraction submodule 22 supports continuous training, so new keywords can be added and learned. Because the rapid growth of the Internet means customers frequently use newly coined terms, this gives the system good extensibility.
4. The assessment data submodule analyzes the emotions of all of each agent's calls over a given period and assesses agents by the proportion of each emotion type, avoiding the errors of traditional assessment and effectively improving service quality. The enterprise problem analysis submodule counts, from the voice keyword detection module, how often problems arise with each product, so that managers understand how the enterprise's products are doing, discover problems in time, and optimize the products in a targeted way.
Drawings
Fig. 1 is a block diagram of a speech emotion recognition and application system for a call center call according to a preferred embodiment of the present invention.
In the figure, 1-a voice processing module, 11-a voice extraction sub-module, 12-a voice feature analysis sub-module, 2-a voice keyword detection module, 21-an emotion weight database, 22-a keyword extraction sub-module, 3-an emotion model set module, 4-an emotion classification module, 5-a business application module, 51-a customer information display sub-module, 52-an emotion display sub-module, 53-a response prompting sub-module, 54-an assessment data sub-module and 55-an enterprise problem analysis sub-module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, a voice emotion recognition and application system for call center calls in a preferred embodiment of the present invention includes a voice processing module 1, a voice keyword detection module 2, an emotion model set module 3, an emotion classification module 4, a business application module 5, and a database module 6.
The voice processing module 1 comprises a voice extraction submodule 11 and a voice feature analysis submodule 12. The voice extraction submodule 11 acquires the speech of the call to be analyzed. The voice feature analysis submodule 12 receives the speech data of the voice extraction submodule 11 and processes it through pre-emphasis, windowing, framing and endpoint detection, so as to obtain its prosodic, voice-quality and spectral features.
In this embodiment, the voice feature analysis submodule 12 obtains, in the time or frequency domain, the short-time energy, average amplitude, short-time average zero-crossing rate, pitch frequency, formants, Mel spectrum features, linear prediction cepstral coefficients and Mel-frequency cepstral coefficients from those features, and computes for each the statistics: maximum, minimum, range, frame of the maximum, frame of the minimum, mean, linear-approximation slope, linear-approximation offset, linear-approximation quadratic error, standard deviation, skewness, kurtosis, first-order difference and second-order difference. By extracting these statistics of the call's speech features, rich and comprehensive features are obtained, avoiding the limits that features of a single type or dimension place on the expression of emotional information and supporting the subsequent recognition work.
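As a concrete illustration of this preprocessing stage, the sketch below applies pre-emphasis, Hamming-windowed framing, and per-frame short-time energy and zero-crossing rate with a few of the summary statistics named above. It is a minimal sketch, not the patented implementation; the frame length, hop size and window choice are assumptions.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames
    (frame_len=400, hop=160 correspond to 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)

def short_time_stats(frames):
    """Per-frame short-time energy and zero-crossing rate,
    summarized by a few of the statistics listed in the text."""
    energy = (frames ** 2).sum(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    stats = {}
    for name, feat in (("energy", energy), ("zcr", zcr)):
        stats[name] = {
            "max": feat.max(), "min": feat.min(),
            "range": feat.max() - feat.min(),
            "argmax": int(feat.argmax()),
            "mean": feat.mean(), "std": feat.std(),
        }
    return stats
```

The same pattern extends to the other features (pitch, formants, MFCCs): compute a per-frame trajectory, then reduce it to the fixed-length statistics the classifier consumes.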
The voice keyword detection module 2 receives the speech data of the voice feature analysis submodule 12 and identifies emotion keywords and topic keywords against an established keyword library, so as to obtain the customer's emotion information and information on the problems raised in the speech acquired by the voice extraction submodule 11.
In this embodiment, the voice keyword detection module 2 includes an emotion weight database 21 and a keyword extraction sub-module 22.
The emotion weight database 21 is used for establishing and storing an emotion weight database of the keyword.
The keyword extraction submodule 22 matches the data of the voice feature analysis submodule 12 against the emotion weight database 21 through an acoustic model, a language model, a pronunciation dictionary and a decoder to identify emotion keywords and topic keywords, and counts how often each appears in the speech. The submodule also assigns a weight to the emotional tendency of each emotion keyword according to the emotion weight database 21, and scores each emotion of the speech by combining that weight with the keyword's frequency.
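The frequency-times-weight scoring just described can be sketched as follows. The keyword entries, emotion labels and weight values are illustrative assumptions, not values from the patent:

```python
from collections import Counter

# Hypothetical emotion-weight entries: keyword -> (emotion label, weight).
EMOTION_WEIGHTS = {
    "angry":  ("anger", 0.9),
    "refund": ("anger", 0.5),
    "thanks": ("satisfaction", 0.8),
    "great":  ("satisfaction", 0.7),
}

def score_emotions(detected_keywords):
    """Score each emotion as sum(frequency * weight) over detected keywords."""
    freq = Counter(detected_keywords)
    scores = {}
    for word, count in freq.items():
        if word in EMOTION_WEIGHTS:
            emotion, weight = EMOTION_WEIGHTS[word]
            scores[emotion] = scores.get(emotion, 0.0) + count * weight
    return scores
```

For a call in which "angry" is detected twice and "refund" and "thanks" once each, the anger score is 2 × 0.9 + 0.5 and the satisfaction score 0.8.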
The keyword extraction submodule 22 detects whether specified keywords appear in the speech. Compared with continuous speech recognition, keyword detection does not need to recognize the whole speech stream; only a list of keywords of interest needs to be constructed. It is therefore more flexible, places lower demands on grammar and tolerance of environmental noise, and is better suited to complex conversation scenes.
When the acoustic model is trained, the observation sentences in the corpus are correctly labeled and the model parameters are iteratively optimized, so that the correct label and its corresponding pronunciation yield the maximum posterior probability.
The language model recognizes the output of the acoustic model as text. To cope with keyword detection over a large vocabulary, a statistical language model is used: the relationships between words are described probabilistically from corpus statistics, and a large word corpus is used as the training set to improve the model's accuracy.
The pronunciation dictionary connects the acoustic model and the language model and contains the mapping from words to phonemes. When the dictionary is built, it covers the emotion keywords and topic keywords of interest as fully as possible while discarding unnecessary words, so as to improve retrieval efficiency and recognition performance.
Obtaining the keyword results in the keyword extraction submodule 22 also requires a decoder. The decoder uses the Viterbi algorithm: HMM state decoding is performed first to obtain the optimal state sequence, and keyword detection decoding is then performed on it to obtain the final recognition result.
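The HMM state-decoding step can be illustrated with a minimal log-domain Viterbi over a toy model. This is a textbook sketch, not the patent's decoder (which additionally performs keyword detection over the decoded states):

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely HMM state sequence for a discrete observation sequence.
    start_p: (S,), trans_p: (S, S), emit_p: (S, V); computed in log domain."""
    n_states, T = len(start_p), len(obs)
    logv = np.full((T, n_states), -np.inf)   # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)  # backpointers
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            cand = logv[t - 1] + np.log(trans_p[:, s])
            back[t, s] = int(np.argmax(cand))
            logv[t, s] = cand[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(logv[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

A production decoder works over a weighted graph built from the acoustic model, lexicon and language model, but the dynamic program is the same.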
In this embodiment, the keyword extraction sub-module 22 has a function of continuous training learning, and can perform supplementary training learning on new keywords, so that the system has good expansibility.
The emotion model set module 3 stores a set of different emotion classification models and receives the data of the voice processing module 1, so as to dynamically capture and track the emotional state of the caller. In this embodiment, the emotion model set module 3 scores the emotional tendency of the speech features from the voice processing module 1 under each model by training and combining a hidden Markov model, a Gaussian mixture model, a support vector machine model, an artificial neural network, a convolutional neural network model and a long short-term memory network model.
Because each model has its own advantages and limitations, training several different models lets each play to its strengths and improves the overall accuracy and robustness of the system. Statistical speech emotion features adapt well to unseen speakers, so training a Gaussian mixture model or a support vector machine model and classifying with these features also improves robustness.
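As a minimal illustration of classifying speaker-independent statistical features with a generative model, the sketch below fits one diagonal Gaussian per emotion class and predicts by maximum log-likelihood. It is a one-component stand-in for the Gaussian mixture model named above; a real GMM would fit several components per class:

```python
import numpy as np

class DiagonalGaussianClassifier:
    """One diagonal Gaussian per emotion class (a one-component GMM)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.params_ = {}
        for c in self.classes_:
            Xc = np.asarray([x for x, lbl in zip(X, y) if lbl == c], dtype=float)
            # Mean and variance per feature dimension; small floor avoids /0.
            self.params_[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
        return self

    def log_likelihood(self, x, c):
        mu, var = self.params_[c]
        return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        return max(self.classes_, key=lambda c: self.log_likelihood(x, c))
```

The per-class log-likelihoods double as the "score under each model" that the fusion stage described later consumes.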
The voice feature analysis submodule 12 extracts a variety of features of the call's speech, so features can be computed and selected according to the model requirements of the emotion model set module 3. When the convolutional neural network model is trained, features are extracted from the spectrogram, which carries rich emotional information, and the result is fed into the long short-term memory network model, so that the caller's emotional state can be dynamically captured and tracked.
The emotion classification module 4 judges the emotion category of the speech in the call under analysis by means of a classification model, according to the data of the voice keyword detection module 2 and the emotion model set module 3; in this embodiment the classification model may be any of a decision tree, an SVM, a neural network and the like.
In this embodiment, the emotion classification module 4 judges the emotion category of the speech obtained by the voice processing module 1 by a voting method, a scoring method or a combination method, according to the different model outputs provided by the voice keyword detection module 2 and the emotion model set module 3;
the voting method obtains the emotion classification result of each model in the keyword extraction submodule 22 and the emotion model set module 3, counts how many models judge the current speech to be each emotion, and takes the emotion category with the most votes as the recognition result;
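The voting rule itself is a one-liner over the ensemble's per-model predictions; a sketch (how ties are broken is an assumption the patent does not specify — here they fall to the label encountered first):

```python
from collections import Counter

def majority_vote(model_predictions):
    """Pick the emotion category judged by the most models.
    model_predictions: one predicted label per model in the ensemble."""
    counts = Counter(model_predictions)
    # most_common is stable, so a tie falls to the label seen first.
    return counts.most_common(1)[0][0]
```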
the scoring method obtains the score values from the keyword extraction submodule 22 and the emotion model set module 3, assembles them into a new feature vector, feeds that vector into a trained decision tree, SVM or neural network classification model, and outputs the emotion recognition result.
The combination method concatenates the score values from the keyword extraction submodule 22 and the emotion model set module 3 with the speech feature data of the voice feature analysis submodule 12 into a new speech feature, which is trained and classified by a decision tree, SVM or neural network classification model to obtain the emotion recognition result.
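The combination method is a form of stacking: per-model scores are concatenated with the raw acoustic statistics and the result is fed to a second-stage classifier. A minimal sketch, with a nearest-centroid model standing in for the decision tree / SVM / neural network stage (the stand-in is an assumption for illustration, not the patent's choice):

```python
import numpy as np

def stack_features(model_scores, acoustic_stats):
    """Form the 'new speech feature': per-model emotion scores
    concatenated with the acoustic statistics vector."""
    return np.concatenate([np.asarray(model_scores, dtype=float),
                           np.asarray(acoustic_stats, dtype=float)])

class NearestCentroid:
    """Toy second-stage classifier over the stacked features."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.labels_ = sorted(set(y))
        self.centroids_ = {
            c: X[[i for i, lbl in enumerate(y) if lbl == c]].mean(axis=0)
            for c in self.labels_
        }
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        return min(self.labels_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
```

The point of stacking is that the second stage learns how much to trust each first-stage model's score alongside the raw features, rather than weighting them by hand.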
The emotion classification module 4 comprehensively uses the strengths of each model and avoids the limitations of any single one. Its output assigns a piece of speech to a specific emotion category, generally one of three classes (anger, normal, satisfaction) or a finer-grained set.
The business application module 5 comprises a customer information display sub-module 51, an emotion display sub-module 52, a response prompt sub-module 53, an assessment data sub-module 54 and an enterprise problem analysis sub-module 55.
The customer information display submodule 51 is communicatively connected with the product sales record platform and displays, according to the customer's telephone number, the customer's purchase information on that platform. By associating the customer's telephone number with the products sold, the submodule lets customer service staff know which products the customer has purchased, grasp the customer's situation and communicate more easily.
The emotion display submodule 52 receives the data of the keyword extraction submodule 22 and the emotion classification module 4, and displays the customer's emotional tendency and emotion keywords in the current call in real time. With the emotion display submodule 52, customer service staff can directly observe the customer's emotional information and grasp it as it changes, which facilitates communication with the customer and makes it more effective.
The response prompt submodule 53 comprises a response scheme database 531 and a response prompt box 532. The response scheme database 531 stores product-related information, handling schemes for different emotion types, response phrasing and problem-handling workflows. The response prompt box 532 automatically generates and displays a response scheme through a machine learning algorithm that combines the data of the keyword extraction submodule 22, the emotion classification module 4 and the response scheme database 531. The response schemes in the database 531 can be built with machine learning and deep learning models, so the database can be updated by training on sample data as products change.
The assessment data submodule 54 assesses the quality of customer service according to the data of the emotion classification module 4. In this embodiment, it obtains that data to analyze and compute the emotions of all calls handled by each agent over a given period, and can automatically generate statistical tables and charts from the results. Because it assesses agents by the proportion of each emotion type, the errors of traditional assessment are avoided and service quality can be effectively improved. With the tables and charts, managers can see agents' service quality at a glance and formulate an appropriate management system.
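The per-agent proportion statistics can be sketched as follows (record shape and emotion labels are illustrative assumptions):

```python
from collections import Counter, defaultdict

def assessment_report(call_records):
    """call_records: iterable of (agent_id, emotion_label), one per call.
    Returns {agent: {emotion: proportion}} over the assessment period."""
    per_agent = defaultdict(Counter)
    for agent, emotion in call_records:
        per_agent[agent][emotion] += 1
    report = {}
    for agent, counts in per_agent.items():
        total = sum(counts.values())
        report[agent] = {emo: n / total for emo, n in counts.items()}
    return report
```

The resulting proportions feed directly into whatever table or chart the submodule renders for managers.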
The enterprise question analysis submodule 55 is used for analyzing product conditions according to the data of the voice keyword detection module 2. In this embodiment, the enterprise question analysis sub-module 55 obtains the topic keywords from the keyword extraction sub-module 22, and collects and counts the key questions reported by customers by analyzing those topic keywords. Using the data of the keyword extraction submodule 22, the enterprise problem analysis submodule 55 can count the frequency of problems occurring in each product, so that managers can understand how enterprise products are operating, find problems in time and optimize the products in a targeted manner.
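A minimal sketch of this topic-keyword counting, assuming the detected keywords arrive as a flat list of strings (the function name and sample data are illustrative):

```python
from collections import Counter

def top_product_issues(topic_keywords, k=2):
    """Count topic-keyword frequency and return the k most-reported issues."""
    return Counter(topic_keywords).most_common(k)

# Topic keywords accumulated from many calls (illustrative data).
issues = top_product_issues(
    ["billing", "login", "billing", "shipping", "billing", "login"])
```

Sorting by frequency is what lets managers see at a glance which product problem customers raise most often.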
The database module 6 is used for storing and sending the data of the voice keyword detection module 2, the emotion model set module 3, the emotion classification module 4 and the business application module 5.
In this embodiment, the database module 6 establishes a connection channel between the data storage end and the Web end using WebSocket, and provides an instant data service for data transmission among the voice keyword detection module 2, the emotion classification module 4 and the business application module 5. The response scheme database 531 may be hosted in the database module 6, which may store various data in a distributed manner to provide quick query responses; the stored data topics may include call emotion recognition results, client call records, keyword topics, standard response schemes and the like.
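The WebSocket transport itself depends on deployment details, but the payload pushed from the storage end to the Web end might look like the JSON message built below. Every field name here is an assumption for illustration; the patent does not specify a wire format.

```python
import json

def emotion_update_message(call_id, emotion, keywords):
    """Build a JSON payload the storage end could push to the Web end
    over the WebSocket channel (all field names are illustrative)."""
    return json.dumps({
        "type": "emotion_update",   # message kind for the Web-end router
        "call_id": call_id,         # which live call this update belongs to
        "emotion": emotion,         # label from the emotion classification module
        "keywords": keywords,       # emotion/topic keywords from module 2
    })

msg = emotion_update_message("c-1001", "negative", ["refund", "angry"])
```

A small, typed message like this is what lets the emotion display sub-module 52 update in real time without polling the database.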
In this embodiment, the business application module 5 obtains the corresponding data through a search calculation engine, which is backed by a multi-source heterogeneous data access component and a metadata management and access module. The search calculation engine can query, classify, aggregate, describe and visualize the data to support business decisions, so the data required by the customer information display sub-module 51, the emotion display sub-module 52 and the response prompt sub-module 53 can be retrieved from the database module 6 more efficiently.
The business application module can be established at the Web end, where the results of the search calculation engine are accessed to meet different display requirements. The assessment data submodule 54 and the enterprise problem analysis submodule 55 thus call the data content of the database module 6 and present it graphically as tables, curve graphs, distribution graphs, pie charts and the like, making the data more intuitive and more useful for decision making.
When the speech processing module 1 receives the client's speech, the speech feature analysis submodule 12 processes the speech content obtained by the speech extraction submodule 11 and extracts the prosody, voice quality and spectral features of the speech.
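Two of the time-domain features named earlier (short-time energy and zero-crossing rate) can be sketched with NumPy over a framed signal. The 25 ms frame and 10 ms hop at 16 kHz are common choices, not values taken from the patent.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (400/160 samples
    = 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def short_time_energy(frames):
    """Sum of squared samples per frame."""
    return (frames ** 2).sum(axis=1)

def zero_crossing_rate(frames):
    """Fraction of sample-to-sample sign changes per frame."""
    return (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)

sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t)   # 1 s test tone at 220 Hz
frames = frame_signal(x)
energy = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
```

The same framing step underlies the pre-emphasis, windowing and endpoint detection pipeline, and the per-frame statistics (mean, extrema, slopes, differences) listed in the claims are then computed over arrays like `energy` and `zcr`.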
The keyword extraction submodule 22 detects whether the speech contains the keywords specified in the emotion weight database 21 and classifies the detected keywords as emotion keywords or topic keywords. It also assigns a weight value to the emotional tendency of each emotion keyword according to the emotion weight database 21, and scores each emotion of the speech by combining the weight values with the frequency of the emotion keywords.
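A minimal sketch of this weight-times-frequency scoring, with an invented emotion weight table standing in for the emotion weight database 21 (all keywords, labels and weights are illustrative):

```python
from collections import Counter

# Illustrative emotion weight database: keyword -> (emotion label, weight).
EMOTION_WEIGHTS = {
    "angry": ("negative", 0.9),
    "refund": ("negative", 0.6),
    "thanks": ("positive", 0.7),
    "great": ("positive", 0.8),
}

def score_emotions(tokens):
    """Score each emotion as the sum of keyword weight * keyword frequency."""
    freq = Counter(t for t in tokens if t in EMOTION_WEIGHTS)
    scores = {}
    for word, count in freq.items():
        emotion, weight = EMOTION_WEIGHTS[word]
        scores[emotion] = scores.get(emotion, 0.0) + weight * count
    return scores

# Tokens from a recognized utterance (illustrative).
scores = score_emotions(["i", "am", "angry", "angry", "refund", "thanks"])
```

The resulting per-emotion scores are one of the inputs later fused with the acoustic models' scores by the emotion classification module.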
The emotion model set module 3 stores a set of different emotion classification models, receives the data of the voice processing module 1 to dynamically capture and track the emotional state of the caller, and combines the models to score the emotional tendency of the speech features from the voice processing module 1 under each model.
The emotion classification module 4 judges the emotion category of the speech obtained by the speech processing module 1 by a voting method, a scoring method and a combination method according to different model data provided by the keyword extraction submodule 22 and the emotion model set module 3.
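The voting method, and a simplified stand-in for the scoring method, can be sketched as below. Note that in the scoring and combination methods the patent feeds the per-model scores (plus, for the combination method, the raw speech features) into a trained decision tree, SVM or neural network; the weighted-average fusion shown here is only an illustrative simplification of that step, and all labels are invented.

```python
from collections import Counter

def vote(predictions):
    """Voting method: majority vote over per-model emotion labels
    (ties resolved in favor of the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]

def score_fuse(model_scores, weights=None):
    """Simplified scoring fusion: weighted sum of per-emotion scores
    across models, then argmax. A stand-in for the trained classifier
    the patent uses at this stage."""
    weights = weights or [1.0] * len(model_scores)
    total = {}
    for w, scores in zip(weights, model_scores):
        for emotion, s in scores.items():
            total[emotion] = total.get(emotion, 0.0) + w * s
    return max(total, key=total.get)

voted = vote(["negative", "negative", "positive"])
fused = score_fuse([{"negative": 0.6, "positive": 0.4},
                    {"negative": 0.2, "positive": 0.8}])
```

Keeping both fusion paths lets the system fall back to simple voting when a trained fusion classifier is unavailable for a given model set.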
The data of the voice processing module 1, the voice keyword detection module 2, the emotion model set module 3 and the emotion classification module 4 are stored in the database module 6. During a call, the business application module 5 can feed information such as the client number, the client emotion recognized in real time and the call keywords into the search calculation engine, which automatically matches the optimal response scheme from the data storage system and pushes the relevant information to the Web end for display. Customer service staff can then obtain the relevant business information from the client information display submodule 51, the emotion display submodule 52 and the response prompt submodule 53, and communicate with the client effectively.
When managers perform performance assessment, problem analysis and the like in the assessment data submodule 54 and the enterprise problem analysis submodule 55, the search calculation engine queries the relevant data from the data storage system according to the query requirements, performs statistical analysis and calculation, and pushes the results to the Web end for display, so that customer service staff and product quality can be managed.

Claims (4)

1. A voice emotion recognition and application system for call center calls, characterized by comprising a voice processing module (1), a voice keyword detection module (2), an emotion model set module (3), an emotion classification module (4), a business application module (5) and a database module (6);
the voice processing module (1) comprises a voice extraction submodule (11) and a voice feature analysis submodule (12), wherein the voice extraction submodule (11) is used for acquiring voice in a call to be detected; the voice feature analysis submodule (12) is used for receiving voice data of the voice extraction submodule (11), and processing voice of the voice extraction submodule (11) in a pre-emphasis, windowing and framing and end point detection mode to obtain the voice rhythm, voice quality and frequency spectrum features of the voice extraction submodule (11);
the voice keyword detection module (2) is used for receiving the voice data of the voice feature analysis submodule (12) and identifying emotion keywords and theme keywords by establishing a keyword library, so as to obtain the emotion data of the client in the voice extraction submodule (11) and data on the problems the client reports;
the voice characteristic analysis submodule (12) acquires the short-time energy, the average amplitude, the short-time average zero-crossing rate, the pitch frequency, the formants, the Mel frequency spectrum characteristics, the linear prediction cepstrum coefficients, the Mel cepstrum coefficients and different characteristics of a spectrogram in a time domain or a frequency domain from the voice rhythm, the tone quality and the frequency spectrum characteristics, and analyzes and calculates the maximum value, the minimum value, the range, the frame where the maximum value is located, the frame where the minimum value is located, the mean value, the linear approximation slope, the linear approximation offset, the linear approximation secondary deviation, the standard deviation, the skewness, the kurtosis, the first-order difference and the second-order difference statistics of the different characteristics;
the voice keyword detection module (2) comprises an emotion weight database (21) and a keyword extraction submodule (22), wherein the emotion weight database (21) is used for establishing and storing an emotion weight database of keywords; the keyword extraction submodule (22) is used for matching the data of the voice characteristic analysis submodule (12) with the data of the emotion weight database (21) through an acoustic model, a language model, a pronunciation dictionary and a decoder to identify emotion keywords and theme keywords, and can count the frequency of the emotion keywords and the theme keywords appearing in voice, the keyword extraction submodule (22) can also endow a weight value for the emotional tendency of the emotion keywords according to the data of the emotion weight database (21), and scores each emotion of the voice by combining the weight value with the frequency of the emotion keywords and analyzing the weight value; the emotion model set module (3) is used for scoring the speech emotion tendencies of the emotion characteristics of the speech processing module (1) in each model by training a hidden Markov model, a Gaussian mixture model, a support vector machine model, an artificial neural network, a convolutional neural network model and a long-short term memory network model and combining the models;
the emotion classification module (4) judges the emotion category of the voice obtained by the voice feature analysis submodule (12) by adopting a voting method, a scoring method and a combination method according to different model data provided by the voice keyword detection module (2) and the emotion model set module (3);
the voting method comprises the steps of obtaining emotion classification results of each model in the keyword extraction submodule (22) and the emotion model set module (3), counting the number of models of which the current voice is judged to be a certain emotion, and taking the emotion category with the most votes as an identification result;
the scoring method comprises the steps of obtaining scoring values in the keyword extraction submodule (22) and the emotion model set module (3), forming new features by the scoring values, inputting the new features into a trained decision tree, SVM and neural network classification model for calculation, and outputting emotion recognition results;
the combination method combines the scoring values in the keyword extraction submodule (22) and the emotion model set module (3) and the voice feature data of the voice feature analysis submodule (12) into new voice features, and trains and classifies the new voice features through a decision tree, an SVM and a neural network classification model to obtain emotion recognition results;
the emotion model set module (3) is used for storing a plurality of different emotion classification model sets and receiving data information of the voice feature analysis submodule (12) so as to dynamically capture and track the emotion state of a caller;
the emotion classification module (4) is used for acquiring data information of the voice keyword detection module (2) and the emotion model set module (3), and judging emotion types of voices in the call to be detected by adopting a classification model;
the business application module (5) comprises a customer information display sub-module (51), an emotion display sub-module (52), a response prompt sub-module (53), an assessment data sub-module (54) and an enterprise problem analysis sub-module (55), wherein the customer information display sub-module (51) is in communication connection with a product sales recording platform, and the customer information display sub-module (51) is used for displaying the purchase information of a customer on the product sales recording platform according to the telephone number of the customer; the emotion display sub-module (52) is used for receiving the data information of the voice keyword detection module (2) and the emotion classification module (4) and displaying the emotion tendency information of the client in the current call in real time; the response prompting submodule (53) comprises a response scheme database (531) and a response prompt box (532), wherein the response scheme database (531) is used for storing product-related information, processing schemes for different emotion types, response terms and problem processing flows; the response prompt box (532) is used for automatically generating and displaying a response prompt scheme through a machine learning algorithm by combining the data of the voice keyword detection module (2), the emotion classification module (4) and the response scheme database (531); the assessment data submodule (54) is used for assessing the quality of the customer service according to the data of the emotion classification module (4); the enterprise problem analysis submodule (55) is used for analyzing the product condition according to the data of the voice keyword detection module (2);
the enterprise problem analysis submodule (55) is used for acquiring the theme keywords in the voice keyword detection module (2), and analyzing and calculating the theme keywords in the voice keyword detection module (2) so as to collect and count the key problems reported by the customers;
and the database module (6) is used for storing and sending the data of the voice keyword detection module (2), the emotion model set module (3), the emotion classification module (4) and the business application module (5).
2. The system of claim 1, wherein: the database module (6) establishes a connection channel between a data storage end and a Web end by using WebSocket, and provides instant data service for data transmission among the voice keyword detection module (2), the emotion classification module (4) and the business application module (5).
3. The system of claim 1, wherein: the assessment data submodule (54) is used for analyzing and calculating the call emotions of each customer service person over a given period by acquiring the data of the emotion classification module (4), and the assessment data submodule (54) can also automatically generate statistical tables and charts according to those call emotions.
4. The system of claim 1, wherein: the business application module (5) obtains corresponding data through a search calculation engine, and the search calculation engine provides data access support through a multi-source heterogeneous data access component and a metadata management and access module.
CN201910217722.5A 2019-03-21 2019-03-21 Voice emotion recognition and application system for call center calls Active CN109767791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910217722.5A CN109767791B (en) 2019-03-21 2019-03-21 Voice emotion recognition and application system for call center calls


Publications (2)

Publication Number Publication Date
CN109767791A CN109767791A (en) 2019-05-17
CN109767791B true CN109767791B (en) 2021-03-30

Family

ID=66459613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910217722.5A Active CN109767791B (en) 2019-03-21 2019-03-21 Voice emotion recognition and application system for call center calls

Country Status (1)

Country Link
CN (1) CN109767791B (en)

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097894B (en) * 2019-05-21 2021-06-11 焦点科技股份有限公司 End-to-end speech emotion recognition method and system
CN110298682A (en) * 2019-05-22 2019-10-01 深圳壹账通智能科技有限公司 Intelligent Decision-making Method, device, equipment and medium based on user information analysis
CN110166640B (en) * 2019-05-31 2021-05-14 重庆先特服务外包产业有限公司 System and method for establishing call center general information model
CN110347823A (en) * 2019-06-06 2019-10-18 平安科技(深圳)有限公司 Voice-based user classification method, device, computer equipment and storage medium
CN110444229A (en) * 2019-06-17 2019-11-12 深圳壹账通智能科技有限公司 Communication service method, device, computer equipment and storage medium based on speech recognition
CN112102850B (en) * 2019-06-18 2023-06-20 杭州海康威视数字技术股份有限公司 Emotion recognition processing method and device, medium and electronic equipment
CN110335596A (en) * 2019-06-19 2019-10-15 深圳壹账通智能科技有限公司 Products Show method, apparatus, equipment and storage medium based on speech recognition
CN110379445A (en) * 2019-06-20 2019-10-25 深圳壹账通智能科技有限公司 Method for processing business, device, equipment and storage medium based on mood analysis
CN110472224B (en) * 2019-06-24 2023-07-07 深圳追一科技有限公司 Quality of service detection method, apparatus, computer device and storage medium
CN110475032A (en) * 2019-06-28 2019-11-19 平安科技(深圳)有限公司 Multi-service interface switching method, device, computer installation and storage medium
CN110364185B (en) * 2019-07-05 2023-09-29 平安科技(深圳)有限公司 Emotion recognition method based on voice data, terminal equipment and medium
CN112242135A (en) * 2019-07-18 2021-01-19 北京声智科技有限公司 Voice data processing method and intelligent customer service device
CN110719370A (en) * 2019-09-04 2020-01-21 平安科技(深圳)有限公司 Code scanning vehicle moving method, electronic device and storage medium
CN110480656B (en) * 2019-09-09 2021-09-28 国家康复辅具研究中心 Accompanying robot, accompanying robot control method and accompanying robot control device
CN110727772A (en) * 2019-10-09 2020-01-24 浙江百应科技有限公司 Method for realizing dynamic interaction of robot through condition judgment
CN110890096A (en) * 2019-10-12 2020-03-17 深圳供电局有限公司 Intelligent voice system and method based on voice analysis
CN111128241A (en) * 2019-12-30 2020-05-08 上海浩琨信息科技有限公司 Intelligent quality inspection method and system for voice call
CN111294468A (en) * 2020-02-07 2020-06-16 普强时代(珠海横琴)信息技术有限公司 Tone quality detection and analysis system for customer service center calling
CN111488456A (en) * 2020-04-10 2020-08-04 北京洛必德科技有限公司 Merchant user evaluation analysis method and system
CN111739516A (en) * 2020-06-19 2020-10-02 中国—东盟信息港股份有限公司 Speech recognition system for intelligent customer service call
CN112329437B (en) * 2020-10-21 2024-05-28 交通银行股份有限公司 Intelligent customer service voice quality inspection scoring method, equipment and storage medium
CN113035193B (en) * 2021-03-01 2024-04-12 上海匠芯知音信息科技有限公司 Staff management system and application
CN113011928A (en) * 2021-04-02 2021-06-22 深圳市中深伟业科技有限公司 Customer service early warning analysis system based on deep learning
CN113239154B (en) * 2021-06-01 2023-08-01 平安科技(深圳)有限公司 Quality inspection method and device for seat conversation operation, computer equipment and storage medium
CN113555011B (en) * 2021-07-07 2022-05-27 广西电网有限责任公司 Electric power industry customer service center voice translation modeling method, system and medium
CN113676602A (en) * 2021-07-23 2021-11-19 上海原圈网络科技有限公司 Method and device for processing manual transfer in automatic response
CN113808577A (en) * 2021-09-18 2021-12-17 平安银行股份有限公司 Intelligent extraction method and device of voice abstract, electronic equipment and storage medium
CN113641801B (en) * 2021-10-19 2022-05-27 成都中航信虹科技股份有限公司 Control method and system of voice scheduling system and electronic equipment
CN114022192A (en) * 2021-10-20 2022-02-08 百融云创科技股份有限公司 Data modeling method and system based on intelligent marketing scene
CN114298019A (en) * 2021-12-29 2022-04-08 中国建设银行股份有限公司 Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product
CN115910110A (en) * 2022-11-30 2023-04-04 杭州网筌科技有限公司 Government affair service system based on natural language identification
CN116597821A (en) * 2023-07-17 2023-08-15 深圳市国硕宏电子有限公司 Intelligent customer service voice recognition method and system based on deep learning
CN117240964B (en) * 2023-11-16 2024-02-27 福建博士通信息股份有限公司 Voice recognition method in call process

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930735A (en) * 2009-06-23 2010-12-29 富士通株式会社 Speech emotion recognition equipment and speech emotion recognition method
CN105657201A (en) * 2016-01-26 2016-06-08 北京京东尚科信息技术有限公司 Method and system for processing call based on decision tree model
CN107368948A (en) * 2017-06-21 2017-11-21 厦门快商通科技股份有限公司 A kind of simulation test checking system for customer service post
CN108334583A (en) * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Affective interaction method and device, computer readable storage medium, computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7940897B2 (en) * 2005-06-24 2011-05-10 American Express Travel Related Services Company, Inc. Word recognition system and method for customer and employee assessment
CN105893344A (en) * 2016-03-28 2016-08-24 北京京东尚科信息技术有限公司 User semantic sentiment analysis-based response method and device
CN109147768A (en) * 2018-09-13 2019-01-04 云南电网有限责任公司 A kind of audio recognition method and system based on deep learning


Also Published As

Publication number Publication date
CN109767791A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN109767791B (en) Voice emotion recognition and application system for call center calls
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
US10839335B2 (en) Call center agent performance scoring and sentiment analytics
US8676586B2 (en) Method and apparatus for interaction or discourse analytics
CN111739516A (en) Speech recognition system for intelligent customer service call
US9672825B2 (en) Speech analytics system and methodology with accurate statistics
US8725518B2 (en) Automatic speech analysis
US6665644B1 (en) Conversational data mining
US8914285B2 (en) Predicting a sales success probability score from a distance vector between speech of a customer and speech of an organization representative
US20200195779A1 (en) System and method for performing agent behavioral analytics
WO2021017612A1 (en) Question and answer method and apparatus based on speech analysis, device and storage medium
CN110148427A (en) Audio-frequency processing method, device, system, storage medium, terminal and server
US20070067159A1 (en) Monitoring, mining, and classifying electronically recordable conversations
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
US20150310877A1 (en) Conversation analysis device and conversation analysis method
EP0549265A2 (en) Neural network-based speech token recognition system and method
CN113468296B (en) Model self-iteration type intelligent customer service quality inspection system and method capable of configuring business logic
CN116343824B (en) Comprehensive evaluation and solution method, system, device and medium for talent expression capability
CN107767881B (en) Method and device for acquiring satisfaction degree of voice information
CN111489743B (en) Operation management analysis system based on intelligent voice technology
CN113411454B (en) Intelligent quality inspection method for real-time call voice analysis
CN114818649A (en) Service consultation processing method and device based on intelligent voice interaction technology
CN112800743A (en) Voice scoring model construction system and method based on specific field
CN113505606B (en) Training information acquisition method and device, electronic equipment and storage medium
CN113744742A (en) Role identification method, device and system in conversation scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant