CN112466324A - Emotion analysis method, system, equipment and readable storage medium

Emotion analysis method, system, equipment and readable storage medium

Info

Publication number
CN112466324A
Authority
CN
China
Prior art keywords
sample
data
training
audio
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011270079.1A
Other languages
Chinese (zh)
Inventor
程荣
赵友林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Information Technology Co ltd
Original Assignee
Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Information Technology Co ltd filed Critical Shanghai Information Technology Co ltd
Priority to CN202011270079.1A priority Critical patent/CN112466324A/en
Publication of CN112466324A publication Critical patent/CN112466324A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Abstract

The embodiment of the application discloses an emotion analysis method, system, equipment and readable storage medium, wherein the method comprises the following steps: extracting the audio features of the labeled sound samples; converting the audio features into mathematical data; binding a data matrix of the mathematical data with the emotion category label to form a data set, and converting the set into a mathematical matrix; randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion; training the training set data by using tensorflow to obtain an emotion analysis model; and performing test verification on the test set samples based on the emotion analysis model to obtain an analysis result for each test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample, until the accuracy on the test set meets a set threshold. The emotion contained in the voice can be accurately and quickly analyzed by using the model.

Description

Emotion analysis method, system, equipment and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a method, a system, equipment and a readable storage medium for emotion analysis.
Background
The technologies in the existing market mainly analyze the information contained in voice by converting the voice into text, and can hardly identify the emotion the voice conveys. Keyword-based approaches can identify part of the emotional content, but become invalid as soon as other words are substituted. Existing technologies that recognize emotion from voice features do not extract the essential features comprehensively, so the analysis is inaccurate.
Disclosure of Invention
Therefore, the embodiments of the application provide a method, a system, a device and a readable storage medium for emotion analysis, which do not convert voice into text information, but extract the essential characteristics of the voice and train samples by using AI technology, thereby forming an emotion analysis model; the emotion contained in voice is then analyzed by using the emotion analysis model.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of embodiments of the present application, there is provided a method of emotion analysis, the method including:
acquiring a sound sample audio, carrying out standardization processing on the sound sample audio, and marking a predefined emotion category label on the sound sample audio after the standardization processing;
extracting audio features of the labeled sound sample audio, wherein the audio features comprise voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment; converting the audio features into mathematical data, wherein the mathematical data comprises voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio;
binding a data matrix of the mathematical data and the emotion category label to form a set, converting the label set into a mathematical matrix, and randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion;
training the data in the training set by using tensorflow to obtain an emotion analysis model; performing test verification on a test sample by means of automatic program-script processing and manual auxiliary processing based on the emotion analysis model, obtaining an analysis result of the test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if they are not consistent, analyzing whether the test sample data and the corresponding emotion category label are accurate, and if the test sample data are not accurate, discarding the test sample; if the corresponding emotion category label is not accurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training on the adjusted training set, and adding new test samples into the test set for the next test; and repeating the process until the accuracy on the test set meets a set threshold.
Optionally, the emotion category labels include happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension, among others. The emotion category labels will be appropriately adjusted according to business needs.
Optionally, the emotion analysis model is trained according to the following steps:
setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function, an optimization function and the like;
displaying the intermediate data in a graphical form, and observing the intermediate parameters loss and acc when the training is finished; if the loss and acc intermediate parameters do not meet expectations, modifying the training parameters and retraining so that the loss and acc intermediate parameters meet expectations.
Optionally, the method further comprises:
analyzing a test audio sample by using the emotion analysis model, and checking whether the analysis result of the test audio sample is accurate by means of automatic program-script processing and manual auxiliary processing; for audio data whose analysis is inaccurate, if the label is inaccurate, marking a correct label again and putting the sample into the training set to update the training set; if the data itself is inaccurate, discarding the sample.
According to a second aspect of embodiments of the present application, there is provided an emotion analysis system, including:
the sample initial processing module is used for acquiring a sound sample audio, standardizing the sound sample audio, and marking a predefined emotion category label on the standardized sound sample audio;
the feature conversion module is used for extracting the audio features of the labeled sound sample audio, wherein the audio features comprise voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment, and for converting the audio features into mathematical data, wherein the mathematical data comprises voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio;
the label binding module is used for binding a data matrix of the mathematical data and the emotion type labels to form a set and converting the label set into a mathematical matrix; randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion;
the model training module is used for training the training set data by using tensorflow to obtain an emotion analysis model;
the model testing module is used for performing test verification on a test sample based on the emotion analysis model to obtain an analysis result of the test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if they are not consistent, judging whether the test sample data and the corresponding emotion category label are accurate, and if the test sample data are not accurate, discarding the test sample; if the corresponding emotion category label is not accurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training the training set, and supplementing new test samples into the test set for the next test; and repeating the process until the accuracy on the test set meets a set threshold.
Optionally, the emotion category labels include happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension, among others. The emotion category labels will be appropriately adjusted according to business needs.
Optionally, the model training module is specifically configured to:
setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function, an optimization function and the like;
displaying the intermediate data in a graphical form, and observing the intermediate parameters loss and acc when the training is finished; if the loss and acc intermediate parameters do not meet expectations, modifying the training parameters and retraining so that the loss and acc intermediate parameters meet expectations.
Optionally, the system further comprises:
the model testing module is used for analyzing a test audio sample by using the emotion analysis model, performing test verification on the test sample by means of automatic program-script processing and manual auxiliary processing based on the emotion analysis model to obtain an analysis result of the test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if they are not consistent, judging whether the test sample data and the corresponding emotion category label are accurate, and if the test sample data are not accurate, discarding the test sample; if the corresponding emotion category label is not accurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training on the adjusted training set, adding new test samples into the test set for the next test, and repeating the process until the accuracy on the test set meets a set threshold.
According to a third aspect of embodiments herein, there is provided an apparatus comprising: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method of any of the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of the first aspects.
In summary, the embodiments of the present application provide an emotion analysis method, system, device and readable storage medium. A sound sample audio is obtained, the sound sample audio is standardized, and a predefined emotion category label is marked on the standardized sound sample audio; audio features of the labeled sample audio are extracted, wherein the audio features comprise voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment; the audio features are converted into mathematical data, wherein the mathematical data comprises voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio; a data matrix of the mathematical data is bound with the emotion category label to form a set, and the label set is converted into a mathematical matrix; the data of the mathematical matrix is randomly split into a training set and a test set according to a proportion; the training set is trained by using tensorflow to obtain an emotion analysis model; a test sample is test-verified by means of automatic program-script processing and manual auxiliary processing based on the emotion analysis model to obtain an analysis result of the test sample, and whether the analysis result is consistent with the emotion category label of the test sample is compared. If they are inconsistent, it is judged whether the test sample data and the corresponding emotion category label are accurate; if the test sample data are not accurate, the test sample is discarded; if the corresponding emotion category label is not accurate, the sample is re-labeled. The re-labeled test sample is put into the training set, the tensorflow training parameters are adjusted, the adjusted training set is re-trained, and new test samples are added into the test set for the next test. The process is repeated until the accuracy on the test set meets the set threshold. The emotion contained in the voice can be accurately and quickly analyzed by using the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings can be derived from the provided drawings by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes and the like shown in the present specification are only used to match the contents disclosed in the specification so that those skilled in the art can understand and read them, and are not intended to limit the conditions under which the present invention can be implemented; they therefore have no substantive technical significance, and any structural modification, change of proportion or adjustment of size that does not affect the effects and purposes achieved by the present invention shall still fall within the scope of the present invention.
Fig. 1 is a schematic flow chart of an emotion analysis method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of emotion analysis provided in an embodiment of the present application;
fig. 3 is a block diagram of an emotion analysis system provided in an embodiment of the present application.
Detailed Description
The present invention is described in terms of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely a part of the embodiments of the invention, not all of them, and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows a schematic flow chart of the emotion analysis method provided in an embodiment of the present application. The method does not convert voice into text information, but extracts the essential characteristics of the voice and trains samples by using AI technology, thereby forming an emotion analysis model, and the emotion contained in voice is analyzed by using the emotion analysis model. The identified emotion can then be used to process the service in a targeted way.
As shown in fig. 1, the method comprises the steps of:
step 101: acquiring a sound sample audio, carrying out standardization processing on the sound sample audio, and marking a predefined emotion category label on the standardized sound sample audio.
Step 102: extracting audio features of the labeled sound sample audio, wherein the audio features comprise voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment; and converting the audio features into mathematical data, wherein the mathematical data comprises voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio.
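As an illustration of this step, the following sketch extracts an MFCC feature matrix and the per-frame zero-crossing rate and stacks them into one numeric matrix. The patent does not name a feature-extraction library; librosa is assumed here purely for illustration.

    # Sketch only: librosa is an assumed library choice, not specified by the patent.
    import numpy as np
    import librosa

    def extract_features(wav_path, sr=16000, n_mfcc=13):
        """Return the MFCC feature matrix stacked with the per-frame zero-crossing rate."""
        y, sr = librosa.load(wav_path, sr=sr, mono=True)        # load as 16 kHz mono
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
        zcr = librosa.feature.zero_crossing_rate(y)              # shape (1, frames)
        return np.vstack([mfcc, zcr])                            # the "mathematical data"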
Step 103: binding the data matrix of the mathematical data and the emotion category label to form a set, converting the label set into a mathematical matrix, and randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion.
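A minimal sketch of this step follows, assuming the extract_features helper sketched above, a fixed sample length (which holds after the standardization of step 101), and scikit-learn's train_test_split for the proportional random split; none of these tool choices are mandated by the patent.

    # Sketch only: bind feature data with labels and split randomly by proportion.
    import numpy as np
    from sklearn.model_selection import train_test_split

    EMOTIONS = ["happy", "sad", "angry", "fear", "disgust", "surprise", "calm", "apprehensive"]

    def build_dataset(labeled_files, test_ratio=0.2):
        """labeled_files: list of (wav_path, emotion_label) pairs."""
        X = np.stack([extract_features(path).flatten() for path, _ in labeled_files])
        y = np.array([EMOTIONS.index(label) for _, label in labeled_files])
        # Random split into training and test sets according to the chosen proportion
        return train_test_split(X, y, test_size=test_ratio, random_state=0)

    # X_train, X_test, y_train, y_test = build_dataset(samples)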
Step 104: training the training set data by using tensorflow to obtain an emotion analysis model.
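The patent specifies TensorFlow for training but does not disclose a concrete network topology; the small fully-connected Keras model below is therefore only an assumed illustration of this step.

    # Sketch only: an assumed, illustrative TensorFlow/Keras model for this step.
    import tensorflow as tf

    def train_emotion_model(X_train, y_train, num_classes=8, epochs=50):
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(X_train.shape[1],)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(0.3),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        # Training times, hierarchy, activation, cost and optimization functions are
        # the tunable training parameters mentioned in the patent.
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(X_train, y_train, epochs=epochs,
                            batch_size=32, validation_split=0.1)
        return model, history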
Step 105: performing test verification on a test sample based on the emotion analysis model to obtain an analysis result of the test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if they are not consistent, judging whether the test sample data and the corresponding emotion category label are accurate; if the test sample data are not accurate, discarding the test sample; if the corresponding emotion category label is not accurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training the training set, and supplementing new test samples into the test set for the next test. The process is repeated until the accuracy on the test set meets the set threshold.
In one possible embodiment, the emotion category labels include, but are not limited to, happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension. The emotion category labels can be adjusted appropriately according to business needs.
In one possible embodiment, the emotion analysis model is trained according to the following steps: setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function, an optimization function and the like; displaying the intermediate data in a graphical form, and observing the intermediate parameters loss and acc when the training is finished; if the loss and acc intermediate parameters do not meet expectations, modifying the training parameters and retraining so that the loss and acc intermediate parameters meet expectations.
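One possible way to display the intermediate data graphically is sketched below, assuming the Keras History object returned by the training sketch above and matplotlib for plotting; both are assumptions rather than requirements of the patent.

    # Sketch only: plot the recorded loss/acc curves so they can be checked against expectations.
    import matplotlib.pyplot as plt

    def plot_training_curves(history):
        fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
        ax_loss.plot(history.history["loss"], label="loss")
        ax_loss.plot(history.history.get("val_loss", []), label="val_loss")
        ax_loss.set_xlabel("epoch")
        ax_loss.legend()
        ax_acc.plot(history.history["accuracy"], label="acc")
        ax_acc.plot(history.history.get("val_accuracy", []), label="val_acc")
        ax_acc.set_xlabel("epoch")
        ax_acc.legend()
        plt.show()                    # ideally loss approaches 0 and acc approaches 1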
In one possible embodiment, the method further comprises:
analyzing the test audio sample by using the emotion analysis model, checking whether the analysis result of the test audio sample is correct by means of automatic program-script processing and manual auxiliary processing, and analyzing the reason when the result of a piece of audio data is incorrect. Typical, representative audio data is standardized and labeled again and then put into the training set to update the training set.
The method provided by the embodiment of the application is not affected by language (Chinese, English and the like) or dialect (Mandarin, Southern Min and the like), does not need to extract the textual information contained in the voice, and can accurately identify the emotion contained in the voice.
The technology mainly uses the current AI technology (tensorflow) to train an emotion model, and then uses the model to analyze the emotion of the voice. The main steps of the method provided by the embodiment of the present application are further described below with reference to fig. 2:
step 1: the sound samples are prepared and normalized (e.g., 2.5s per sample duration, sample rate 16000, mono; this standard can be adjusted according to traffic needs).
Step 2: labeling the samples. First, the emotions contained in voice are divided, as required, into categories such as happiness, sadness, anger, fear, disgust, surprise, calm and apprehension (the emotion categories can be adjusted appropriately according to business needs); then, the corresponding emotion label is marked on each audio sample manually. During labeling, the labels must be accurate, and the samples must be strongly representative and distinguishable; samples that cannot meet this requirement are directly discarded. The labeling process always follows the principle of preferring fewer samples to poor ones.
Step 3: extracting the essential features of the sample audio, the features including but not limited to voiceprint acoustic feature data such as the MFCC feature matrix of the sound and the zero-crossing rate of the sound segments, and converting these features into mathematical data that comprises the MFCC feature matrix, zero-crossing rate information and other voiceprint acoustic feature data of the sound sample audio. The data matrix of the mathematical data is then bound with the emotion category labels to form a set, the label set is converted into a mathematical matrix, and the data of the mathematical matrix is randomly split into a training set and a test set according to a proportion.
Step 4: training the sample data by using tensorflow to obtain an analysis model. The training process includes, but is not limited to, setting the training times and selecting the hierarchical structure, the activation function, the cost function, the optimization function, and so on. During training, the various intermediate data need to be displayed in a graphical form, and when the training is finished the intermediate parameters such as loss and acc are observed. If parameters such as loss and acc do not meet expectations, the training parameters are modified and the model is retrained until loss and acc meet expectations. Ideally, loss approaches 0 and acc approaches 1.
Step 5: performing test verification on the samples of the test set with the analysis model. The trained model is used to test the test samples by means of automatic program-script processing and manual assistance, and the analysis result of each test sample is obtained. The result is then compared with the test sample's own label; if the result is consistent with the originally labeled emotion, the sample is considered to meet expectations. If a sample does not meet expectations, it is first checked whether the data of the sample is accurate, and then whether its label is accurate. If the data of the sample is inaccurate, the sample is discarded; if the label is inaccurate, the sample is relabeled. The relabeled sample is put back into the training set, the tensorflow training parameters are adjusted, and step 4 is repeated for retraining until the expected target is reached (generally, the accuracy on the test sample set is required to be above 95%).
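The automatic part of this check could look like the sketch below, assuming the model and data produced by the earlier sketches; the 95% threshold follows the figure given in this step, and the manual review of mismatched samples stays outside the code.

    # Sketch only: compare model predictions with the test labels and flag mismatches.
    import numpy as np

    def verify_on_test_set(model, X_test, y_test, threshold=0.95):
        predictions = np.argmax(model.predict(X_test), axis=1)
        mismatched = np.where(predictions != y_test)[0]   # samples for manual review
        accuracy = 1.0 - len(mismatched) / len(y_test)
        print(f"test accuracy: {accuracy:.2%}, mismatched samples: {len(mismatched)}")
        # Mismatched samples are then reviewed manually: bad audio is discarded,
        # wrongly labeled audio is relabeled and moved back into the training set.
        return accuracy >= threshold, mismatched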
Step 6: trial operation. Audio files from the production environment are analyzed using the model, and whether the analysis results are accurate is checked by means of automatic program-script processing and manual auxiliary processing. At the initial stage after going online, all results need to be checked, and indicators such as the accuracy rate and error rate are calculated. Audio data found to be analyzed inaccurately during this check is given focused treatment, including but not limited to step 1 and step 2: the audio is standardized and labeled, then put into the training set, and the training set is updated.
Step 7: repeating steps 1 to 6 with the updated training set and iterating, until the training set contains more and more representative samples, the analysis becomes more and more accurate (manual spot-check accuracy above 95%), and the business requirements are met.
In general, sample data is collected first, and then the sample data is uniformly processed into standard samples (e.g., a duration of 2.5 s, a sampling rate of 16000 Hz, mono, 16-bit). The standardized audio samples are then labeled. Next, voiceprint acoustic feature data such as the MFCC (Mel-frequency cepstral coefficients) and zero-crossing rate are extracted from the labeled audio sample data and associated with the label data to form a data set. The data set is randomly divided into a training set and a test set according to a set proportion. Training is performed with tensorflow and the intermediate result data is recorded. When a training batch is finished, the intermediate result data is plotted and displayed. When the intermediate results acc and loss meet expectations, training stops; otherwise, the tensorflow training parameters are adjusted and training continues.
The new audio file is analyzed using the trained model. For some typical audio files whose analysis is inaccurate, the cause is analyzed. If the audio file is very typical, the audio file of the type is subjected to standardization operation, and is put into a training set to be retrained so as to obtain a better model after iteration. And repeating the steps, and repeatedly iterating the model to obtain the most accurate model.
Therefore, the emotion analysis method provided by the embodiment of the application does not involve converting audio into text information, and so works across languages and dialects. The extracted basic sound features contain not only the MFCC but also voiceprint acoustic feature data such as zero-crossing rate information, which avoids the incomplete feature extraction caused by relying on a single feature. Training uses tensorflow, currently the most popular framework, and the trained model can be corrected repeatedly to achieve the best effect.
In summary, the embodiment of the present application provides an emotion analysis method, which includes acquiring a sound sample audio, standardizing the sound sample audio, and marking a predefined emotion category label on the standardized sound sample audio; extracting audio features of the labeled sound sample audio, wherein the audio features comprise voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment; converting the audio features into mathematical data, wherein the mathematical data comprises voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio; binding a data matrix of the mathematical data and the emotion category label to form a set, converting the label set into a mathematical matrix, and randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion; training the samples of the training set by using tensorflow to obtain an emotion analysis model; performing test verification on the samples of the test set based on the emotion analysis model to obtain an analysis result of each test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if not, judging whether the test sample data and the corresponding emotion category label are accurate; if the test sample data are inaccurate, discarding the test sample; if the corresponding emotion category label is inaccurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training the changed training set, and supplementing new test samples into the test set for the next verification, until the sample accuracy on the test set meets a set threshold. The emotion contained in the voice can be accurately and quickly analyzed by using the model.
Based on the same technical concept, an embodiment of the present application further provides an emotion analysis system, as shown in fig. 3, the system includes:
the sample initial processing module 301 is configured to obtain a sound sample audio, normalize the sound sample audio, and mark a predefined emotion category label on the normalized sound sample audio.
The feature conversion module 302 is configured to extract the audio features of the labeled sound sample audio, wherein the audio features include voiceprint acoustic feature information such as an MFCC feature matrix and the zero-crossing rate of the sample audio fragment, and to convert the audio features into mathematical data, wherein the mathematical data includes voiceprint acoustic feature data such as the MFCC feature matrix and zero-crossing rate information of the sound sample audio.
The label binding module 303 is configured to bind the data matrix of the mathematical data with the emotion category label to form a label set, convert the label set into a mathematical matrix, and randomly split the data of the mathematical matrix into a training set and a test set according to a proportion.
The model training module 304 is used for training the training set data by using tensorflow to obtain an emotion analysis model.
The model testing module 305 is configured to perform test verification on the samples of the test set based on the emotion analysis model to obtain an analysis result of each test sample, and compare whether the analysis result is consistent with the emotion category label of the test sample. If not, it judges whether the test sample data and the corresponding emotion category label are accurate. If the test sample data are not accurate, the test sample is discarded; if the corresponding emotion category label is not accurate, the sample is re-labeled. The re-labeled test sample is put into the training set, the tensorflow training parameters are adjusted, the changed training set is trained, and new test samples are supplemented into the test set for the next verification. The process is repeated until the accuracy on the test set meets the set threshold.
In one possible embodiment, the emotion category labels include happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension, among others. The emotion category labels can be adjusted appropriately according to business needs.
In a possible implementation, the model training module is specifically configured to:
setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function, an optimization function and the like; displaying the intermediate data in a graphical form, and observing the intermediate parameters loss and acc when the training is finished; if the loss and acc intermediate parameters do not meet expectations, modifying the training parameters and retraining so that the loss and acc intermediate parameters meet expectations.
In one possible embodiment, the system further comprises:
The test verification module is used for analyzing a test audio sample by using the emotion analysis model, checking whether the analysis result of the test audio sample is accurate by means of automatic program-script processing and manual auxiliary processing, and analyzing the reason for inaccurate audio data. If it is a problem with the audio data itself, the audio is discarded; if the label is not accurate, the audio file is standardized and relabeled and then placed into the training set to update the training set.
Based on the same technical concept, an embodiment of the present application further provides an apparatus, including: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor is configured to execute one or more program instructions to perform the method.
Based on the same technical concept, the embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method.
In the present specification, the embodiments are described in a progressive manner; the same or similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. For parts not detailed in the other embodiments, reference is made to the description of the method embodiments.
It is noted that while the operations of the methods of the present invention are depicted in the drawings in a particular order, this is not a requirement or suggestion that the operations must be performed in this particular order or that all of the illustrated operations must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Although the present application provides method steps as in embodiments or flowcharts, additional or fewer steps may be included based on conventional or non-inventive approaches. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.
The units, devices, modules, etc. set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of a plurality of sub-modules or sub-units, and the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The above-mentioned embodiments are further described in detail for the purpose of illustrating the invention, and it should be understood that the above-mentioned embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of emotion analysis, the method comprising:
acquiring a sound sample audio, carrying out standardization processing on the sound sample audio, and marking a predefined emotion category label on the standardized sample audio;
extracting audio features of the labeled sound sample audio, the audio features including an MFCC feature matrix and a zero-crossing rate of a sample audio piece; then converting the audio features into mathematical data, wherein the mathematical data comprises the MFCC feature matrix and zero-crossing rate of the sound sample audio;
binding the data matrix of the mathematical data and the emotion category label to form a set, converting the label set into a mathematical matrix, and randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion;
training the data in the sample training set by using tensorflow to obtain an emotion analysis model;
based on the emotion analysis model, carrying out test verification on the samples of the test set by means of automatic program-script processing and manual auxiliary processing to obtain the analysis results of the test samples, and comparing whether the analysis results are consistent with the emotion category labels of the test samples; if not, judging whether the test sample data and the corresponding emotion category labels are accurate, and if the test sample data are not accurate, discarding the test samples; if the emotion category label corresponding to a test sample is inaccurate, re-labeling it; and putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, re-training the adjusted training set, adding new test samples into the test set for the next test, and repeating the process until the accuracy of the samples on the test set meets a set threshold.
2. The method of claim 1, wherein the emotion category labels include happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension; and the emotion category labels are adjusted according to business requirements.
3. The method of claim 1, wherein the emotion analysis model is trained by:
setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function and an optimization function;
displaying the intermediate data in a graphical form, and judging the intermediate parameters of loss and acc when the training is finished; and if the loss and acc intermediate parameters do not accord with the expectation, modifying the training parameters, the hierarchical structure and the optimization function for retraining so as to enable the loss and acc intermediate parameters to accord with the expectation.
4. The method of claim 1, wherein the method further comprises:
analyzing a test audio sample by using the emotion analysis model, checking whether the analysis result of the test audio sample is correct by means of automatic program-script processing and manual auxiliary processing, and, for incorrectly analyzed audio data whose label is wrong, performing standardization and labeling processing and putting the data into the training set to update the training set; if the data of the sample itself is wrong, discarding the sample.
5. An emotion analysis system, characterized in that the system comprises:
the sample initial processing module is used for acquiring a sound sample audio, standardizing the sound sample audio, and marking a predefined emotion category label on the standardized sound sample audio;
the feature conversion module is used for extracting the audio features of the labeled sound sample audio, wherein the audio features comprise an MFCC feature matrix and the zero-crossing rate of a sample audio fragment; then converting the audio features into mathematical data, wherein the mathematical data comprises the MFCC feature matrix and zero-crossing rate of the sound sample audio;
the label binding module is used for binding the data matrix of the mathematical data and the emotion type labels to form a set, converting the label set into a mathematical matrix, and randomly splitting the data of the mathematical matrix into a training set and a test set according to a proportion;
the model training module is used for training the training set data by using tensorflow to obtain an emotion analysis model;
the model testing module is used for performing test verification on a test sample based on the emotion analysis model to obtain an analysis result of the test sample, and comparing whether the analysis result is consistent with the emotion category label of the test sample; if they are not consistent, judging whether the test sample data and the corresponding emotion category label are accurate, and if the test sample data are not accurate, discarding the test sample; if the corresponding emotion category label is not accurate, re-labeling the sample; putting the re-labeled test sample into the training set, adjusting the tensorflow training parameters, training the model, and supplementing new test samples for the next test; and repeating until the accuracy on the test samples meets the set threshold.
6. The system of claim 5, wherein the emotion category labels include happiness, sadness, anger, fear, disgust, surprise, calm, and apprehension; and the emotion category labels are adjusted according to business requirements.
7. The system of claim 5, wherein the model training module is specifically configured to:
setting training parameters, wherein the training parameters comprise training times, a hierarchical structure, an activation function, a cost function and an optimization function;
displaying the intermediate data in a graphical form, and observing the intermediate parameters loss and acc when the training is finished; if the loss and acc intermediate parameters do not meet expectations, modifying the parameters and retraining so that the loss and acc intermediate parameters meet expectations.
8. The system of claim 5, wherein the system further comprises:
the model testing module is used for analyzing a test audio sample by using the emotion analysis model, checking whether the analysis result of the test audio sample is correct by means of automatic program-script processing and manual auxiliary processing, and, for incorrectly analyzed audio data whose label is wrong, performing standardization and labeling processing and putting the data into the training set to update the training set; if the data of the sample itself is wrong, discarding the sample.
9. An apparatus, characterized in that the apparatus comprises: the device comprises a data acquisition device, a processor and a memory;
the data acquisition device is used for acquiring data; the memory is to store one or more program instructions; the processor, configured to execute one or more program instructions to perform the method of any of claims 1-4.
10. A computer-readable storage medium having one or more program instructions embodied therein for performing the method of any of claims 1-4.
CN202011270079.1A 2020-11-13 2020-11-13 Emotion analysis method, system, equipment and readable storage medium Pending CN112466324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011270079.1A CN112466324A (en) 2020-11-13 2020-11-13 Emotion analysis method, system, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011270079.1A CN112466324A (en) 2020-11-13 2020-11-13 Emotion analysis method, system, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112466324A true CN112466324A (en) 2021-03-09

Family

ID=74825788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011270079.1A Pending CN112466324A (en) 2020-11-13 2020-11-13 Emotion analysis method, system, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112466324A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379113A (en) * 2021-06-02 2021-09-10 青岛海尔空调器有限总公司 Method and device for predicting air conditioner operation mode and air conditioner

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147826A (en) * 2018-08-22 2019-01-04 平安科技(深圳)有限公司 Music emotion recognition method, device, computer equipment and computer storage medium
CN109599128A (en) * 2018-12-24 2019-04-09 北京达佳互联信息技术有限公司 Speech-emotion recognition method, device, electronic equipment and readable medium
CN109614885A (en) * 2018-11-21 2019-04-12 齐鲁工业大学 A kind of EEG signals Fast Classification recognition methods based on LSTM
CN109817246A (en) * 2019-02-27 2019-05-28 平安科技(深圳)有限公司 Training method, emotion identification method, device, equipment and the storage medium of emotion recognition model
CN110136696A (en) * 2019-05-22 2019-08-16 上海声构信息科技有限公司 The monitor processing method and system of audio data
CN110163376A (en) * 2018-06-04 2019-08-23 腾讯科技(深圳)有限公司 Sample testing method, the recognition methods of media object, device, terminal and medium
CN110164476A (en) * 2019-05-24 2019-08-23 广西师范大学 A kind of speech-emotion recognition method of the BLSTM based on multi output Fusion Features
CN110415728A (en) * 2019-07-29 2019-11-05 内蒙古工业大学 A kind of method and apparatus identifying emotional speech
CN110956981A (en) * 2019-12-06 2020-04-03 湖北文理学院 Speech emotion recognition method, device, equipment and storage medium
CN111276133A (en) * 2020-01-20 2020-06-12 厦门快商通科技股份有限公司 Audio recognition method, system, mobile terminal and storage medium
CN111429946A (en) * 2020-03-03 2020-07-17 深圳壹账通智能科技有限公司 Voice emotion recognition method, device, medium and electronic equipment
WO2020211820A1 (en) * 2019-04-16 2020-10-22 华为技术有限公司 Method and device for speech emotion recognition

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163376A (en) * 2018-06-04 2019-08-23 腾讯科技(深圳)有限公司 Sample testing method, the recognition methods of media object, device, terminal and medium
CN109147826A (en) * 2018-08-22 2019-01-04 平安科技(深圳)有限公司 Music emotion recognition method, device, computer equipment and computer storage medium
CN109614885A (en) * 2018-11-21 2019-04-12 齐鲁工业大学 A kind of EEG signals Fast Classification recognition methods based on LSTM
CN109599128A (en) * 2018-12-24 2019-04-09 北京达佳互联信息技术有限公司 Speech-emotion recognition method, device, electronic equipment and readable medium
CN109817246A (en) * 2019-02-27 2019-05-28 平安科技(深圳)有限公司 Training method, emotion identification method, device, equipment and the storage medium of emotion recognition model
WO2020211820A1 (en) * 2019-04-16 2020-10-22 华为技术有限公司 Method and device for speech emotion recognition
CN110136696A (en) * 2019-05-22 2019-08-16 上海声构信息科技有限公司 The monitor processing method and system of audio data
CN110164476A (en) * 2019-05-24 2019-08-23 广西师范大学 A kind of speech-emotion recognition method of the BLSTM based on multi output Fusion Features
CN110415728A (en) * 2019-07-29 2019-11-05 内蒙古工业大学 A kind of method and apparatus identifying emotional speech
CN110956981A (en) * 2019-12-06 2020-04-03 湖北文理学院 Speech emotion recognition method, device, equipment and storage medium
CN111276133A (en) * 2020-01-20 2020-06-12 厦门快商通科技股份有限公司 Audio recognition method, system, mobile terminal and storage medium
CN111429946A (en) * 2020-03-03 2020-07-17 深圳壹账通智能科技有限公司 Voice emotion recognition method, device, medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAI Xiangyu et al.: "Standardized fully-connected residual network model for air target combat intention analysis", Foreign Electronic Measurement Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379113A (en) * 2021-06-02 2021-09-10 青岛海尔空调器有限总公司 Method and device for predicting air conditioner operation mode and air conditioner

Similar Documents

Publication Publication Date Title
CN110147726B (en) Service quality inspection method and device, storage medium and electronic device
CN109766540B (en) General text information extraction method and device, computer equipment and storage medium
CN111613212B (en) Speech recognition method, system, electronic device and storage medium
CN104503998B (en) For the kind identification method and device of user query sentence
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN111292751B (en) Semantic analysis method and device, voice interaction method and device, and electronic equipment
CN109410986B (en) Emotion recognition method and device and storage medium
CN112966082A (en) Audio quality inspection method, device, equipment and storage medium
CN112151014A (en) Method, device and equipment for evaluating voice recognition result and storage medium
CN110750978A (en) Emotional tendency analysis method and device, electronic equipment and storage medium
CN111445928A (en) Voice quality inspection method, device, equipment and storage medium
CN110020190B (en) Multi-instance learning-based suspicious threat index verification method and system
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN111427996B (en) Method and device for extracting date and time from man-machine interaction text
CN112466324A (en) Emotion analysis method, system, equipment and readable storage medium
CN111161703B (en) Speech synthesis method and device with language, computing equipment and storage medium
CN112885330A (en) Language identification method and system based on low-resource audio
CN116127011A (en) Intention recognition method, device, electronic equipment and storage medium
CN113850290B (en) Text processing and model training method, device, equipment and storage medium
CN115527551A (en) Voice annotation quality evaluation method and device, electronic equipment and storage medium
CN112101003B (en) Sentence text segmentation method, device and equipment and computer readable storage medium
CN114117047A (en) Method and system for classifying illegal voice based on C4.5 algorithm
CN111369975A (en) University music scoring method, device, equipment and storage medium based on artificial intelligence
CN109325126B (en) Method and device for objectification processing of language text and computer storage medium
CN114519357B (en) Natural language processing method and system based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination