WO2021027029A1 - Data processing method, apparatus, computer device and storage medium - Google Patents

Data processing method, apparatus, computer device and storage medium

Info

Publication number
WO2021027029A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
interviewer
emotion
voice
speech
Prior art date
Application number
PCT/CN2019/107727
Other languages
English (en)
French (fr)
Inventor
黄海杰
Original Assignee
深圳壹账通智能科技有限公司
壹帐通金融科技有限公司(新加坡)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司, 壹帐通金融科技有限公司(新加坡)
Priority to SG11202004543PA priority Critical patent/SG11202004543PA/en
Publication of WO2021027029A1 publication Critical patent/WO2021027029A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/45 Clustering; Classification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • This application relates to a data processing method, device, computer equipment and storage medium.
  • Traditional intelligent interview systems mostly recognize facial micro-expressions to find abnormal expressions of the interviewee and use them as one of the bases for risk assessment.
  • Micro-expression is a psychological term. People express their inner feelings to each other by making facial expressions; between the different expressions a person makes, or within a certain expression, the face will "leak" other information. The shortest micro-expression can last for 1/25 of a second; although a subconscious expression may last only a moment, it sometimes expresses the opposite emotion.
  • a data processing method, device, computer equipment, and storage medium are provided.
  • A data processing method includes: acquiring the interviewer's audio data and video data; extracting the interviewer's micro-speech features from the audio data and obtaining first speech emotion data from them; converting the audio data into text data, splitting the text data into multiple sentences, performing word segmentation on the sentences, looking up each word of each sentence in a preset dictionary corresponding to a trained emotion classification network, determining from the lookup results the confidence that the text data belongs to each preset emotion category, and obtaining second speech emotion data, the emotion classification network being trained on first sample text data; inputting the text data into a trained grammar analysis network to obtain the grammar score of each sentence, and averaging the per-sentence scores to obtain the grammar score of the text data, the grammar analysis network being trained on second sample text data; randomly capturing video frames from the video data, extracting the interviewer's micro-expression features from the frames, and obtaining a video data confidence from them; and determining the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence.
  • a data processing device includes:
  • The acquisition module is used to acquire the interviewer's audio data and video data;
  • the first extraction module is used to extract the interviewer's micro-speech features from the audio data and obtain first speech emotion data from them;
  • the first processing module is used to convert the interviewer's audio data into text data, split the text data into multiple sentences, perform word segmentation on the sentences, look up each word of each sentence in the preset dictionary corresponding to the trained emotion classification network, determine from the lookup results the confidence that the text data belongs to each preset emotion category, and obtain second speech emotion data, the emotion classification network being trained on first sample text data;
  • the second processing module is used to input the text data into the trained grammar analysis network to obtain the grammar score of each sentence in the text data, and to average the per-sentence scores to obtain the grammar score of the text data, the grammar analysis network being trained on second sample text data;
  • the second extraction module is used to randomly capture video frames from the interviewer's video data, extract the interviewer's micro-expression features from the frames, and obtain the video data confidence from them; and
  • the analysis module is used to determine the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence.
  • A computer device includes a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the data processing method described above.
  • One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data processing method described above.
  • Fig. 1 is an application scenario diagram of a data processing method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a data processing method according to one or more embodiments.
  • FIG. 3 is a schematic sub-flow diagram of step S204 in FIG. 2 according to one or more embodiments.
  • FIG. 4 is a schematic sub-flow diagram of step S204 in FIG. 2 according to one or more embodiments.
  • FIG. 5 is a schematic sub-flow diagram of step S204 in FIG. 2 according to one or more embodiments.
  • FIG. 6 is a schematic sub-flow diagram of step S206 in FIG. 2 according to one or more embodiments.
  • FIG. 7 is a schematic sub-flow diagram of step S212 in FIG. 2 according to one or more embodiments.
  • Fig. 8 is a block diagram of a data processing device according to one or more embodiments.
  • Figure 9 is a block diagram of a computer device according to one or more embodiments.
  • the data processing method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 and the server 104 communicate through the network.
  • The server 104 obtains the interviewer's audio data and video data, extracts the interviewer's micro-speech features from the audio data and obtains first speech emotion data from them; converts the audio data into text data, splits the text data into multiple sentences and performs word segmentation on them, looks up each word of each sentence in the preset dictionary corresponding to the trained emotion classification network, determines from the lookup results the confidence that the text data belongs to each preset emotion category and obtains second speech emotion data, the emotion classification network being trained on first sample text data; inputs the text data into the trained grammar analysis network to obtain the grammar score of each sentence, averages the per-sentence scores to obtain the grammar score of the text data, the grammar analysis network being trained on second sample text data; randomly captures video frames from the interviewer's video data, extracts the interviewer's micro-expression features from the frames and obtains the video data confidence; and finally determines the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence, and pushes it to the terminal 102.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a data processing method is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step S202 Obtain interviewer audio data and interviewer video data.
  • The interviewer's video data refers to the video recorded of the interviewer during the interview, and the interviewer's audio data refers to the interviewer's audio during the interview; the audio data can be extracted from the video data.
  • Step S204 Extract the interviewer's micro-speech feature according to the interviewer's audio data, and obtain the first voice emotion data according to the micro-speech feature.
  • the server can extract the interviewer's micro-speech features from the interviewer's audio data by calling the speech feature extraction tool.
  • the micro-speech features include speech rate features, pitch features, and Mel frequency cepstral coefficients.
  • Speaking rate refers to the number of words per second in the voice data.
  • the words can be Chinese or English.
  • Pitch refers to how high or low the voice frequency is.
  • The Mel-frequency cepstrum is a linear transform of the log energy spectrum taken on the non-linear Mel scale of frequency; the Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up the Mel-frequency cepstrum.
  • The server inputs the micro-speech features into the speech emotion classification model that matches the interviewer's gender information within the trained model set, and obtains the first speech emotion data corresponding to the micro-speech features; the first speech emotion data is the confidence that the micro-speech features belong to each preset emotion category.
  • the set of trained speech emotion classification models includes speech emotion classification models trained on sample data of interviewers of different genders, that is, an emotion classification model for analyzing male speech data and an emotion classification model for analyzing female speech data.
  • the server will obtain the interviewer's gender information, match the trained voice emotion classification model set according to the interviewer's gender information, and obtain a voice emotion classification model matching the interviewer's gender information from the trained voice emotion classification model set.
  • the voice emotion classification model is trained from sample voice data carrying annotation information.
  • the annotation information includes emotion category information and gender information.
  • the server divides the sample voice data according to gender information, and performs model training according to the divided sample voice data to obtain a set of voice emotion classification models.
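  • As a rough illustration of how such a gender-keyed model set might be organized, the sketch below keeps one trained classifier per gender in a dictionary and routes the extracted features to the matching one. The class name EmotionModelSet, the gender labels and the predict_proba interface are illustrative assumptions, not specified by the patent.

```python
from typing import Dict, List

class EmotionModelSet:
    """Holds one trained speech-emotion classifier per gender label."""

    def __init__(self, models: Dict[str, object]):
        # models maps a gender label (e.g. "male"/"female") to a trained
        # classifier exposing predict_proba(features_batch) -> probabilities
        self.models = models

    def predict_emotion(self, gender: str, features: List[float]) -> List[float]:
        """Return the confidence for each preset emotion category,
        using the model that matches the interviewer's gender."""
        model = self.models[gender]
        return list(model.predict_proba([features])[0])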
  • Step S206: Convert the interviewer's audio data into text data, split the text data into multiple sentences, perform word segmentation on the sentences, look up each word of each sentence in the preset dictionary corresponding to the trained emotion classification network, determine from the lookup results the confidence that the text data belongs to each preset emotion category, and obtain the second speech emotion data; the emotion classification network is trained on the first sample text data.
  • The emotion classification network can be a BERT-based network with a classification layer of N neurons stacked on top (assuming N preset emotion categories).
  • The server splits the text data into multiple sentences, segments each sentence into words, looks each word up in the dictionary that matches BERT, converts each word into its serial number in the BERT dictionary, and feeds the serial numbers of the whole sentence into BERT to obtain the confidence that the sentence belongs to each preset emotion category; the confidence that the text data belongs to each preset emotion category is then determined from the per-sentence confidences, yielding the second speech emotion data.
  • The emotion classification network can be trained on the first sample text data, in which each sample sentence carries label information, namely the emotion category of that sentence.
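  • A minimal sketch of such a BERT-plus-classification-layer sentence classifier is shown below, assuming the Hugging Face transformers library, PyTorch, and a Chinese BERT checkpoint; the class name, the number of emotion categories and the example sentence are illustrative, not taken from the patent.

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

class EmotionClassifier(nn.Module):
    """BERT encoder with an N-way emotion classification layer on top."""

    def __init__(self, num_emotions: int, pretrained: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_emotions)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vector = outputs.last_hidden_state[:, 0]  # [CLS] representation
        return torch.softmax(self.classifier(cls_vector), dim=-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = EmotionClassifier(num_emotions=6)

# Each token is mapped to its serial number in the BERT dictionary, and the
# whole sentence of serial numbers is fed into the network.
encoded = tokenizer("今天的面试让我很紧张", return_tensors="pt")
confidences = model(encoded["input_ids"], encoded["attention_mask"])
```

  • The per-sentence confidence vectors produced this way can then be averaged to give the confidence of the whole text, as described in the following step.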
  • Because text data requires far less cache space than audio or video data, converting the interviewer's audio data into text and processing the text saves cache space on the server during processing, which optimizes the server's cache usage.
  • Step S208 Input the text data into the trained grammatical analysis network to obtain the grammar score of each sentence in the text data, calculate the average of the grammatical scores of each sentence, and obtain the grammar score of the text data.
  • The grammar analysis network is trained on the second sample text data.
  • When training the grammar analysis network, the CoLA (Corpus of Linguistic Acceptability) dataset can be used as the second sample text data; it contains many annotated single sentences labelled as grammatically correct or not (0 for incorrect, 1 for correct).
  • After training, the grammar analysis network can be used to judge the grammatical accuracy of a sentence; the grammar score ranges from 0 to 1, where 0 represents a grammatical error, 1 represents grammatical correctness, and a confidence between 0 and 1 can be read as the degree of grammatical accuracy.
  • the server calculates the average value of the grammar score of each sentence to obtain the grammar score of the text data.
  • the grammatical analysis network will automatically learn from the text data, without splitting and matching the grammatical structure of each sentence in the text data.
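  • The averaging in step S208 can be sketched as below; the regular-expression sentence splitter and the score_sentence callable stand in for the trained grammar analysis network and are assumptions, not part of the patent.

```python
import re
from statistics import mean
from typing import Callable, List

def grammar_score(text: str, score_sentence: Callable[[str], float]) -> float:
    """Split the text into sentences, score each one with the trained grammar
    analysis network (passed in as score_sentence), and return the average
    as the grammar score of the whole text."""
    sentences: List[str] = [s for s in re.split(r"[。！？.!?]", text) if s.strip()]
    if not sentences:
        return 0.0
    return mean(score_sentence(s) for s in sentences)
```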
  • Step S210: Randomly capture video frames from the interviewer's video data, extract the interviewer's micro-expression features from the frames, and obtain the video data confidence from the micro-expression features.
  • The server randomly captures video frames from the interviewer's video data at a preset time interval, obtains the interviewer's micro-expression features from the frames, and inputs the micro-expression features into the trained micro-expression model to obtain the confidence that the micro-expression features belong to each preset emotion category; it then sorts these confidences and takes the maximum as the video data confidence.
  • The micro-expression model is trained on sample micro-expression data.
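  • A sketch of the frame sampling and max-confidence step follows, assuming OpenCV for video decoding; extract_features and predict stand for the patent's micro-expression feature extractor and trained micro-expression model and are placeholders.

```python
import random
from typing import Callable, List, Sequence

import cv2  # OpenCV, assumed available for video decoding
import numpy as np

def video_confidence(video_path: str,
                     extract_features: Callable[[np.ndarray], Sequence[float]],
                     predict: Callable[[Sequence[float]], Sequence[float]],
                     num_frames: int = 10) -> float:
    """Randomly sample frames, run the micro-expression model on each,
    and return the maximum per-category confidence as the video confidence."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    confidences: List[float] = []
    for idx in sorted(random.sample(range(total), min(num_frames, total))):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        features = extract_features(frame)          # micro-expression features
        confidences.append(max(predict(features)))  # best emotion-category score
    cap.release()
    return max(confidences) if confidences else 0.0
```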
  • Step S212 Determine the interview result of the interviewer according to the first voice emotion data, the second voice emotion data, the grammar score, and the confidence of the video data.
  • the server can obtain the audio data confidence by inputting the first voice emotion data, the second voice emotion data, and the grammar score into the trained audio classification model, and then according to the audio data confidence, video data confidence, and confidence parameters, Determine the interview result of the interviewer.
  • the parameters of the audio classification model include the confidence that the audio data in the first voice emotion data belongs to each preset emotion category, the confidence that the text data in the second voice emotion data belongs to each preset emotion category and the grammatical score.
  • sample voice data and sample text data that carry annotation information can be used as a training set.
  • the annotation information is used to mark whether the interviewer corresponding to the sample voice data and sample text data is lying.
  • the confidence parameter can be set according to needs, and the confidence parameter is an adjustable parameter.
  • The above data processing method extracts micro-speech features from the interviewer's audio data and obtains first speech emotion data from them; converts the audio data into text data and analyses the text to obtain second speech emotion data and a grammar score; extracts micro-expression features from the interviewer's video data and obtains the video data confidence from them; and determines the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence. By recognizing multiple features of the interviewer in multiple ways and combining the recognition results, the interviewer's psychological state can be captured more accurately and comprehensively, recognition accuracy is improved, and the interview result is closer to the real situation.
  • step S204 includes:
  • Step S302 Invoking a voice feature extraction tool, and extracting the interviewer's micro-speech features based on the interviewer's audio data.
  • the micro-speech features include speech rate features, Mel frequency cepstral coefficients, and pitch features;
  • Step S304 Input the micro-speech feature into the matched voice emotion classification model to obtain first voice emotion data corresponding to the micro-speech feature.
  • When the speech feature extraction tool is called to extract the Mel-frequency cepstral coefficients, a fast Fourier transform is performed on the interviewer's audio data to obtain the spectrum, the spectrum is mapped onto the Mel scale, the logarithm is taken, and a discrete cosine transform is applied to obtain the Mel-frequency cepstral coefficients.
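  • A minimal sketch of this FFT-Mel-log-DCT pipeline using the librosa library, which implements exactly this chain internally, is shown below; the number of coefficients and the time-averaging are assumptions.

```python
import librosa
import numpy as np

def extract_mfcc(audio_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load the interviewer's audio and compute Mel-frequency cepstral
    coefficients (FFT -> Mel filter bank -> log -> DCT), averaged over time."""
    y, sr = librosa.load(audio_path, sr=None)       # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                        # one vector per recording
```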
  • the pitch features include the current segment pitch average, current segment pitch standard deviation, historical pitch average, and historical pitch standard deviation.
  • the method of extracting the average value of the current segment pitch is: Fast Fourier Transformation is performed on the interviewer's audio data to obtain a spectrogram of the audio data, and then the variance of each frequency band and the center of the spectrum is calculated, and the variance is summed and the square root is taken.
  • Historical pitch average and standard deviation refer to the average and standard deviation of the interviewer from the beginning of the interview to the current segment. These data will be stored in the server after the interview. For the convenience of calculation, the exponential moving average can be used for approximate calculation.
  • The update formulas are:
  • Historical pitch average = α * historical pitch average + (1 - α) * current segment pitch average
  • Historical pitch standard deviation = α * historical pitch standard deviation + (1 - α) * current segment pitch standard deviation
  • α is a weight parameter between 0 and 1 that can be set as needed; the default here is 0.9.
  • Speaking rate features include current speaking rate, historical speaking rate average, and historical speaking rate standard deviation.
  • the historical speaking rate average and standard deviation are calculated and memorized by the server after the interview begins.
  • an exponential moving average can be used for approximate calculation.
  • The update formulas are:
  • Historical speaking rate average = α * historical speaking rate average + (1 - α) * current speaking rate
  • Historical speaking rate mean square deviation = α * historical speaking rate mean square deviation + (1 - α) * (current speaking rate - historical speaking rate average)^2
  • Historical speaking rate standard deviation = square root of the historical speaking rate mean square deviation
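  • The exponential moving averages above can be kept with a few lines of state, sketched below; the class name RunningStats is illustrative, while the default α of 0.9 mirrors the description.

```python
class RunningStats:
    """Exponential-moving-average state for pitch or speaking rate."""

    def __init__(self, alpha: float = 0.9):
        self.alpha = alpha
        self.mean = 0.0   # historical average
        self.msd = 0.0    # historical mean square deviation

    def update(self, current: float) -> None:
        a = self.alpha
        # deviation is measured against the historical average so far
        self.msd = a * self.msd + (1 - a) * (current - self.mean) ** 2
        self.mean = a * self.mean + (1 - a) * current

    @property
    def std(self) -> float:
        return self.msd ** 0.5

# Usage: one tracker per feature, updated after each audio segment.
rate_stats = RunningStats()
for rate in [2.4, 3.1, 2.8]:   # words per second in successive segments
    rate_stats.update(rate)
```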
  • In the above embodiment, the speech feature extraction tool is called to extract the interviewer's micro-speech features from the interviewer's audio data, which realizes the extraction of the interviewer's micro-speech features.
  • step S204 includes:
  • Step S402 Obtain gender information of the interviewer, and obtain a voice emotion classification model matching the interviewer's gender information from the trained voice emotion classification model set.
  • the voice emotion classification model is obtained by training the sample voice data carrying the annotation information, and the annotation information includes Emotion category information and gender information;
  • Step S404 acquiring the pitch feature, the Mel frequency cepstrum coefficient and the speech rate feature in the micro-speech feature
  • Step S406: Input the pitch features, the Mel-frequency cepstral coefficients and the speaking rate features into the matched speech emotion classification model to obtain the confidence that the micro-speech features belong to each preset emotion category, yielding the first speech emotion data for the micro-speech features.
  • the set of trained speech emotion classification models includes speech emotion classification models trained on sample data of interviewers of different genders, that is, an emotion classification model for analyzing male speech data and an emotion classification model for analyzing female speech data.
  • the server will obtain the interviewer's gender information, match the trained voice emotion classification model set according to the interviewer's gender information, and obtain a voice emotion classification model matching the interviewer's gender information from the trained voice emotion classification model set.
  • the voice emotion classification model is trained from sample voice data carrying annotation information.
  • the annotation information includes emotion category information and gender information.
  • the server divides the sample voice data according to gender information, and performs model training according to the divided sample voice data to obtain a set of voice emotion classification models.
  • Pitch features include the current segment pitch average, the current segment pitch standard deviation, the historical pitch average and the historical pitch standard deviation; speaking rate features include the current speaking rate, the historical speaking rate average and the historical speaking rate standard deviation. The server inputs all of the features contained in the three feature groups as parameters into the matched speech emotion classification model, and the convolutional neural network in the model combines them to give the confidence that the micro-speech features belong to each preset emotion category.
  • In the above embodiment, the matched speech emotion classification model is selected according to the interviewer's gender information, the pitch features, Mel-frequency cepstral coefficients and speaking rate features are input into it, and the confidence that the micro-speech features belong to each preset emotion category is obtained, yielding the first speech emotion data; this realizes the acquisition of the first speech emotion data.
  • the method further includes:
  • Step S502 Obtain sample voice data carrying label information
  • Step S504 dividing the sample voice data into a training set and a verification set
  • Step S506 Perform model training according to the training set and the initial speech emotion classification model to obtain a speech emotion classification model set;
  • Step S508 Perform model verification according to the verification set, and adjust each voice emotion classification model in the voice emotion classification model set.
  • After obtaining the sample speech data carrying annotation information, the server first divides it into a first sample speech data set and a second sample speech data set according to the gender information in the annotations, and then splits each set into a training set and a validation set. Model training is performed on the training sets of the first and second sample speech data sets to obtain a first and a second speech emotion classification model, and model validation is performed on the corresponding validation sets to adjust the two models. The first and second sample speech data sets each contain sample speech data of interviewers of a single gender only.
  • In the above embodiment, sample speech data carrying annotation information is obtained and divided into a training set and a validation set; model training is performed on the training set and model validation on the validation set to obtain each speech emotion classification model in the set, which realizes the acquisition of the speech emotion classification model set.
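  • A sketch of the gender-wise split into training and validation sets is given below, assuming scikit-learn; the field names in the sample records and the 80/20 split ratio are illustrative.

```python
from sklearn.model_selection import train_test_split

def split_by_gender(samples):
    """samples: list of dicts with 'features', 'emotion' and 'gender' keys.
    Returns {gender: (train, validation)} so one model can be trained per gender."""
    splits = {}
    for gender in ("male", "female"):
        subset = [s for s in samples if s["gender"] == gender]
        if not subset:
            continue
        train, val = train_test_split(subset, test_size=0.2, random_state=0)
        splits[gender] = (train, val)
    return splits
```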
  • step S206 includes:
  • Step S602 searching and matching a preset dictionary corresponding to the trained emotion classification network according to each word in each sentence, and determining the corresponding serial number of each word in each sentence in the dictionary;
  • Step S604 input the sequence number of each word in each sentence in the dictionary into the emotion classification network to obtain the confidence that each sentence in the text data belongs to each preset emotion category;
  • Step S606 Obtain the average value of the confidence that each sentence in the text data belongs to each preset emotion category, and obtain the confidence that the text data belongs to each preset emotion category according to the average value of the confidence.
  • The emotion classification network can be a BERT-based network with a classification layer of N neurons stacked on top (assuming N preset emotion categories).
  • The server splits the text data into multiple sentences, segments each sentence into words, looks each word up in the dictionary that matches BERT, converts each word into its serial number in the BERT dictionary, and feeds the serial numbers of the whole sentence into BERT to obtain the confidence that the sentence belongs to each preset emotion category; the confidence that the text data belongs to each preset emotion category is then determined from the per-sentence confidences, yielding the second speech emotion data.
  • The emotion classification network can be trained on the first sample text data, in which each sample sentence carries label information, namely the emotion category of that sentence.
  • In the above embodiment, the serial number of each word of each sentence in the dictionary is input into the emotion classification network to obtain the confidence that each sentence of the text data belongs to each preset emotion category, and the confidence that the text data belongs to each preset emotion category is then obtained from the per-sentence confidences; this realizes the acquisition of the confidence that the text data belongs to each preset emotion category.
  • step S212 includes:
  • Step S702 Obtain the audio data confidence level according to the first voice emotion data, the second voice emotion data, and the grammar score;
  • Step S704 Determine the interview result of the interviewer according to the audio data confidence level, the video data confidence level and the preset confidence level parameters.
  • the server can obtain the audio data confidence by inputting the first voice emotion data, the second voice emotion data, and the grammar score into the trained audio classification model, and then according to the audio data confidence, video data confidence, and confidence parameters, Determine the interview result of the interviewer.
  • the parameters of the audio classification model include the confidence that the audio data in the first voice emotion data belongs to each preset emotion category, the confidence that the text data in the second voice emotion data belongs to each preset emotion category and the grammatical score.
  • sample voice data and sample text data that carry annotation information can be used as a training set.
  • the annotation information is used to mark whether the interviewer corresponding to the sample voice data and sample text data is lying.
  • the confidence parameter can be set according to needs, and the confidence parameter is an adjustable parameter.
  • The interview result can be obtained from an interview score, which can be computed as: interview score = A * audio data confidence + B * video data confidence, where A and B are the confidence parameters.
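  • A sketch of this weighted combination follows; the default weights and the threshold used to turn the score into a pass/fail result are assumptions, since the patent leaves A, B and the mapping from score to result adjustable.

```python
def interview_score(audio_confidence: float, video_confidence: float,
                    a: float = 0.5, b: float = 0.5) -> float:
    """Interview score = A * audio data confidence + B * video data confidence,
    where A and B are adjustable confidence parameters."""
    return a * audio_confidence + b * video_confidence

def interview_result(score: float, threshold: float = 0.6) -> str:
    # The pass threshold is illustrative; the patent does not fix one.
    return "pass" if score >= threshold else "further review"
```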
  • In one embodiment, before step S206, the method further includes:
  • each sample sentence in the first sample text data carries emotion category information
  • the first sample text data is used as the training set for model training to obtain the emotion classification network.
  • A data processing device is provided, including an acquisition module 802, a first extraction module 804, a first processing module 806, a second processing module 808, a second extraction module 810 and an analysis module 812, where:
  • the acquisition module 802 is used to obtain the interviewer's audio data and video data;
  • the first extraction module 804 is configured to extract the interviewer's micro-speech features from the interviewer's audio data and obtain first speech emotion data from them;
  • the first processing module 806 is used to convert the interviewer's audio data into text data, split the text data into multiple sentences, perform word segmentation on the sentences, look up each word of each sentence in the preset dictionary corresponding to the trained emotion classification network, determine from the lookup results the confidence that the text data belongs to each preset emotion category, and obtain second speech emotion data, the emotion classification network being trained on the first sample text data;
  • the second processing module 808 is used to input the text data into the trained grammar analysis network to obtain the grammar score of each sentence in the text data, and to average the per-sentence scores to obtain the grammar score of the text data, the grammar analysis network being trained on the second sample text data;
  • the second extraction module 810 is configured to randomly intercept video frames from the interviewer's video data, extract the interviewer's micro-expression features according to the video frames, and obtain the video data confidence level according to the micro-expression features;
  • the analysis module 812 is configured to determine the interview result of the interviewer according to the first voice emotion data, the second voice emotion data, the grammar score, and the confidence of the video data.
  • The above data processing device extracts micro-speech features from the interviewer's audio data and obtains first speech emotion data from them; converts the audio data into text data and analyses the text to obtain second speech emotion data and a grammar score; extracts micro-expression features from the interviewer's video data and obtains the video data confidence from them; and determines the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence.
  • In one embodiment, the first extraction module is also used to call the speech feature extraction tool to extract the interviewer's micro-speech features from the interviewer's audio data; the micro-speech features include speaking rate features, Mel-frequency cepstral coefficients and pitch features.
  • In one embodiment, the first extraction module is also used to obtain the interviewer's gender information and to obtain, from the set of trained speech emotion classification models, a speech emotion classification model matching that gender information; the speech emotion classification model is trained on sample speech data carrying annotation information, and the annotations include emotion category information and gender information. The module obtains the pitch features, Mel-frequency cepstral coefficients and speaking rate features from the micro-speech features, inputs them into the matched speech emotion classification model, obtains the confidence that the micro-speech features belong to each preset emotion category, and thereby obtains the first speech emotion data.
  • the first extraction module is also used to obtain sample voice data carrying annotation information, divide the sample voice data into a training set and a validation set, and perform model training according to the training set and the initial voice emotion classification model to obtain the voice Emotion classification model set, perform model verification based on the validation set, and adjust each voice emotion classification model in the voice emotion classification model set.
  • In one embodiment, the first processing module is further configured to look up each word of each sentence in the preset dictionary corresponding to the trained emotion classification network, determine the serial number of each word in the dictionary, input those serial numbers into the emotion classification network to obtain the confidence that each sentence of the text data belongs to each preset emotion category, take the average of the per-sentence confidences, and obtain from that average the confidence that the text data belongs to each preset emotion category.
  • the analysis module is further configured to obtain the audio data confidence level according to the first voice emotion data, the second voice emotion data, and the grammar score, according to the audio data confidence level, the video data confidence level, and the preset confidence level Parameters to determine the interview result of the interviewer.
  • the first processing module is also used to obtain first sample text data, each sample sentence in the first sample text data carries emotion category information, and the first sample text data is used as a training set for the model Train to get the emotion classification network.
  • Each module in the above-mentioned data processing device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • the computer equipment includes a processor, a memory, and a network interface connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer readable instructions.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to realize a data processing method.
  • Those skilled in the art can understand that FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • A computer device includes a memory and one or more processors; the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the data processing method described above.
  • One or more non-volatile computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the data processing method described above.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method includes: obtaining first speech emotion data from micro-speech features; converting the interviewer's audio data into text data and splitting the text data into multiple sentences; looking up each word of each sentence in a preset dictionary corresponding to a trained emotion classification network; determining, from the lookup results, the confidence that the text data belongs to each preset emotion category to obtain second speech emotion data; inputting the text data into a trained grammar analysis network to obtain a grammar score for the text data; obtaining a video data confidence from micro-expression features; and determining the interviewer's interview result from the first speech emotion data, the second speech emotion data, the grammar score and the video data confidence.

Description

Data processing method, apparatus, computer device and storage medium
Cross-reference to related applications
This application claims priority to the Chinese patent application with application number 2019107454436, entitled "Data processing method, apparatus, computer device and storage medium", filed with the Chinese Patent Office on August 13, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to a data processing method, apparatus, computer device and storage medium.
Background
With the development of artificial intelligence, intelligent interview systems have appeared. Traditional intelligent interview systems mostly recognize facial micro-expressions to find abnormal expressions of the interviewee and use them as one of the bases for risk assessment. Micro-expression is a psychological term: people express their inner feelings to each other by making facial expressions, and between the different expressions a person makes, or within a certain expression, the face "leaks" other information. The shortest micro-expression can last 1/25 of a second; although a subconscious expression may last only a moment, it sometimes expresses the opposite emotion.
However, the inventor realized that relying only on the recognition of micro-expression features is not enough to accurately and comprehensively capture the interviewee's psychological state, which easily leads to a large difference between the interview result and the real situation, that is, to the problem of low recognition accuracy.
发明内容
根据本申请公开的各种实施例,提供一种数据处理方法、装置、计算机设备和存储介质。
一种数据处理方法包括:
获取面试者音频数据以及面试者视频数据;
根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微 表情特征,得到视频数据置信度;及
根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
一种数据处理装置包括:
获取模块,用于获取面试者音频数据以及面试者视频数据;
第一提取模块,用于根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
第一处理模块,用于将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
第二处理模块,用于将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
第二提取模块,用于从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;及
分析模块,用于根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
一种计算机设备,包括存储器和一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述一个或多个处理器执行以下步骤:
获取面试者音频数据以及面试者视频数据;
根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;及
根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
获取面试者音频数据以及面试者视频数据;
根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;及
根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。
图1为根据一个或多个实施例中数据处理方法的应用场景图。
图2为根据一个或多个实施例中数据处理方法的流程示意图。
图3为根据一个或多个中图2中步骤204的子流程示意图。
图4为根据一个或多个中图2中步骤204的子流程示意图。
图5为根据一个或多个中图2中步骤204的子流程示意图。
图6为根据一个或多个中图2中步骤206的子流程示意图。
图7为根据一个或多个中图2中步骤212的子流程示意图。
图8为根据一个或多个实施例中数据处理装置的框图。
图9为根据一个或多个实施例中计算机设备的框图。
具体实施方式
为了使本申请的技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进 行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
本申请提供的数据处理方法,可以应用于如图1所示的应用环境中。终端102与服务器104通过网络进行通信。服务器104获取面试者音频数据以及面试者视频数据,根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据,将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到,将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到,从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度,根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果,并推送至终端102。终端102可以但不限于是各种个人计算机、笔记本电脑、智能手机、平板电脑和便携式可穿戴设备,服务器104可以用独立的服务器或者是多个服务器组成的服务器集群来实现。
在其中一个实施例中,如图2所示,提供了一种数据处理方法,以该方法应用于图1中的服务器为例进行说明,包括以下步骤:
步骤S202,获取面试者音频数据以及面试者视频数据。
面试者视频数据指的是面试者在接受面试时被录制的视频数据,面试者音频数据指的是面试者在接受面试时的音频数据,面试者音频数据可以从面试者视频数据中提取得到。
步骤S204,根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据。
服务器通过调用语音特征提取工具可从面试者音频数据中提取面试者的微语音特征,微语音特征包括语速特征、音高特征以及梅尔频率倒谱系数。语速指的是语音数据中每秒钟的单词数,单词可以为中文也可以为英文,音高指的是语音频率的高低,梅尔频率倒谱是基于声音频率的非线性梅尔刻度的对数能量频谱的线性变换,梅尔频率倒谱系数就是组成梅尔频率倒谱的系数。服务器将微语音特征输入已训练的语音情绪分类模型集合中与面试者性别信息匹配的语音情绪分类模型,可得到与微语音特征对应的第一语音情绪数据,第一语音情绪数据指的是微语音特征归属于各预设的情绪类别的置信度。
已训练的语音情绪分类模型集合中包括针对不同性别面试者样本数据训练得到的语音情绪分类模型,即分析男性语音数据的情绪分类模型和分析女性语音数据的情绪分类模型。服务器会获取面试者性别信息,根据面试者性别信息匹配已训练的语音情绪分类模型集合,从已训练的语音情绪分类模型集合中获取与面试者性别信息匹配的语音情绪分类模型。语音情绪分类模型由携带标注信息的样本语音数据训练得到,标注信息包括情绪类别 信息以及性别信息。服务器会根据性别信息对样本语音数据进行划分,根据划分后的样本语音数据分别进行模型训练,得到语音情绪分类模型集合。
步骤S206,将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到。
情绪分类网络可以为以BERT为基础,叠加一层含N个神经元(假定预设N种情绪)的分类层的网络。服务器将文字数据拆分为多个句子,对每个句子进行分词,根据各句子中各词语查找匹配BERT的字典,把每个词转换为该词在BERT的字典中对应的序列号,把整个句子的序列号输入进BERT,得到各句子归属于各预设的情绪类别的置信度,进而根据各句子归属于各预设的情绪类别的置信度,确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据。情绪分类网络可以由第一样本文字数据训练得到,第一样本文字数据中各样本句都携带有标注信息,标注信息为各样本句的情绪类别信息。
由于文字数据所需的缓存空间比音频数据以及视频数据小,在进行数据处理时,采用将面试者音频数据转换为文字数据,对文字数据进行处理的方式,能够在处理的过程中节省服务器的缓存空间,实现了对服务器的缓存空间的优化。
步骤S208,将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到。
在训练语法分析网络时,可采用CoLA(Corpus of Linguistic Acceptability)作为第二样本文字数据,该数据集包括多个携带标注的单句,标注为语法正确与否(0为错误,1为正确),在经过训练之后,语法分析网络可用于判定句子的语法准确度,语法分数范围为0~1,0代表语法错误,1代表语法正确,介于0到1之间的置信度可理解为语法准确度。在得到文字数据中各句子的语法分数之后,服务器会计算各句子的语法分数平均值,得到文字数据的语法评分。语法分析网络会自动根据文字数据进行学习,无需对文字数据中各句子进行拆分和匹配语法结构。
步骤S210,从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度。
服务器根据预设时间间隔从面试者视频数据中随机截取视频帧,根据视频帧获取面试者的微表情特征,将微表情特征输入已训练的微表情模型,可得到微表情特征归属于各预设的情绪类别的置信度,对微表情特征归属于各预设的情绪类别的置信度进行排序,获取置信度最大值,得到视频数据置信度。微表情模型由样本微表情数据训练得到。
步骤S212,根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
服务器可通过将第一语音情绪数据、第二语音情绪数据以及语法评分输入已训练的音 频分类模型的方式,得到音频数据置信度,进而根据音频数据置信度、视频数据置信度以及置信度参数,确定面试者的面试结果。具体的,音频分类模型的参数包括第一语音情绪数据中音频数据归属各预设的情绪类别的置信度,第二语音情绪数据中文字数据归属各预设的情绪类别的置信度以及语法评分。在训练音频分类模型时,可以以携带标注信息的样本语音数据以及样本文字数据作为训练集,标注信息用于标注与样本语音数据以及样本文字数据对应的面试者是否说谎。置信度参数可按照需要自行设置,置信度参数为可调参数。
上述数据处理方法,根据面试者音频数据提取微语音特征,根据微语音特征,得到第一语音情绪数据,将面试者音频数据转换为文字数据,对文字数据进行分析,得到第二语音情绪数据以及语法评分,根据面试者视频数据提取微表情特征,根据微表情特征,得到视频数据置信度,根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。通过多种方式识别面试者的多个特征,综合多个识别结果确定面试者的面试结果,从而能够准确全面捕捉被面试者的心理状态,提高识别准确率,使面试结果更贴近真实情况。
在其中一个实施例中,如图3所示,步骤S204包括:
步骤S302,调用语音特征提取工具,根据面试者音频数据提取面试者的微语音特征,微语音特征包括语速特征、梅尔频率倒谱系数以及音高特征;
步骤S304,将微语音特征输入已匹配的语音情绪分类模型,得到与微语音特征对应的第一语音情绪数据。
调用语音特征提取工具,提取梅尔频率倒谱系数的方式为:对面试者音频数据进行快速傅里叶变换得到频谱,把频谱映射到梅尔比例,去对数后进行离散余弦变换,即可得到梅尔频率倒谱系数。音高特征包括当前片段音高平均值、当前片段音高标准差、历史音高平均值以及历史音高标准差。当前片段音高平均值的提取方式为:对面试者音频数据进行快速傅里叶变换,得到音频数据的频谱图,然后计算每个频段与频谱中心值的方差,对方差求和后取平方根。历史音高平均值和标准差是指面试者从本次面试开始到当前片段为止的平均值和标准差。这些数据会在面试开始后在服务器中进行记忆存储。为了计算方便,可以用指数移动平均值近似计算,更新公式为:
历史音高平均值=α*历史音高平均值+(1-α)*当前音高平均值
历史音高标准差=α*历史音高标准差+(1-α)*当前音高标准差
α为介于0到1的权重参数,可按照需要自行设置,此处默认为0.9。
语速特征包括当前语速、历史语速平均值以及历史语速标准差,历史语速平均值和标准差是服务器在面试开始后进行的计算和记忆存储。同样的,为了计算方便,可以用指数移动平均值近似计算,更新公式为:
历史语速平均值=α*历史语速平均值+(1-α)*当前语速
历史语速均方差=α*历史语速均方差+(1-α)*(当前语速–历史语速平均值) 2
历史语速标准差=历史语速均方差的开方值
上述实施例,调用语音特征提取工具,根据面试者音频数据提取面试者的微语音特征,实现了对面试者的微语音特征的提取。
在其中一个实施例中,如图4所示,步骤S204包括:
步骤S402,获取面试者性别信息,从已训练的语音情绪分类模型集合中获取与面试者性别信息匹配的语音情绪分类模型,语音情绪分类模型由携带标注信息的样本语音数据训练得到,标注信息包括情绪类别信息以及性别信息;
步骤S404,获取微语音特征中的音高特征、梅尔频率倒谱系数以及语速特征;
步骤S406,将音高特征、梅尔频率倒谱系数以及语速特征输入已匹配的语音情绪分类模型中,获取微语音特征归属于各预设的情绪类别的置信度,得到微语音特征的第一语音情绪数据。
已训练的语音情绪分类模型集合中包括针对不同性别面试者样本数据训练得到的语音情绪分类模型,即分析男性语音数据的情绪分类模型和分析女性语音数据的情绪分类模型。服务器会获取面试者性别信息,根据面试者性别信息匹配已训练的语音情绪分类模型集合,从已训练的语音情绪分类模型集合中获取与面试者性别信息匹配的语音情绪分类模型。语音情绪分类模型由携带标注信息的样本语音数据训练得到,标注信息包括情绪类别信息以及性别信息。服务器会根据性别信息对样本语音数据进行划分,根据划分后的样本语音数据分别进行模型训练,得到语音情绪分类模型集合。
音高特征包括当前片段音高平均值、当前片段音高标准差、历史音高平均值以及历史音高标准差,语速特征包括当前语速、历史语速平均以及历史语速标准差,服务器会将三个特征中包括的所有特征作为参数输入已匹配的语音情绪分类模型中,语音情绪分类模型中的卷积神经网络会综合所有特征给出微语音特征归属于各预设的情绪类别的置信度。
上述实施例,根据面试者性别信息获取匹配的语音情绪分类模型,将音高特征、梅尔频率倒谱系数以及语速特征输入已匹配的语音情绪分类模型中,获取微语音特征归属于各预设的情绪类别的置信度,得到微语音特征的第一语音情绪数据,实现了对第一语音情绪数据的获取。
在其中一个实施例中,如图5所示,步骤S402之前,还包括:
步骤S502,获取携带标注信息的样本语音数据;
步骤S504,将样本语音数据划分为训练集和验证集;
步骤S506,根据训练集以及初始语音情绪分类模型进行模型训练,得到语音情绪分类模型集合;
步骤S508,根据验证集进行模型验证,调整语音情绪分类模型集合中各语音情绪分类模型。
在获取携带标注信息的样本语音数据之后,服务器首先根据标注信息中的性别信息将样本语音数据划分为第一样本语音数据集合和第二样本语音数据集合,再将第一样本语音 数据集合和第二样本语音数据集合分别划分为训练集和验证集,根据第一样本语音数据集合和第二样本语音数据集合中的训练集进行模型训练,得到第一语音情绪分类模型和第二语音情绪分类模型,根据第一样本语音数据集合和第二样本语音数据集合中的验证集进行模型验证,调整第一语音情绪分类模型和第二语音情绪分类模型。第一样本语音数据集合和第二样本语音数据集合中都分别只包括了同性别面试者的样本语音数据。
上述实施例,获取携带标注信息的样本语音数据,将样本语音数据划分为训练集和验证集,根据训练集进行模型训练,根据验证集进行模型验证,得到语音情绪分类模型集合中各语音情绪分类模型,实现了对语音情绪分类模型集合的获取。
在其中一个实施例中,如图6所示,步骤S206包括:
步骤S602,根据各句子中各词语查找匹配预设的与已训练的情绪分类网络对应的字典,确定各句子中各词语在字典中对应的序列号;
步骤S604,将各句子中各词语在字典中对应的序列号输入情绪分类网络,得到文字数据中各句子归属于各预设的情绪类别的置信度;
步骤S606,获取文字数据中各句子归属于各预设的情绪类别的置信度的平均值,根据置信度的平均值,得到文字数据归属于各预设的情绪类别的置信度。
情绪分类网络可以为以BERT为基础,叠加一层含N个神经元(假定预设N种情绪)的分类层的网络。服务器将文字数据拆分为多个句子,对每个句子进行分词,根据各句子中各词语查找匹配BERT的字典,把每个词转换为该词在BERT的字典中对应的序列号,把整个句子的序列号输入进BERT,得到各句子归属于各预设的情绪类别的置信度,进而根据各句子归属于各预设的情绪类别的置信度,确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据。情绪分类网络可以由第一样本文字数据训练得到,第一样本文字数据中各样本句都携带有标注信息,标注信息为各样本句的情绪类别信息。
上述实施例,将各句子中各词语在字典中对应的序列号输入情绪分类网络,得到文字数据中各句子归属于各预设的情绪类别的置信度,进而根据文字数据中各句子归属于各预设的情绪类别的置信度,得到文字数据归属于各预设的情绪类别的置信度,实现了对文字数据归属于各预设的情绪类别的置信度的获取。
在其中一个实施例中,如图7所示,步骤S212包括:
步骤S702,根据第一语音情绪数据、第二语音情绪数据以及语法评分,得到音频数据置信度;
步骤S704,根据音频数据置信度、视频数据置信度以及预设的置信度参数,确定面试者的面试结果。
服务器可通过将第一语音情绪数据、第二语音情绪数据以及语法评分输入已训练的音频分类模型的方式,得到音频数据置信度,进而根据音频数据置信度、视频数据置信度以及置信度参数,确定面试者的面试结果。具体的,音频分类模型的参数包括第一语音情绪数据中音频数据归属各预设的情绪类别的置信度,第二语音情绪数据中文字数据归属各预 设的情绪类别的置信度以及语法评分。在训练音频分类模型时,可以以携带标注信息的样本语音数据以及样本文字数据作为训练集,标注信息用于标注与样本语音数据以及样本文字数据对应的面试者是否说谎。置信度参数可按照需要自行设置,置信度参数为可调参数。面试结果可以由面试评分得到,面试评分的公式可以为:面试评分=A*音频数据置信度+B*视频数据置信度,A和B即为置信度参数。
在其中一个实施例中,在步骤S206之前,所述方法还包括:
获取第一样本文字数据,第一样本文字数据中各样本句携带有情绪类别信息;及
将第一样本文字数据作为训练集进行模型训练,得到情绪分类网络。
应该理解的是,虽然图2-7的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-7中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
在其中一个实施例中,如图8所示,提供了一种数据处理装置,包括:获取模块802、第一提取模块804、第一处理模块806、第二处理模块808、第二提取模块810和分析模块812,其中:
获取模块802,用于获取面试者音频数据以及面试者视频数据;
第一提取模块804,用于根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
第一处理模块806,用于将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
第二处理模块808,用于将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
第二提取模块810,用于从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;
分析模块812,用于根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
上述数据处理装置,根据面试者音频数据提取微语音特征,根据微语音特征,得到第一语音情绪数据,将面试者音频数据转换为文字数据,对文字数据进行分析,得到第二语 音情绪数据以及语法评分,根据面试者视频数据提取微表情特征,根据微表情特征,得到视频数据置信度,根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。通过多种方式识别面试者的多个特征,综合多个识别结果确定面试者的面试结果,从而能够准确全面捕捉被面试者的心理状态,提高识别准确率,使面试结果更贴近真实情况。
在其中一个实施例中,第一提取模块还用于调用语音特征提取工具,根据面试者音频数据提取面试者的微语音特征,微语音特征包括语速特征、梅尔频率倒谱系数以及音高特征。
在其中一个实施例中,第一提取模块还用于获取面试者性别信息,从已训练的语音情绪分类模型集合中获取与面试者性别信息匹配的语音情绪分类模型,语音情绪分类模型由携带标注信息的样本语音数据训练得到,标注信息包括情绪类别信息以及性别信息,获取微语音特征中的音高特征、梅尔频率倒谱系数以及语速特征,将音高特征、梅尔频率倒谱系数以及语速特征输入已匹配的语音情绪分类模型中,获取微语音特征归属于各预设的情绪类别的置信度,得到微语音特征的第一语音情绪数据。
在其中一个实施例中,第一提取模块还用于获取携带标注信息的样本语音数据,将样本语音数据划分为训练集和验证集,根据训练集以及初始语音情绪分类模型进行模型训练,得到语音情绪分类模型集合,根据验证集进行模型验证,调整语音情绪分类模型集合中各语音情绪分类模型。
在其中一个实施例中,第一处理模块还用于根据各句子中各词语查找匹配预设的与已训练的情绪分类网络对应的字典,确定各句子中各词语在字典中对应的序列号,将各句子中各词语在字典中对应的序列号输入情绪分类网络,得到文字数据中各句子归属于各预设的情绪类别的置信度,获取文字数据中各句子归属于各预设的情绪类别的置信度的平均值,根据置信度的平均值,得到文字数据归属于各预设的情绪类别的置信度。
在其中一个实施例中,分析模块还用于根据第一语音情绪数据、第二语音情绪数据以及语法评分,得到音频数据置信度,根据音频数据置信度、视频数据置信度以及预设的置信度参数,确定面试者的面试结果。
在其中一个实施例中,第一处理模块还用于获取第一样本文字数据,第一样本文字数据中各样本句携带有情绪类别信息,将第一样本文字数据作为训练集进行模型训练,得到情绪分类网络。
关于数据处理装置的具体限定可以参见上文中对于数据处理方法的限定,在此不再赘述。上述数据处理装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在其中一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部 结构图可以如图9所示。该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种数据处理方法。
本领域技术人员可以理解,图9中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在其中一个实施例中,一种计算机设备,包括存储器和一个或多个处理器,存储器中储存有计算机可读指令,计算机可读指令被处理器执行时,使得一个或多个处理器执行以下步骤:
获取面试者音频数据以及面试者视频数据;
根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪分类网络由第一样本文字数据训练得到;
将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;及
根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
在其中一个实施例中,一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行以下步骤:
获取面试者音频数据以及面试者视频数据;
根据面试者音频数据提取面试者的微语音特征,根据微语音特征,得到第一语音情绪数据;
将面试者音频数据转换为文字数据,将文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,情绪 分类网络由第一样本文字数据训练得到;
将文字数据输入已训练的语法分析网络,得到文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到文字数据的语法评分,语法分析网络由第二样本文字数据训练得到;
从面试者视频数据中随机截取视频帧,根据视频帧提取面试者的微表情特征,根据微表情特征,得到视频数据置信度;及
根据第一语音情绪数据、第二语音情绪数据、语法评分以及视频数据置信度,确定面试者的面试结果。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(步骤RAM)、动态RAM(DRAM)、同步DRAM(步骤DRAM)、双数据率步骤DRAM(DDR步骤DRAM)、增强型步骤DRAM(E步骤DRAM)、同步链路(步骤ynchlink)DRAM(步骤LDRAM)、存储器总线(Rambu步骤)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种数据处理方法,包括:
    获取面试者音频数据以及面试者视频数据;
    根据所述面试者音频数据提取面试者的微语音特征,根据所述微语音特征,得到第一语音情绪数据;
    将所述面试者音频数据转换为文字数据,将所述文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定所述文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,所述情绪分类网络由第一样本文字数据训练得到;
    将所述文字数据输入已训练的语法分析网络,得到所述文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到所述文字数据的语法评分,所述语法分析网络由第二样本文字数据训练得到;
    从所述面试者视频数据中随机截取视频帧,根据所述视频帧提取面试者的微表情特征,根据所述微表情特征,得到视频数据置信度;及
    根据所述第一语音情绪数据、所述第二语音情绪数据、所述语法评分以及所述视频数据置信度,确定面试者的面试结果。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述面试者音频数据提取面试者的微语音特征,包括:
    调用语音特征提取工具,根据所述面试者音频数据提取面试者的微语音特征,所述微语音特征包括语速特征、梅尔频率倒谱系数以及音高特征。
  3. 根据权利要求1所述的方法,其特征在于,所述根据所述微语音特征,得到第一语音情绪数据,包括:
    获取面试者性别信息,从已训练的语音情绪分类模型集合中获取与所述面试者性别信息匹配的语音情绪分类模型,所述语音情绪分类模型由携带标注信息的样本语音数据训练得到,所述标注信息包括情绪类别信息以及性别信息;
    获取所述微语音特征中的音高特征、梅尔频率倒谱系数以及语速特征;及
    将所述音高特征、所述梅尔频率倒谱系数以及所述语速特征输入已匹配的语音情绪分类模型中,获取所述微语音特征归属于各预设的情绪类别的置信度,得到所述微语音特征的第一语音情绪数据。
  4. 根据权利要求3所述的方法,其特征在于,在从已训练的语音情绪分类模型集合中获取与所述面试者性别信息匹配的语音情绪分类模型之前,所述方法还包括:
    获取携带标注信息的样本语音数据;
    将所述样本语音数据划分为训练集和验证集;
    根据所述训练集以及初始语音情绪分类模型进行模型训练,得到语音情绪分类模型集合;及
    根据所述验证集进行模型验证,调整所述语音情绪分类模型集合中各语音情绪分类模型。
  5. 根据权利要求1所述的方法,其特征在于,所述根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定所述文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,包括:
    根据各句子中各词语查找匹配预设的与已训练的情绪分类网络对应的字典,确定各句子中各词语在所述字典中对应的序列号;
    将各句子中各词语在所述字典中对应的序列号输入所述情绪分类网络,得到文字数据中各句子归属于各预设的情绪类别的置信度;及
    获取所述文字数据中各句子归属于各预设的情绪类别的置信度的平均值,根据所述置信度的平均值,得到所述文字数据归属于各预设的情绪类别的置信度。
  6. 根据权利要求1所述的方法,其特征在于,所述根据所述第一语音情绪数据、所述第二语音情绪数据、所述语法评分以及所述视频数据置信度,确定面试者的面试结果包括:
    根据所述第一语音情绪数据、所述第二语音情绪数据以及所述语法评分,得到音频数据置信度;及
    根据所述音频数据置信度、所述视频数据置信度以及预设的置信度参数,确定面试者的面试结果。
  7. 根据权利要求1所述的方法,其特征在于,在根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典之前,所述方法还包括:
    获取所述第一样本文字数据,所述第一样本文字数据中各样本句携带有情绪类别信息;及
    将所述第一样本文字数据作为训练集进行模型训练,得到情绪分类网络。
  8. 一种数据处理装置,包括:
    获取模块,用于获取面试者音频数据以及面试者视频数据;
    第一提取模块,用于根据所述面试者音频数据提取面试者的微语音特征,根据所述微语音特征,得到第一语音情绪数据;
    第一处理模块,用于将所述面试者音频数据转换为文字数据,将所述文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定所述文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,所述情绪分类网络由第一样本文字数据训练得到;
    第二处理模块,用于将所述文字数据输入已训练的语法分析网络,得到所述文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到所述文字数据的语法评分,所述语法分析网络由第二样本文字数据训练得到;
    第二提取模块,用于从所述面试者视频数据中随机截取视频帧,根据所述视频帧提取 面试者的微表情特征,根据所述微表情特征,得到视频数据置信度;及
    分析模块,用于根据所述第一语音情绪数据、所述第二语音情绪数据、所述语法评分以及所述视频数据置信度,确定面试者的面试结果。
  9. 根据权利要求8所述的装置,其特征在于,第一提取模块还用于调用语音特征提取工具,根据所述面试者音频数据提取面试者的微语音特征,所述微语音特征包括语速特征、梅尔频率倒谱系数以及音高特征。
  10. 一种计算机设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行以下步骤:
    获取面试者音频数据以及面试者视频数据;
    根据所述面试者音频数据提取面试者的微语音特征,根据所述微语音特征,得到第一语音情绪数据;
    将所述面试者音频数据转换为文字数据,将所述文字数据拆分为多个句子,并对多个句子进行分词,根据各句子中各词语查找匹配预设的与已训练情绪分类网络对应的字典,根据查找匹配结果确定所述文字数据归属于各预设的情绪类别的置信度,得到第二语音情绪数据,所述情绪分类网络由第一样本文字数据训练得到;
    将所述文字数据输入已训练的语法分析网络,得到所述文字数据中各句子的语法分数,计算各句子的语法分数平均值,得到所述文字数据的语法评分,所述语法分析网络由第二样本文字数据训练得到;
    从所述面试者视频数据中随机截取视频帧,根据所述视频帧提取面试者的微表情特征,根据所述微表情特征,得到视频数据置信度;及
    根据所述第一语音情绪数据、所述第二语音情绪数据、所述语法评分以及所述视频数据置信度,确定面试者的面试结果。
  11. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    invoking a speech feature extraction tool to extract the micro-speech features of the interviewee from the interviewee audio data, the micro-speech features comprising a speech rate feature, Mel-frequency cepstral coefficients, and a pitch feature.
  12. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring gender information of the interviewee, and acquiring, from a set of trained speech emotion classification models, a speech emotion classification model matching the gender information of the interviewee, the speech emotion classification model being trained from sample speech data carrying annotation information, the annotation information comprising emotion category information and gender information;
    acquiring the pitch feature, the Mel-frequency cepstral coefficients, and the speech rate feature from the micro-speech features; and
    inputting the pitch feature, the Mel-frequency cepstral coefficients, and the speech rate feature into the matched speech emotion classification model, acquiring the confidence that the micro-speech features belong to each preset emotion category, and obtaining the first speech emotion data of the micro-speech features.
  13. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    acquiring sample speech data carrying annotation information;
    dividing the sample speech data into a training set and a validation set;
    performing model training according to the training set and an initial speech emotion classification model to obtain the set of speech emotion classification models; and
    performing model validation according to the validation set, and adjusting each speech emotion classification model in the set of speech emotion classification models.
  14. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    looking up and matching each word of each sentence against the preset dictionary corresponding to the trained emotion classification network, and determining the sequence number of each word of each sentence in the dictionary;
    inputting the sequence numbers of the words of each sentence in the dictionary into the emotion classification network to obtain the confidence that each sentence in the text data belongs to each preset emotion category; and
    acquiring the average of the confidences that the sentences in the text data belong to each preset emotion category, and obtaining, according to the average confidence, the confidence that the text data belongs to each preset emotion category.
  15. The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining an audio data confidence according to the first speech emotion data, the second speech emotion data, and the grammar rating; and
    determining the interview result of the interviewee according to the audio data confidence, the video data confidence, and preset confidence parameters.
  16. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring interviewee audio data and interviewee video data;
    extracting micro-speech features of the interviewee from the interviewee audio data, and obtaining first speech emotion data according to the micro-speech features;
    converting the interviewee audio data into text data, splitting the text data into a plurality of sentences, performing word segmentation on the plurality of sentences, looking up and matching each word of each sentence against a preset dictionary corresponding to a trained emotion classification network, determining, according to the lookup and matching result, the confidence that the text data belongs to each preset emotion category, and obtaining second speech emotion data, the emotion classification network being trained from first sample text data;
    inputting the text data into a trained grammar analysis network to obtain a grammar score of each sentence in the text data, and calculating the average of the grammar scores of the sentences to obtain a grammar rating of the text data, the grammar analysis network being trained from second sample text data;
    randomly capturing video frames from the interviewee video data, extracting micro-expression features of the interviewee from the video frames, and obtaining a video data confidence according to the micro-expression features; and
    determining an interview result of the interviewee according to the first speech emotion data, the second speech emotion data, the grammar rating, and the video data confidence.
  17. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    invoking a speech feature extraction tool to extract the micro-speech features of the interviewee from the interviewee audio data, the micro-speech features comprising a speech rate feature, Mel-frequency cepstral coefficients, and a pitch feature.
  18. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    acquiring gender information of the interviewee, and acquiring, from a set of trained speech emotion classification models, a speech emotion classification model matching the gender information of the interviewee, the speech emotion classification model being trained from sample speech data carrying annotation information, the annotation information comprising emotion category information and gender information;
    acquiring the pitch feature, the Mel-frequency cepstral coefficients, and the speech rate feature from the micro-speech features; and
    inputting the pitch feature, the Mel-frequency cepstral coefficients, and the speech rate feature into the matched speech emotion classification model, acquiring the confidence that the micro-speech features belong to each preset emotion category, and obtaining the first speech emotion data of the micro-speech features.
  19. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    acquiring sample speech data carrying annotation information;
    dividing the sample speech data into a training set and a validation set;
    performing model training according to the training set and an initial speech emotion classification model to obtain the set of speech emotion classification models; and
    performing model validation according to the validation set, and adjusting each speech emotion classification model in the set of speech emotion classification models.
  20. The storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    looking up and matching each word of each sentence against the preset dictionary corresponding to the trained emotion classification network, and determining the sequence number of each word of each sentence in the dictionary;
    inputting the sequence numbers of the words of each sentence in the dictionary into the emotion classification network to obtain the confidence that each sentence in the text data belongs to each preset emotion category; and
    acquiring the average of the confidences that the sentences in the text data belong to each preset emotion category, and obtaining, according to the average confidence, the confidence that the text data belongs to each preset emotion category.
PCT/CN2019/107727 2019-08-13 2019-09-25 数据处理方法、装置、计算机设备和存储介质 WO2021027029A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG11202004543PA SG11202004543PA (en) 2019-08-13 2019-09-25 Data processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910745443.6 2019-08-13
CN201910745443.6A CN110688499A (zh) 2019-08-13 2019-08-13 数据处理方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2021027029A1 true WO2021027029A1 (zh) 2021-02-18

Family

ID=69108262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/107727 WO2021027029A1 (zh) 2019-08-13 2019-09-25 数据处理方法、装置、计算机设备和存储介质

Country Status (3)

Country Link
CN (1) CN110688499A (zh)
SG (1) SG11202004543PA (zh)
WO (1) WO2021027029A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739559B (zh) * 2020-05-07 2023-02-28 北京捷通华声科技股份有限公司 Speech early-warning method, apparatus, device, and storage medium
CN112818740A (zh) * 2020-12-29 2021-05-18 南京智能情资创新科技研究院有限公司 Psychological quality dimension evaluation method and apparatus for intelligent interviews
CN112884326A (zh) * 2021-02-23 2021-06-01 无锡爱视智能科技有限责任公司 Multimodal-analysis video interview evaluation method, apparatus, and storage medium
CN112786054B (zh) * 2021-02-25 2024-06-11 深圳壹账通智能科技有限公司 Speech-based intelligent interview evaluation method, apparatus, device, and storage medium
CN112990301A (zh) * 2021-03-10 2021-06-18 深圳市声扬科技有限公司 Emotion data annotation method, apparatus, computer device, and storage medium
CN112836691A (zh) * 2021-03-31 2021-05-25 中国工商银行股份有限公司 Intelligent interview method and apparatus
CN113506586B (zh) * 2021-06-18 2023-06-20 杭州摸象大数据科技有限公司 User emotion recognition method and system
CN113724697A (zh) * 2021-08-27 2021-11-30 北京百度网讯科技有限公司 Model generation method, emotion recognition method, apparatus, device, and storage medium
CN113808709B (zh) * 2021-08-31 2024-03-22 天津师范大学 Text-analysis-based psychological resilience prediction method and system
CN114627218B (zh) * 2022-05-16 2022-08-12 成都市谛视无限科技有限公司 Virtual-engine-based facial micro-expression capture method and apparatus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503646B (zh) * 2016-10-19 2020-07-10 竹间智能科技(上海)有限公司 Multimodal emotion recognition system and method
EP3729419A1 (en) * 2017-12-19 2020-10-28 Wonder Group Technologies Ltd. Method and apparatus for emotion recognition from speech
CN109829363A (zh) * 2018-12-18 2019-05-31 深圳壹账通智能科技有限公司 Expression recognition method, apparatus, computer device, and storage medium
CN109902158A (zh) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Voice interaction method, apparatus, computer device, and storage medium
CN109948438A (zh) * 2019-02-12 2019-06-28 平安科技(深圳)有限公司 Automatic interview scoring method, apparatus, system, computer device, and storage medium
CN109905381A (zh) * 2019-02-15 2019-06-18 北京大米科技有限公司 Self-service interview method, related apparatus, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180376001A1 (en) * 2016-11-02 2018-12-27 International Business Machines Corporation System and Method for Monitoring and Visualizing Emotions in Call Center Dialogs at Call Centers
CN106570496A (zh) * 2016-11-22 2017-04-19 上海智臻智能网络科技股份有限公司 Emotion recognition method and device, and intelligent interaction method and device
CN108305642A (zh) * 2017-06-30 2018-07-20 腾讯科技(深圳)有限公司 Method and device for determining emotion information
CN109766917A (zh) * 2018-12-18 2019-05-17 深圳壹账通智能科技有限公司 Interview video data processing method, apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
SG11202004543PA (en) 2021-03-30
CN110688499A (zh) 2020-01-14

Similar Documents

Publication Publication Date Title
WO2021027029A1 (zh) 数据处理方法、装置、计算机设备和存储介质
CN110276259B (zh) 唇语识别方法、装置、计算机设备及存储介质
WO2021068321A1 (zh) 基于人机交互的信息推送方法、装置和计算机设备
EP3832519A1 (en) Method and apparatus for evaluating translation quality
US10176804B2 (en) Analyzing textual data
WO2020244153A1 (zh) 会议语音数据处理方法、装置、计算机设备和存储介质
WO2020177230A1 (zh) 基于机器学习的医疗数据分类方法、装置、计算机设备及存储介质
US9558741B2 (en) Systems and methods for speech recognition
WO2021000497A1 (zh) 检索方法、装置、计算机设备和存储介质
WO2020147395A1 (zh) 基于情感的文本分类处理方法、装置和计算机设备
CN113094578B (zh) 基于深度学习的内容推荐方法、装置、设备及存储介质
CN113707125B (zh) 一种多语言语音合成模型的训练方法及装置
JP2017058674A (ja) 音声認識のための装置及び方法、変換パラメータ学習のための装置及び方法、コンピュータプログラム並びに電子機器
CN111833845A (zh) 多语种语音识别模型训练方法、装置、设备及存储介质
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
US11961515B2 (en) Contrastive Siamese network for semi-supervised speech recognition
CN113254613B (zh) 对话问答方法、装置、设备及存储介质
CN110717021B (zh) 人工智能面试中获取输入文本和相关装置
US11893813B2 (en) Electronic device and control method therefor
CN110047469A (zh) 语音数据情感标注方法、装置、计算机设备及存储介质
CN111126084B (zh) 数据处理方法、装置、电子设备和存储介质
CN115796653A (zh) 一种面试发言评价方法及系统
US20220392439A1 (en) Rescoring Automatic Speech Recognition Hypotheses Using Audio-Visual Matching
JP2015175859A (ja) パターン認識装置、パターン認識方法及びパターン認識プログラム
CN111933187B (zh) 情感识别模型的训练方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941414

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19941414

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.08.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19941414

Country of ref document: EP

Kind code of ref document: A1