WO2021000408A1 - Interview scoring method and apparatus, device, and storage medium - Google Patents

Interview scoring method and apparatus, device, and storage medium

Info

Publication number
WO2021000408A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
similarity
target
question text
candidate
Prior art date
Application number
PCT/CN2019/103134
Other languages
English (en)
Chinese (zh)
Inventor
邓悦
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021000408A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G06V 40/176 Dynamic expression
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L 15/142 Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Definitions

  • This application relates to the field of natural language processing, and in particular to an interview scoring method, device, equipment and storage medium.
  • Natural language processing in the field of artificial intelligence is an important direction in the field of computer science and artificial intelligence. It studies various theories and methods that can realize effective communication between people and smart devices in natural language (that is, spoken language).
  • In a smart interview scenario, when a smart device receives the user's voice information, it usually needs to convert the voice information into target text and score that target text against a preset text, so as to obtain a score for the interviewee's answers to the interview questions and thereby gauge the interviewee's ability level. The accuracy of this assessment is closely tied to the accuracy of the interview score; however, the accuracy of current interview scoring is not ideal. Therefore, how to provide an interview scoring method with high scoring accuracy is one of the technical problems urgently to be solved by those skilled in the art.
  • This application provides an interview scoring method, device, equipment and storage medium, aiming to improve the accuracy of interview scoring.
  • this application provides an interview scoring method, which includes:
  • If the similarity check result is passed, the candidate answer text corresponding to the candidate question text is obtained according to the set of micro-expression types, and the answer score of the second user is calculated based on the candidate answer text and the target answer text.
  • this application also provides an interview scoring device, which includes:
  • a text acquisition unit for acquiring interview video information, and for acquiring, according to the interview video information, the target question text corresponding to the first user's question, the second user's micro-expression type set, and the target answer text corresponding to the second user's answer to the question;
  • a text determining unit for determining a candidate question text corresponding to the target question text
  • a similarity determination unit configured to determine the similarity relationship between the target question text and the candidate question text according to a preset similarity rule
  • a result output unit configured to input the target question text and the candidate question text into a preset similarity verification model if the similarity relationship is not similar, so as to output the similarity verification result of the target question text and the candidate question text;
  • a score calculation unit configured to obtain a candidate answer text corresponding to the candidate question text according to the set of micro-expression types if the similarity check result is passed, and to calculate the answer score of the second user based on the candidate answer text and the target answer text.
  • The present application also provides a computer device, which includes a memory and a processor; the memory is used to store a computer program, and the processor is used to execute the computer program and, when executing it, to implement the interview scoring method described above.
  • The present application also provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the processor implements the interview scoring method described above.
  • This application discloses an interview scoring method, device, equipment, and storage medium.
  • The candidate answer text is obtained according to the set of micro-expression types, which helps ensure, to a certain extent, that the score reflects the interviewee's true level.
  • In addition, similarity of texts with simple semantics is determined by the preset similarity rule, while similarity of texts with complex semantics is determined by the similarity verification model, which improves the accuracy of interview scoring.
  • FIG. 1 is a schematic flowchart of an interview scoring method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario of an interview scoring method provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of sub-steps of an interview scoring method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of steps for obtaining target question text and target answer text provided by an embodiment of the present application
  • FIG. 5 is a schematic flowchart of another interview scoring method provided by an embodiment of the application.
  • FIG. 6 is a schematic flowchart of sub-steps of an interview scoring method provided by an embodiment of the application.
  • FIG. 7 is a schematic block diagram of an interview scoring device provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of subunits of an interview scoring device provided by an embodiment of the application.
  • FIG. 9 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • FIG. 1 is a schematic flowchart of steps of an interview scoring method provided by an embodiment of the present application.
  • the interview scoring method can be applied to the application scenario shown in Figure 2.
  • the first user may interact with the second user through the first terminal and the second terminal.
  • first terminal and the second terminal may be the same terminal device or different terminal devices.
  • For example, the first user may be the interviewer, and the second user may be the interviewee.
  • Alternatively, the first user may be the interviewee, and the second user may be the interviewer.
  • The following embodiments are described in detail with the first user as the interviewer and the second user as the interviewee.
  • the interview scoring method specifically includes: step S101 to step S105.
  • S101 Obtain interview video information, and obtain, according to the interview video information, a target question text corresponding to a first user's question, a set of micro expression types of a second user, and a target answer text corresponding to the second user's answer to the question.
  • The first user can question the second user on site or remotely by means of video.
  • The voice data and face data of the interviewee during the questioning process are collected through a recording device or terminal device to generate the corresponding interview video information, which is then uploaded to the server through the recording device or terminal device.
  • The server associates the interview video information with the interviewer tag and the interviewee tag.
  • The interviewer tag is used to uniquely identify the interviewer who participated in the interview, and the interviewee tag is used to uniquely identify the interviewee who participated in the interview.
  • The terminal device can be an electronic device such as a mobile phone, tablet, laptop, desktop computer, personal digital assistant, or wearable device.
  • In some embodiments, obtaining, according to the interview video information, the target question text corresponding to the first user's question, the set of micro-expression types of the second user, and the target answer text corresponding to the second user's answer to the question includes sub-steps S1011 to S1013.
  • S1011: Perform audio and video separation on the interview video information to obtain the target interview video and target interview audio to be recognized.
  • the above-mentioned preset audio format can be set based on actual conditions, which is not specifically limited in this application.
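  • As a non-limiting illustration of this separation step, the sketch below uses the ffmpeg command-line tool; the file names, the 16 kHz mono WAV format, and the helper name are assumptions for illustration, not part of this application.

```python
# Illustrative sketch only: split an interview recording into an audio track and a
# silent video track with the ffmpeg CLI. Paths and formats are assumed.
import subprocess

def split_interview(video_path: str,
                    audio_out: str = "interview_audio.wav",
                    video_out: str = "interview_video.mp4") -> None:
    # Extract the audio track as 16 kHz mono PCM WAV, a common speech-recognition input.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le",
                    "-ar", "16000", "-ac", "1", audio_out], check=True)
    # Keep the video stream (used later for micro-expression recognition) without audio.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-an", "-c:v", "copy", video_out],
                   check=True)

split_interview("interview.mp4")
```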
  • S1012 Perform voice text recognition on the target interview audio to obtain a target question text corresponding to the first user asking a question and a target answer text corresponding to the second user answering the question.
  • the target interview audio includes question audio data and answer audio data that have occurred between the first user and the second user.
  • speech features include but are not limited to time-length related features, fundamental frequency related features, energy related features, cepstral coefficients, and Mel frequency cepstral coefficients.
  • In some embodiments, performing voice text recognition on the target interview audio to obtain the target question text corresponding to the first user asking a question and the target answer text corresponding to the second user answering the question includes sub-steps S1012a to S1012e.
  • S1012a Perform frame and window processing on the target voice information to obtain several voice data blocks.
  • the target voice information generally includes target voice information in a period of time, and the period of time includes at least two frames of voice data blocks.
  • In order to facilitate subsequent arithmetic processing of the target voice information, it is necessary to perform framing processing on it to obtain voice data blocks in units of frames; the collection of multiple voice data blocks is used as the voice block information.
  • For example, the target voice information is divided into several frames of voice data blocks, and each frame of voice data block includes 30 voice data points.
  • the frame length of the frame and window processing is specifically set to 60ms, and the voice information is segmented according to the set frame length of 60ms to obtain the segmented voice information, and then the segmented voice information is processed with a Hamming window to obtain the voice data block.
  • Adding Hamming window processing refers to multiplying the segmented speech information by a window function for the purpose of Fourier expansion.
  • the specific frame length can be set to other values, such as 20ms, 50ms, or other values.
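  • A minimal sketch of the frame-and-window processing follows, using the 60 ms frame length from the example above; the 16 kHz sample rate and the 50% frame overlap are illustrative assumptions.

```python
# Minimal sketch: cut the voice signal into 60 ms frames and apply a Hamming window.
import numpy as np

def frame_and_window(signal: np.ndarray, sample_rate: int = 16000,
                     frame_ms: int = 60, hop_ms: int = 30) -> np.ndarray:
    frame_len = int(sample_rate * frame_ms / 1000)       # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)           # frame shift (assumed overlap)
    n_frames = (len(signal) - frame_len) // hop_len + 1  # assumes signal >= one frame
    window = np.hamming(frame_len)                       # window function for Fourier expansion
    return np.stack([signal[i * hop_len: i * hop_len + frame_len] * window
                     for i in range(n_frames)])          # shape: (n_frames, frame_len)

blocks = frame_and_window(np.random.randn(16000))        # 1 s of audio -> windowed blocks
```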
  • S1012b: Perform a frequency-domain transformation on each of the voice data blocks to obtain a corresponding amplitude spectrum.
  • Specifically, a Fast Fourier Transform (FFT) is applied to each voice data block, and the resulting amplitude is used as the amplitude spectrum, that is, the amplitude after the fast Fourier transform.
  • In other embodiments, other parameters after the FFT can also be used, such as the amplitude plus phase information.
  • S1012c Perform filtering processing on the amplitude spectrum through the Mel filter bank, and perform discrete cosine transform on the filtered amplitude spectrum to obtain Mel frequency cepstrum coefficients.
  • The filtering of the amplitude spectrum by the mel filter bank includes: obtaining the maximum frequency corresponding to the target voice information and calculating the mel frequency corresponding to that maximum frequency using the mel frequency calculation formula; calculating the mel distance between the center frequencies of adjacent triangular filters according to the calculated mel frequency and the number of triangular filters in the mel filter bank; completing the linear distribution of the multiple triangular filters according to the mel distance; and filtering the amplitude spectrum with the linearly distributed triangular filters.
  • The Mel filter bank specifically includes 40 triangular filters linearly distributed on the Mel scale. The amplitude spectrum is filtered through these 40 triangular filters, and a discrete cosine transform is then performed to obtain the Mel frequency cepstrum coefficients.
  • Specifically, the maximum mel frequency is calculated using the mel frequency calculation formula, and the mel distance between the center frequencies of adjacent triangular filters is then calculated from the maximum mel frequency and the number of triangular filters (40).
  • The mel frequency calculation formula is f_mel = A * log10(1 + f/700), where f_mel is the mel frequency, f is the maximum frequency corresponding to the voice information, and A is a coefficient, specifically 2595.
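  • The following sketch turns the formula above into a 40-filter Mel filter bank and then applies the discrete cosine transform to obtain the cepstrum coefficients; the FFT size, sample rate, and number of kept coefficients are assumptions for illustration.

```python
# Sketch of the Mel filter bank: f_mel = 2595 * log10(1 + f / 700), 40 triangular
# filters spaced linearly on the Mel scale, followed by a DCT to get MFCCs.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(sample_rate=16000, n_filters=40, n_fft=1024):
    max_mel = hz_to_mel(sample_rate / 2.0)      # Mel value of the maximum frequency
    # n_filters triangles need n_filters + 2 points; their spacing is the mel
    # distance between the center frequencies of adjacent filters.
    hz_points = mel_to_hz(np.linspace(0.0, max_mel, n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):           # build each triangular filter
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfcc(amplitude_spectrum, bank, n_coeffs=13):
    energies = np.log(bank @ amplitude_spectrum + 1e-10)  # filter, then take the log
    return dct(energies, norm="ortho")[:n_coeffs]         # DCT -> cepstrum coefficients
```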
  • Z-Score normalization is also called standard deviation normalization.
  • the mean of the processed data is 0 and the standard deviation is 1.
  • Z-Score standardization is to uniformly convert data of different magnitudes into the same magnitude, and uniformly measure it with the calculated Z-Score value to ensure the comparability of data.
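  • A short sketch of the Z-Score step, with illustrative numbers:

```python
# Z-Score standardization: subtract the mean and divide by the standard deviation,
# so the processed data has mean 0 and standard deviation 1.
import numpy as np

def z_score(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()             # assumes non-constant input

coeffs = np.array([12.3, 9.8, 15.1, 11.0])      # e.g. raw cepstrum coefficients
print(z_score(coeffs).mean(), z_score(coeffs).std())  # ~0.0 and ~1.0
```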
  • The spectrum vector is input into the pre-trained speech recognition model, so that the target question text corresponding to the first user's question and the target answer text corresponding to the second user's answer to the question can be accurately obtained.
  • the pre-trained speech recognition model can be obtained by training the initial neural network with a large amount of speech-text sample data.
  • the initial neural network may be Hidden Markov Model (HMM), etc.
  • the spectrum vector corresponding to each frame corresponds to a state
  • the states are combined into phonemes
  • The phonemes are combined into words, so as to obtain the target question text corresponding to the first user's question and the target answer text corresponding to the second user's answer to the question.
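  • This application does not spell out the decoder itself; as a toy illustration of the HMM idea just described (frame-level states combined into larger units), the Viterbi sketch below finds the most likely state sequence for a short observation sequence. The two states and all probabilities are invented for illustration.

```python
# Toy Viterbi decoding over an invented two-state HMM; the states stand in for
# phonemes and the integer observations for quantized spectrum vectors.
import numpy as np

states = ["ph_a", "ph_b"]
init = np.array([0.5, 0.5])                   # initial state probabilities
trans = np.array([[0.7, 0.3], [0.4, 0.6]])    # transition probabilities
emit = np.array([[0.9, 0.1], [0.2, 0.8]])     # P(observation | state)

def viterbi(obs):
    v = init * emit[:, obs[0]]                # best path probability per state
    back = []
    for o in obs[1:]:
        scores = v[:, None] * trans * emit[None, :, o]
        back.append(scores.argmax(axis=0))    # best predecessor for each state
        v = scores.max(axis=0)
    path = [int(v.argmax())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))         # walk the back-pointers
    return [states[i] for i in reversed(path)]

print(viterbi([0, 1, 1]))                     # most likely phoneme-state sequence
```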
  • a pre-stored micro-expression recognition model is acquired, and the target interview video is recognized by the micro-expression recognition model to obtain a micro-expression type set.
  • the micro expression recognition model can be selected as a micro expression recognition model based on deep learning, and the micro expression recognition model is obtained through training.
  • The specific training method is as follows: prepare a data set by collecting video fragments containing micro-expressions, normalizing the video images, and splitting them into training/validation/test sets; design a micro-expression recognition model to be trained based on convolutional neural networks and recurrent neural networks, and train it with the training set until the model converges; then use the validation and test sets to verify and test the converged model, and freeze the micro-expression recognition model once it meets the requirements.
  • The micro-expression recognition on the target interview video can also proceed as follows: perform micro-expression recognition on each frame of the target interview video, determine the micro-expression type of each frame, and collect the micro-expression types of all image frames to obtain the set of micro-expression types. In specific embodiments, some frames share the same micro-expression type; in that case, when the micro-expression types of the image frames are collected, only one instance of each type is kept, ensuring that the types in the set are not repeated.
  • The micro-expression type of each image frame is determined as follows: split the target interview video into several frames of images and extract the target feature vector of each frame; obtain the pre-stored micro-expression library; then calculate the similarity probability between the target feature vector of each image frame and the feature vector of each preset micro-expression in the library, and take the micro-expression type whose similarity probability exceeds the preset similarity as the micro-expression type of that image frame.
  • the aforementioned preset similarity probability can be set based on actual conditions, which is not specifically limited in this application.
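  • A compact sketch of how the deduplicated micro-expression type set can be built from per-frame results follows; `classify_frame` is a hypothetical stand-in for the recognition model, and the 0.8 threshold is illustrative.

```python
# Build the micro-expression type set: classify each frame, keep a type only if its
# similarity probability beats the preset threshold and it is not already in the set.
from typing import Callable, Iterable, List, Tuple

def micro_expression_set(frames: Iterable,
                         classify_frame: Callable[[object], Tuple[str, float]],
                         threshold: float = 0.8) -> List[str]:
    types: List[str] = []
    for frame in frames:
        label, prob = classify_frame(frame)   # (micro-expression type, similarity probability)
        if prob > threshold and label not in types:
            types.append(label)               # identical types are collected only once
    return types
```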
  • the question-answer library may be a pre-stored professional question and answer library.
  • each question text can correspond to one answer text or multiple answer texts.
  • an inverted index can be used to select one or more preset question texts that have a high degree of overlap with keywords in the target question text from the question-answer database as candidate question texts.
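  • The inverted-index selection can be pictured as below; the tiny in-memory library and the keyword-overlap count are illustrative assumptions.

```python
# Minimal inverted index: map each keyword to the stored questions containing it,
# then rank stored questions by how many target keywords they share.
from collections import defaultdict

qa_library = ["how many articles are there in the patent law",
              "what is the role of the patent law"]

index = defaultdict(set)                      # keyword -> stored questions containing it
for question in qa_library:
    for word in question.split():
        index[word].add(question)

def candidate_questions(target_keywords, top_k=3):
    overlap = defaultdict(int)
    for kw in target_keywords:
        for q in index.get(kw, ()):           # inverted lookup, one keyword at a time
            overlap[q] += 1
    return sorted(overlap, key=overlap.get, reverse=True)[:top_k]

print(candidate_questions(["patent", "law", "articles"]))
```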
  • similarity means that the target question text is similar to the candidate question text.
  • dissimilar means that the target question text is not similar to the candidate question text.
  • the preset similarity rules can be set according to actual needs. For example, if the text similarity between the target question text and the candidate question text is greater than a preset similarity threshold, it is determined that the target question text is similar to the candidate question text. If the text similarity between the target question text and the candidate question text is not greater than a preset similarity threshold, it is determined that the target question text is not similar to the candidate question text.
  • The preset similarity verification model can be a neural-network-based model obtained through training.
  • The specific training method is: establish a text training sample set and a similarity verification model to be trained, and use the text training sample set to iteratively train the similarity verification model until it converges.
  • the neural network can be a recurrent neural network or a convolutional neural network.
  • The answer score the second user obtains when answering an interview question in a tense state during the interview usually differs from the score obtained when answering in a normal state.
  • Therefore, the candidate answer text corresponding to the candidate question text is obtained according to the set of micro-expression types, and the answer score of the second user is then calculated from the candidate answer text and the target answer text.
  • In the question-answer database, there are at least two candidate answer texts corresponding to the same candidate question text.
  • the question-answer library includes different preset micro-expression type groups, and a candidate answer text can be uniquely determined according to the preset micro-expression type group and the candidate question text.
  • the preset micro-expression type group can be set according to actual conditions, which is not specifically limited in this application.
  • the preset micro-expression type group stores micro-expression type tags for indicating whether the second user is nervous, such as micro-expression type tags indicating facial muscle twitching, pale complexion, and mouth pause.
  • the preset micro-expression type group includes a preset first micro-expression type group and a preset second micro-expression type group, which are respectively used to indicate that the second user's expression is in a nervous state and a normal state during the interview.
  • Obtaining the candidate answer text corresponding to the candidate question text according to the set of micro-expression types includes: determining whether the set of micro-expression types includes a preset number of micro-expression types in the preset first micro-expression type group, or whether it includes a preset number of micro-expression types in the preset second micro-expression type group; if the set includes a preset number of micro-expression types in the first micro-expression type group, obtaining the candidate answer text corresponding to the first micro-expression type group from a preset text-answer library; and if the set includes a preset number of micro-expression types in the second micro-expression type group, obtaining the candidate answer text corresponding to the second micro-expression type group from the text-answer library.
  • the preset number can be set according to actual needs, such as more than half, or more than one third, and so on.
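  • The selection logic just described can be sketched as follows; the group contents and the preset number of 1 are illustrative assumptions, not values fixed by this application.

```python
# Pick the candidate answer text by checking how many detected micro-expression
# types fall into the preset "nervous" or "normal" groups.
NERVOUS_GROUP = {"facial muscle twitching", "pale complexion", "mouth pause"}
NORMAL_GROUP = {"steady gaze", "relaxed mouth"}

def pick_answer(expression_set: set, answers: dict, preset_number: int = 1) -> str:
    if len(expression_set & NERVOUS_GROUP) >= preset_number:
        return answers["nervous"]             # answer text paired with the nervous state
    return answers["normal"]                  # otherwise fall back to the normal state

answers = {"nervous": "reference answer for a tense interviewee",
           "normal": "standard reference answer"}
print(pick_answer({"pale complexion", "steady gaze"}, answers))
```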
  • The candidate answer text obtained according to the set of micro-expression types helps ensure, to a certain extent, that the score reflects the interviewee's true level.
  • Similarity of texts with simple semantics is determined by the preset similarity rule, while similarity of texts with complex semantics is determined by the similarity verification model, which improves the accuracy of interview scoring.
  • FIG. 5 is a schematic flowchart of another interview scoring method provided by an embodiment of the application.
  • the interview scoring method includes steps S201 to S206.
  • Voice signals of the user can be collected through voice collection devices such as recording devices (for example, smart bracelets or smart watches), voice recorders, smart phones, tablets, notebooks, or smart wearable devices.
  • The first user can question the second user on site or remotely by means of video.
  • The voice data and face data of the interviewee during the questioning process are collected through a recording device or terminal device to generate the corresponding interview video information, which is then uploaded to the server through the recording device or terminal device.
  • The server associates the interview video information with the interviewer tag and the interviewee tag.
  • The interviewer tag is used to uniquely identify the interviewer who participated in the interview, and the interviewee tag is used to uniquely identify the interviewee who participated in the interview.
  • The terminal device can be an electronic device such as a mobile phone, tablet, laptop, desktop computer, personal digital assistant, or wearable device.
  • the question-answer library may be a pre-stored professional question and answer library.
  • each question text can correspond to one answer text or multiple answer texts.
  • an inverted index can be used to select one or more preset question texts that have a high degree of overlap with keywords in the target question text from the question-answer database as candidate question texts.
  • Determining the candidate question text corresponding to the target question text specifically includes: performing word segmentation processing on the target question text to obtain segmented words, and extracting keywords from the segmented words according to a preset keyword database; and determining, according to the keywords, a candidate question text corresponding to the target question text from a preset question-answer library.
  • the preset keyword library may be a pre-stored thesaurus in which different keywords are stored.
  • Specifically, word segmentation may first be performed on the target question text, and keywords are then extracted from the segmentation result.
  • For example, the keywords of the target question text "how many articles are there in the patent law" may be "patent law", "how many", and "articles".
  • Based on these keywords, the candidate question text "the number of articles in the patent law" corresponding to that target question text can be selected from a question-answer database that also contains entries such as "what is the role of the patent law" and "how many words are in the patent law".
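  • A toy keyword-extraction sketch follows; the whitespace tokenizer and the tiny keyword library are stand-ins (for Chinese text a segmenter such as jieba would replace the split).

```python
# Segment the target question text and keep only tokens found in the preset
# keyword library; the surviving tokens drive the inverted-index lookup above.
keyword_db = {"patent", "law", "how", "many", "articles"}

def extract_keywords(question: str):
    tokens = question.lower().split()         # word segmentation (whitespace stand-in)
    return [t for t in tokens if t in keyword_db]

print(extract_keywords("How many articles are there in the patent law"))
# -> ['how', 'many', 'articles', 'patent', 'law']
```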
  • similarity means that the target question text is similar to the candidate question text.
  • dissimilar means that the target question text is not similar to the candidate question text.
  • the preset similarity rules can be set according to actual needs. For example, if the text similarity between the target question text and the candidate question text is greater than a preset similarity threshold, it is determined that the target question text is similar to the candidate question text. If the text similarity between the target question text and the candidate question text is not greater than a preset similarity threshold, it is determined that the target question text is not similar to the candidate question text.
  • In one embodiment, step S203 includes sub-steps S2031 to S2033.
  • The text similarity between the two can be calculated; specifically, the text similarity between the target question text and the candidate question text is calculated based on the similarity calculation formula, thereby obtaining the similarity relationship between the two.
  • Calculating the text similarity between the target question text and the candidate question text based on the similarity calculation formula includes: performing vector transformation on the candidate question text and the target question text according to a word embedding model to obtain a first semantic vector corresponding to the candidate question text and a second semantic vector corresponding to the target question text; and calculating, based on the similarity calculation formula, the text similarity between the candidate question text and the target question text according to the first semantic vector and the second semantic vector.
  • the candidate question text and the target question text can be converted into their corresponding semantic vectors according to the word embedding model, that is, the candidate question text is converted into the first semantic vector corresponding to the candidate question text, and the target question text is converted into the target question text The corresponding second semantic vector.
  • the word embedding model can be obtained by using the word2vec tool and training with a sample training set.
  • the word2vec tool is a method of vectorizing words using deep learning methods.
  • the sample training set can include text and semantic vectors.
  • the word embedding model can also be trained using other tools.
  • The similarity calculation formula is sim<A,B> = (sum_{i=1..n} A_i * B_i) / (sqrt(sum_{i=1..n} A_i^2) * sqrt(sum_{i=1..n} B_i^2)), where sim<A,B> is the text similarity, A is the first semantic vector corresponding to the candidate question text, B is the second semantic vector corresponding to the target question text, and n is the number of dimensions of the first and second semantic vectors.
  • the text similarity between the target question text and the candidate question text can be calculated according to the first semantic vector corresponding to the candidate question text and the second semantic vector corresponding to the target question text .
  • For example, the candidate question text is "Is there any difference between the 2009 version of the Patent Law Implementation Regulations and the 2010 version", and the corresponding first semantic vector is [1,1,2,1,1,1,1,0].
  • The target question text is "Is there any difference between the 2009 version of the Patent Law Implementation Regulations and the 2010 version", and the corresponding second semantic vector is [1,1,2,1,1,1,1,0,1].
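  • Using the formula and the example vectors above, a short computation sketch follows; padding the shorter vector with zeros so both share one dimension n is an assumption made here for illustration.

```python
# Cosine similarity of the example semantic vectors; the result of about 0.95
# exceeds a 90% threshold, so the two texts would be judged similar.
import numpy as np

def text_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([1, 1, 2, 1, 1, 1, 1, 0], dtype=float)     # first semantic vector
B = np.array([1, 1, 2, 1, 1, 1, 1, 0, 1], dtype=float)  # second semantic vector
n = max(len(A), len(B))
A, B = np.pad(A, (0, n - len(A))), np.pad(B, (0, n - len(B)))
print(round(text_similarity(A, B), 4))                  # ~0.9535
```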
  • The preset similarity threshold can be set according to actual needs. If the text similarity is greater than the preset similarity threshold, the target question text is similar to the candidate question text, and the similarity relationship is determined to be similar. Exemplarily, if the text similarity of 98% is greater than the threshold of 90%, the similarity relationship between the target question text and the candidate question text is determined to be similar.
  • If the text similarity is not greater than the preset similarity threshold, the target question text is not similar to the candidate question text, and the similarity relationship is determined to be not similar.
  • Exemplarily, if the text similarity of 60% is less than the threshold of 90%, the similarity relationship between the target question text and the candidate question text is determined to be not similar.
  • If the similarity relationship is not similar, step S204 is executed; if the similarity relationship is similar, step S205 is executed.
  • The preset similarity verification model can be a neural-network-based model obtained through training.
  • The specific training method is: establish a text training sample set and a similarity verification model to be trained, and use the text training sample set to iteratively train the similarity verification model until it converges.
  • the neural network can be a recurrent neural network or a convolutional neural network.
  • the similarity check model includes an input layer, an encoding layer, a mapping layer, and an output layer.
  • The input layer includes a first input sublayer and a second input sublayer, and the target question text and the candidate question text are separately input, in parallel, into the first input sublayer and the second input sublayer. This ensures that the similarity verification result is not affected by the input order of the target question text and the candidate question text, improving the accuracy of interview scoring.
  • In one embodiment, step S204 includes: if the similarity relationship indicates that the target question text is not similar to the candidate question text, inputting the target question text into the first input sublayer and the candidate question text into the second input sublayer; and respectively feeding the output of the first input sublayer and the output of the second input sublayer into the coding layer, the mapping layer, and the output layer of the similarity check model to output the similarity check result of the target question text and the candidate question text.
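  • One plausible shape for such a model is a siamese-style network in which the two input sublayers share an encoder, so swapping the inputs cannot change the output; the layer sizes, GRU encoder, and symmetric feature combination below are assumptions for illustration, not the model defined by this application.

```python
# Hedged sketch of an order-insensitive similarity check model (PyTorch).
import torch
import torch.nn as nn

class SimilarityChecker(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # shared input sublayers
        self.encoder = nn.GRU(embed_dim, hidden, batch_first=True)  # encoding layer
        self.mapper = nn.Linear(hidden * 2, 1)                      # mapping layer

    def encode(self, ids):
        _, h = self.encoder(self.embed(ids))
        return h[-1]                             # final hidden state as the text vector

    def forward(self, target_ids, candidate_ids):
        a, b = self.encode(target_ids), self.encode(candidate_ids)
        feats = torch.cat([(a - b).abs(), a * b], dim=-1)  # symmetric in a and b
        return torch.sigmoid(self.mapper(feats))           # output layer: ~1 similar, ~0 not

model = SimilarityChecker()
t = torch.randint(0, 10000, (1, 12))             # token ids of the target question text
c = torch.randint(0, 10000, (1, 12))             # token ids of the candidate question text
assert torch.allclose(model(t, c), model(c, t))  # input order does not affect the result
```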
  • the similarity verification result specifically includes: verification passed and verification failed.
  • a verification pass indicates that the target question text is similar to the candidate question text
  • a verification failure indicates that the target question text is not similar to the candidate question text.
  • The similarity check result can be, but is not limited to, output in digital form. For example, the target question text and the candidate question text are input into the neural network model: if "1" is output, the pair passes the check and the target question text is similar to the candidate question text; if "0" is output, the pair fails the check and the target question text is not similar to the candidate question text.
  • If the similarity check result is passed, step S205 is executed; if the similarity check fails, step S206 is executed.
  • The answer score the second user obtains when answering an interview question in a tense state during the interview usually differs from the score obtained when answering in a normal state.
  • Therefore, the candidate answer text corresponding to the candidate question text is obtained according to the set of micro-expression types, and the answer score of the second user is then calculated from the candidate answer text and the target answer text.
  • S206 Generate prompt information to prompt the first user that there is no candidate question text similar to the target question text in the preset question-answer database.
  • a prompt message is generated to remind the first user that there is no candidate question text similar to the target question text in the question-answer database, and the first user needs to ask the question again.
  • The candidate answer text obtained according to the set of micro-expression types helps ensure, to a certain extent, that the score reflects the interviewee's true level.
  • Similarity of texts with simple semantics is determined by the preset similarity rule, while similarity of texts with complex semantics is determined by the similarity verification model, which improves the accuracy of interview scoring.
  • FIG. 7 is a schematic block diagram of an interview scoring device provided in an embodiment of the present application, and the interview scoring device is used to implement any one of the aforementioned interview scoring methods.
  • the interview scoring device can be configured in a server or a terminal.
  • the server can be an independent server or a server cluster.
  • the terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device.
  • the interview scoring device 300 includes: a text acquisition unit 301, a text determination unit 302, a similarity determination unit 303, a result output unit 304, and a score calculation unit 305.
  • The text obtaining unit 301 is configured to obtain interview video information, and to obtain, according to the interview video information, the target question text corresponding to the first user's question, the second user's micro-expression type set, and the target answer text corresponding to the second user's answer to the question.
  • the text determining unit 302 is configured to determine the candidate question text corresponding to the target question text.
  • the similarity determination unit 303 is configured to determine the similarity relationship between the target question text and the candidate question text according to a preset similarity rule.
  • The result output unit 304 is configured to input the target question text and the candidate question text into a preset similarity verification model if the similarity relationship is not similar, so as to output the similarity verification result of the target question text and the candidate question text.
  • The score calculation unit 305 is configured to obtain a candidate answer text corresponding to the candidate question text according to the set of micro-expression types if the similarity check result is passed, and to calculate the answer score of the second user based on the candidate answer text and the target answer text.
  • In one embodiment, the text obtaining unit 301 is configured to: obtain interview video information and perform audio and video separation on it to obtain the target interview video and target interview audio to be recognized; perform voice text recognition on the target interview audio to obtain the target question text corresponding to the first user asking questions and the target answer text corresponding to the second user answering the questions; and perform micro-expression recognition on the target interview video to obtain the micro-expression type set of the second user.
  • the similarity determination unit 303 includes a similarity calculation subunit 3031, a similarity determination subunit 3032 and a dissimilarity determination subunit 3033.
  • the similarity calculation subunit 3031 is configured to calculate the text similarity between the target question text and the candidate question text based on the similarity calculation formula.
  • The similarity determination subunit 3032 is configured to determine a similarity relationship indicating that the target question text is similar to the candidate question text if the text similarity is greater than a preset similarity threshold.
  • the dissimilarity determination subunit 3033 is configured to determine a similarity relationship indicating that the target question text is not similar to the candidate question text if the text similarity is not greater than the preset similarity threshold.
  • The similarity calculation subunit 3031 is specifically configured to perform vector transformation on the candidate question text and the target question text according to a word embedding model to obtain a first semantic vector corresponding to the candidate question text and a second semantic vector corresponding to the target question text, and to calculate, based on the similarity calculation formula, the text similarity between the candidate question text and the target question text according to the first semantic vector and the second semantic vector.
  • the score calculation unit 305 is further configured to, if the similarity relationship is similar, obtain the candidate answer text corresponding to the candidate question text according to the micro-expression type set, and according to the candidate answer text and the target The answer text calculates the answer score of the second user.
  • the input layer of the similarity check model includes a first input sublayer and a second input sublayer.
  • The result output unit 304 is specifically configured to, if the similarity relationship is not similar, input the target question text into the first input sublayer and the candidate question text into the second input sublayer, and to respectively feed the output of the first input sublayer and the output of the second input sublayer into the coding layer, the mapping layer, and the output layer of the similarity check model so as to output the similarity verification result of the target question text and the candidate question text.
  • The score calculation unit 305 is specifically configured to: if the set of micro-expression types includes a preset number of micro-expression types in the preset first micro-expression type group, obtain the candidate answer text corresponding to the first micro-expression type group from a preset text-answer library; and if the set includes a preset number of micro-expression types in the preset second micro-expression type group, obtain the candidate answer text corresponding to the second micro-expression type group from the text-answer library.
  • For the interview scoring device described above and the specific working process of each of its units, reference may be made to the corresponding process in the foregoing interview scoring method embodiments, which will not be repeated here.
  • the above-mentioned interview scoring device can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 9.
  • FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer equipment can be a server or a terminal.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium can store an operating system and a computer program.
  • the computer program includes program instructions. When the program instructions are executed, the processor can execute an interview scoring method.
  • the processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
  • the internal memory provides an environment for the operation of the computer program in the non-volatile storage medium, and when the computer program is executed by the processor, the processor can execute an interview scoring method.
  • The network interface is used for network communication, such as sending assigned tasks.
  • FIG. 9 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • The specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the processor is used to run a computer program stored in the memory to implement the following steps:
  • Acquire interview video information, and acquire, according to the interview video information, the target question text corresponding to the first user's question, the second user's micro-expression type set, and the target answer text corresponding to the second user's answer to the question; determine the candidate question text corresponding to the target question text; determine the similarity relationship between the target question text and the candidate question text according to a preset similarity rule; if the similarity relationship is not similar, input the target question text and the candidate question text into a preset similarity verification model to output the similarity verification result of the target question text and the candidate question text; and if the similarity verification result passes, obtain the candidate answer text corresponding to the candidate question text according to the set of micro-expression types and calculate the answer score of the second user based on the candidate answer text and the target answer text.
  • In one embodiment, when the processor implements the acquisition, according to the interview video information, of the target question text corresponding to the first user's question, the set of micro-expression types of the second user, and the target answer text corresponding to the second user's answer to the question, it is configured to achieve the corresponding sub-steps.
  • In one embodiment, when the processor determines the similarity relationship between the target question text and the candidate question text corresponding to the target question text according to a preset similarity rule, it is configured to implement:
  • Calculate the text similarity between the target question text and the candidate question text based on the similarity calculation formula; if the text similarity is greater than a preset similarity threshold, determine a similarity relationship indicating that the target question text is similar to the candidate question text; if the text similarity is not greater than the preset similarity threshold, determine a similarity relationship indicating that the target question text is not similar to the candidate question text.
  • In one embodiment, when the processor calculates the text similarity between the target question text and the candidate question text based on the similarity calculation formula, it is configured to implement:
  • Perform vector transformation on the candidate question text and the target question text according to the word embedding model to obtain a first semantic vector corresponding to the candidate question text and a second semantic vector corresponding to the target question text; and calculate, based on the similarity calculation formula, the text similarity between the candidate question text and the target question text according to the first semantic vector and the second semantic vector.
  • the processor is further configured to implement the following after determining the similarity relationship between the target question text and the candidate question text according to a preset similarity rule:
  • If the similarity relationship is similar, the candidate answer text corresponding to the candidate question text is obtained according to the set of micro-expression types, and the answer score of the second user is calculated according to the candidate answer text and the target answer text.
  • the input layer of the similarity check model includes a first input sublayer and a second input sublayer.
  • In one embodiment, when the processor obtains the candidate answer text corresponding to the candidate question text according to the set of micro-expression types, it is configured to implement:
  • If the set of micro-expression types includes a preset number of micro-expression types in the preset first micro-expression type group, obtain the candidate answer text corresponding to the first micro-expression type group from a preset text-answer library; if the set of micro-expression types includes a preset number of micro-expression types in a preset second micro-expression type group, obtain the candidate answer text corresponding to the second micro-expression type group from the text-answer library.
  • the embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the processor executes the program instructions to implement the present application Any one of the interview scoring methods provided in the embodiment.
  • the computer-readable storage medium may be the internal storage unit of the computer device described in the foregoing embodiment, such as the hard disk or memory of the computer device.
  • The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to an interview scoring method and apparatus, a device, and a storage medium. The method comprises: obtaining interview video information, and obtaining a target question text, a set of micro-expression types of a second user, and a target answer text; determining a candidate question text corresponding to the target question text; determining a similarity relationship between the target question text and the candidate question text; if the similarity relationship is dissimilarity, producing a similarity verification result according to a preset similarity verification model; and if the verification is passed, obtaining a candidate answer text and calculating an answer score of the second user.
PCT/CN2019/103134 2019-07-04 2019-08-28 Interview scoring method and apparatus, device, and storage medium WO2021000408A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910600403.2 2019-07-04
CN201910600403.2A CN110457432B (zh) 2019-07-04 Interview scoring method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021000408A1 (fr) 2021-01-07

Family

ID=68482236

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103134 WO2021000408A1 (fr) Interview scoring method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110457432B (fr)
WO (1) WO2021000408A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806516A (zh) Matching degree determination method and apparatus, electronic device, and computer-readable storage medium
CN114897183A (zh) Question data processing method, and deep learning model training method and apparatus
CN115829533A (zh) Intelligent online interview method, system, device, and storage medium
CN117252260A (zh) Interview skill training method, device, and medium based on a large language model

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105138A (zh) Human resource analysis and evaluation system based on task completion data
CN112836027A (zh) Method for determining text similarity, question answering method, and question answering system
CN111177336B (zh) Method and apparatus for determining response information
CN111428012B (zh) Attention-mechanism-based intelligent question answering method, apparatus, device, and storage medium
CN112052320B (zh) Information processing method and apparatus, and computer-readable storage medium
CN112466308B (zh) Speech-recognition-based interview assistance method and system
CN112528797B (zh) Question recommendation method and apparatus, and electronic device
CN113241076A (zh) Speech processing method and apparatus, and electronic device
CN113780993A (zh) Data processing method and apparatus, device, and readable storage medium
CN114400005A (zh) Voice message generation method and apparatus, computer device, and storage medium
CN117708391B (zh) Data processing method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243484A (zh) Interview system based on data processing
WO2017070496A1 (fr) Automatic personalization of tests
CN107705090A (zh) Talent recruitment system and method
CN109766917A (zh) Interview video data processing method and apparatus, computer device, and storage medium
CN109905381A (zh) Self-service interview method, related apparatus, and storage medium
CN109961052A (zh) Video interview method and system based on expression analysis technology

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10796217B2 (en) * 2016-11-30 2020-10-06 Microsoft Technology Licensing, Llc Systems and methods for performing automated interviews
CN108536708A (zh) Automatic question answering processing method and automatic question answering system
CN108121800B (zh) Artificial-intelligence-based information generation method and apparatus
CN109472206B (zh) Micro-expression-based risk assessment method, apparatus, device, and medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806516A (zh) Matching degree determination method and apparatus, electronic device, and computer-readable storage medium
CN114897183A (zh) Question data processing method, and deep learning model training method and apparatus
CN114897183B (zh) Question data processing method, and deep learning model training method and apparatus
CN115829533A (zh) Intelligent online interview method, system, device, and storage medium
CN115829533B (zh) Intelligent online interview method, system, device, and storage medium
CN117252260A (zh) Interview skill training method, device, and medium based on a large language model
CN117252260B (zh) Interview skill training method, device, and medium based on a large language model

Also Published As

Publication number Publication date
CN110457432A (zh) 2019-11-15
CN110457432B (zh) 2023-05-30

Similar Documents

Publication Publication Date Title
WO2021000408A1 (fr) Interview scoring method and apparatus, device, and storage medium
WO2020173133A1 (fr) Emotion recognition model training method, emotion recognition method, device, apparatus, and storage medium
CN107680582B (zh) Acoustic model training method, speech recognition method, apparatus, device, and medium
CN109087670B (zh) Emotion analysis method and system, server, and storage medium
CN111694940B (zh) User report generation method and terminal device
CN112233698B (zh) Character emotion recognition method and apparatus, terminal device, and storage medium
WO2022252636A1 (fr) Artificial-intelligence-based answer generation method and apparatus, device, and storage medium
WO2021218028A1 (fr) Artificial-intelligence-based interview content refining method, apparatus, device, and medium
WO2022141868A1 (fr) Speech feature extraction method and apparatus, terminal, and storage medium
CN109947971B (zh) Image retrieval method and apparatus, electronic device, and storage medium
CN109299227B (zh) Speech-recognition-based information query method and apparatus
WO2024055752A1 (fr) Speech synthesis model training method, speech synthesis method, and related apparatuses
Ismail et al. Development of a regional voice dataset and speaker classification based on machine learning
CN117493830A (zh) Training data quality evaluation, and evaluation model generation method, apparatus, and device
CN111126084B (zh) Data processing method and apparatus, electronic device, and storage medium
Sharma et al. Comparative analysis of various feature extraction techniques for classification of speech disfluencies
WO2024093578A1 (fr) Speech recognition method and apparatus, electronic device, storage medium, and computer program product
Yue English spoken stress recognition based on natural language processing and endpoint detection algorithm
Joy et al. Deep scattering power spectrum features for robust speech recognition
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Joshi et al. A novel deep learning based Nepali speech recognition
CN111401069A (zh) Intent recognition method and apparatus for conversation text, and terminal
CN113053409B (zh) Audio evaluation method and apparatus
Dielen Improving the Automatic Speech Recognition Model Whisper with Voice Activity Detection
CN113782005A (zh) Speech recognition method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19936259

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19936259

Country of ref document: EP

Kind code of ref document: A1