WO2021057146A1 - Voice-based method and device for determining an interviewed person, terminal, and storage medium - Google Patents
Voice-based method and device for determining an interviewed person, terminal, and storage medium
- Publication number
- WO2021057146A1 (PCT/CN2020/098891; CN2020098891W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- confidence
- confidence level
- duration
- interviewer
- question
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/04—Segmentation; Word boundary detection
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Definitions
- This application relates to the field of speech recognition technology, and in particular to a method, device, terminal and storage medium for determining interviewers based on speech.
- the first aspect of the present application provides a voice-based interviewer judgment method, the method includes:
- the interview result of the interviewer is output according to the emotional stability, reaction speed and self-confidence.
- the second aspect of the present application provides a voice-based interviewer determination device, the device includes:
- the acquisition module is used to acquire the answer voices to the interviewer's multiple questions;
- the slicing module is used to slice the answer voice of each question to obtain multiple voice fragments;
- the calculation module is used to calculate the volume characteristic, the speaking rate characteristic, the duration, and the intermittent duration of each question according to the multiple speech fragments;
- the first determining module is configured to determine the emotional stability of the interviewer according to the volume characteristics of each question;
- the second determining module is configured to use a pre-built confidence determination model to evaluate the speaking rate feature, the intermittent duration, and the duration, and determine the interviewer's confidence;
- the third determining module is configured to use a pre-built reaction speed determination model to evaluate the speaking rate feature and the intermittent duration, and determine the interviewer's reaction speed;
- the output module is used to output the interview result of the interviewer according to the emotional stability, reaction speed, and confidence.
- a third aspect of the present application provides a terminal, the terminal includes a processor, and the processor is configured to implement the following steps when executing computer-readable instructions stored in a memory:
- the interview result of the interviewer is output according to the emotional stability, reaction speed and self-confidence.
- a fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
- the interview result of the interviewer is output according to the emotional stability, reaction speed and self-confidence.
- the voice-based interviewer determination method, device, terminal, and storage medium described in this application can be applied to fields such as smart government affairs, thereby promoting the construction of smart cities.
- This application obtains the answer speech for each of the interviewer's questions, slices each answer speech into multiple speech fragments, and extracts the volume characteristic, speaking rate characteristic, duration, and intermittent duration of each fragment. The emotional stability of the interviewer is determined from the volume characteristics, and the pre-built confidence determination model and reaction speed determination model are then applied to the speech rate characteristics, duration, and intermittent duration to determine the interviewer's confidence and reaction speed.
- Finally, the interview result of the interviewer is output according to the emotional stability, reaction speed, and confidence.
- Through in-depth analysis and mining of the human-computer interaction voice during the interview process, this application determines multiple characteristics of the interviewer, such as emotional stability, reaction speed, and self-confidence. These characteristics allow the interviewer to be evaluated objectively and comprehensively, making the result more precise and accurate and improving the efficiency and quality of interview judgment.
- Fig. 1 is a flowchart of a voice-based interviewer judgment method provided by Embodiment 1 of the present application.
- Fig. 2 is a structural diagram of a voice-based interviewer judging device provided in the second embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a terminal provided in Embodiment 3 of the present application.
- Fig. 1 is a flowchart of a voice-based interviewer judgment method provided by Embodiment 1 of the present application.
- the voice-based interviewer determination method can be applied to a terminal.
- the voice-based interviewer determination function provided by the method of this application can be directly integrated on the terminal;
- alternatively, the function may run in the terminal in the form of a Software Development Kit (SDK).
- the voice-based interviewer judgment method specifically includes the following steps. According to different needs, the order of the steps in the flowchart can be changed, and some of the steps can be omitted.
- before obtaining the answer voices to the interviewer's multiple questions, the method further includes:
- the process of constructing the confidence level judgment model and the reaction speed judgment model includes:
- first salient features with a large degree of discrimination for confidence and second salient features with a large degree of discrimination for reaction speed are selected from the multiple features, wherein the first salient features include: the speech rate characteristic, the duration, and the intermittent duration, and the second salient features include: the speech rate characteristic and the intermittent duration;
- a reaction speed determination model is constructed based on the plurality of second salient characteristics, the plurality of reaction speed grades, and the characteristic range corresponding to each of the reaction speed grades.
- the self-confidence, emotional stability, and reaction speed of the sample speech for each question answered by multiple interviewers are labeled, and the four relevant features together with the corresponding labeling results are then used as the learning objects to establish a learning model. It was found that, in the data distribution of each relevant feature across different degrees of confidence, emotional stability, and reaction speed, the distributions for people of different levels are distinct and regular. The interviewer's confidence, emotional stability, and reaction speed can therefore be quantitatively evaluated through four relevant characteristics of the interviewer: the volume characteristic, the speech rate characteristic, the duration, and the intermittent duration.
- According to the four relevant features and the confidence labels of the sample speech, first box plots of each relevant feature at different confidence levels and second box plots of each relevant feature at different reaction speed levels are generated. From the first box plots, the first salient features with a relatively large degree of discrimination across confidence levels are identified: the speech rate characteristic, the duration, and the intermittent duration. From the second box plots, the second salient features with a relatively large degree of discrimination across reaction speed levels are identified: the speech rate characteristic and the intermittent duration. Finally, a confidence determination model is constructed from the three first salient features (speech rate, duration, and intermittent duration), and a reaction speed determination model is constructed from the two second salient features (speech rate and intermittent duration).
- the first box plots are generated from the distribution of the feature values of the first salient features at different confidence levels;
- the second box plots are generated from the distribution of the feature values of the second salient features at different reaction speed levels.
- when training with a salient feature, the range of feature values of that salient feature at each confidence/reaction speed level must be determined from the maximum and minimum values of the feature in the box plots of the different levels. After determining the feature value range corresponding to the salient feature at each confidence/reaction speed grade, it is necessary to check whether the ranges conform to extreme value consistency; for example, suppose a salient feature corresponds to five confidence/reaction speed grades;
- if the ranges do not conform, the feature value ranges need to be adjusted;
- the salient feature in the above example then corresponds, across the five confidence/reaction speed grades, to the feature value ranges [a1,b1], [a2,b2], [a3,b3], [a4,b4], [a5,b5].
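The extreme value consistency check described above can be sketched as follows. This is an illustrative Python sketch: the helper name `ranges_consistent` and the non-overlap criterion are assumptions, since the application does not give an exact definition of extreme value consistency.

```python
def ranges_consistent(ranges):
    """ranges: one (low, high) pair per confidence/reaction speed grade,
    e.g. [(a1, b1), (a2, b2), (a3, b3), (a4, b4), (a5, b5)].
    Returns True when every range is well-formed and, once ordered by
    lower bound, adjacent grade ranges touch at most at their endpoints."""
    for low, high in ranges:
        if low > high:  # a malformed range violates consistency outright
            return False
    ordered = sorted(ranges)
    # adjacent ranges may share an endpoint but must not overlap
    return all(ordered[i][1] <= ordered[i + 1][0]
               for i in range(len(ordered) - 1))
```

When the check fails, the feature value ranges would be adjusted before being stored in the judgment model, as the paragraph above requires.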
- S12 Slice the answer speech of each question to obtain multiple speech fragments.
- the interviewer's answer speech for each question is divided into multiple speech fragments.
- the answer voice of each question of the interviewer is divided into 28 voice fragments.
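As a minimal sketch of the slicing step, an answer's sample sequence can be split into a fixed number of near-equal fragments (28 follows the example above; the function name is hypothetical, and the application does not specify the slicing criterion — a real implementation might instead segment on silence boundaries):

```python
def slice_answer(samples, num_fragments=28):
    """Split one answer's samples into num_fragments near-equal slices;
    the first `rem` slices absorb the remainder so no sample is dropped."""
    size, rem = divmod(len(samples), num_fragments)
    fragments, start = [], 0
    for i in range(num_fragments):
        end = start + size + (1 if i < rem else 0)
        fragments.append(samples[start:end])
        start = end
    return fragments
```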
- S13 Calculate the volume characteristic, the speaking rate characteristic, the duration, and the intermittent duration of each question according to the multiple speech fragments.
- the volume feature refers to the loudness of the interviewer's voice when answering questions.
- the speaking rate feature refers to the speed at which the interviewer answers questions, that is, the amount of voice content per unit time.
- the duration refers to the length of time that the interviewer continuously speaks when answering questions.
- the intermittent duration refers to the length of time that the interviewer does not speak when answering questions.
- each voice fragment has four relevant features: the volume feature, the speaking rate feature, the duration, and the intermittent duration. Averaging each relevant feature over all voice fragments of the same question yields that feature for the question.
- Specifically, the volume characteristics of the multiple speech fragments of each question are averaged to obtain the mean volume characteristic of each question; the speech rate characteristics of the multiple speech fragments of each question are averaged to obtain the mean speech rate characteristic of each question; the durations of the multiple speech fragments of each question are averaged to obtain the mean duration of each question; and the intermittent durations of the multiple speech fragments of each question are averaged to obtain the mean intermittent duration of each question. That is, the volume feature, speech rate feature, duration, and intermittent duration obtained from the multiple speech fragments all refer to these mean values.
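The per-question averaging described above can be sketched as follows (illustrative Python; the dictionary keys are placeholder names for the four relevant features):

```python
from statistics import mean

def question_features(fragments):
    """fragments: per-fragment measurements for one question, each a dict
    holding the four relevant features. Returns the per-question mean of
    each feature, as described above."""
    keys = ("volume", "speech_rate", "duration", "intermittent")
    return {k: mean(f[k] for f in fragments) for k in keys}
```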
- the loudness of a person's voice can reflect their emotional stability.
- the determining the emotional stability of the interviewer according to the volume characteristics of each question includes:
- correspondences between different volume characteristic amplitude values and emotional stability are preset; once the interviewer's volume characteristic amplitude values are determined, the emotional stability of the interviewer can be matched according to the correspondence.
- suppose the maximum volume feature over all questions is max, the minimum volume feature is min, the average volume feature over all questions is avg, and the volume feature of each question is ai; the volume fluctuation range of each question is then derived from max, min, avg, and ai;
- when the average volume fluctuation range is less than 20%, the interviewer's emotional stability is determined to be the first degree of stability, indicating that the interviewer's emotional stability is "high";
- when the average volume fluctuation range is between 20% and 30%, the interviewer's emotional stability is determined to be the second degree of stability, indicating that the interviewer's emotional stability is "medium";
- when the average volume fluctuation range is greater than 30%, the interviewer's emotional stability is determined to be the third degree of stability, indicating that the interviewer's emotional stability is "low".
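The three threshold rules above can be sketched directly (illustrative Python; the fluctuation value is assumed to be expressed as a fraction, and the boundary cases follow the text's wording):

```python
def emotional_stability(avg_fluctuation):
    """Map the average volume fluctuation range (a fraction, e.g. 0.25
    for 25%) to the three degrees of stability described above."""
    if avg_fluctuation < 0.20:
        return "high"    # first degree of stability
    if avg_fluctuation <= 0.30:
        return "medium"  # second degree of stability
    return "low"         # third degree of stability
```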
- S15 Use a pre-built confidence level determination model to determine the speech rate feature, interruption duration, and duration, and determine the interviewer's confidence level.
- using a pre-built confidence determination model to evaluate the speech rate feature, the intermittent duration, and the duration to determine the confidence level of the interviewer includes:
- the average is rounded up to get the interviewer’s confidence judgment result.
- for example, the confidence levels of five questions are determined as follows: Question 1 - Confidence Level A, Question 2 - Confidence Level B, Question 3 - Confidence Level B, Question 4 - Confidence Level B, Question 5 - Confidence Level A; the confidence levels corresponding to the 5 questions are sorted according to the serial number of the question,
- giving ABBBA; the level at the center position of ABBBA is B, so the target confidence level is B, which serves as the final judgment result of the interviewer's confidence in the interview process.
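The centre-position rule in the ABBBA example can be sketched as follows (illustrative; the function name is hypothetical):

```python
def overall_confidence(grades_in_question_order):
    """Return the grade at the centre position of the question-ordered
    grade sequence, mirroring the ABBBA -> B example above."""
    return grades_in_question_order[len(grades_in_question_order) // 2]
```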
- alternatively, the grades of all questions can be converted into numerical values, and the numerical conversion results averaged and rounded up (to the larger value) to obtain a personal grade.
- for example, if the average is 4.4, the score after rounding up (to the larger value) is 5 points, and the interviewer's confidence level judgment result is Grade A.
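A sketch of the numeric-conversion alternative follows. The five-point mapping A=5 through E=1 is an assumption consistent with the 4.4 → 5 → Grade A example, not something the application states explicitly:

```python
import math

# assumed five-point mapping between letter grades and scores
GRADE_VALUE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
VALUE_GRADE = {v: k for k, v in GRADE_VALUE.items()}

def graded_average(grades):
    """Convert grades to values, average, and round up (larger)."""
    avg = sum(GRADE_VALUE[g] for g in grades) / len(grades)
    return VALUE_GRADE[math.ceil(avg)]
```

With grades A, A, B, B, B the average is 4.4, which rounds up to 5, i.e. Grade A, matching the example above.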
- using a pre-built confidence determination model to evaluate the speech rate characteristic, intermittent duration, and duration of each question to determine the confidence level of each question includes:
- the target candidate confidence level in the confidence level ranking queue is taken as the confidence level of the question.
- each confidence level in any feature box plot determines a feature range (the range is delimited by the maximum and minimum of that level); only when all the features of a certain question (the speech rate characteristic, the intermittent duration, and the duration) are determined to fall within the ranges of the same level is the confidence level of the question determined to be that level.
- for example, suppose the speech rate feature of an answer is 3.4, the intermittent duration is 1.3, and the duration is 5.6. The speech rate range of grade B in the speech rate box plot is [3.2, 4], the intermittent duration range of grade B in the intermittent duration box plot is [0.8, 1.5], and the duration range of grade B in the duration box plot is [5.3, 5.7]. Because the speech rate feature, the intermittent duration, and the duration all fall within the grade B ranges, the confidence level of this question is initially judged to be grade B.
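The all-features-in-range test from the example above can be sketched as follows (illustrative Python; the grade B ranges are taken from the example, and the dictionary keys are placeholder names):

```python
# grade B feature ranges from the example above (illustrative)
LEVEL_B = {
    "speech_rate": (3.2, 4.0),
    "intermittent": (0.8, 1.5),
    "duration": (5.3, 5.7),
}

def matches_level(features, level_ranges):
    """A question is assigned a confidence level only when every salient
    feature falls inside that level's box-plot range."""
    return all(low <= features[k] <= high
               for k, (low, high) in level_ranges.items())
```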
- for example, the first confidence levels are A and B, the second confidence levels are A and B, and the third confidence levels are A and B, that is, the first, second, and third confidence levels are the same;
- or, the first confidence levels are A, B, and C, the second confidence levels are A, B, and C, and the third confidence levels are A, B, and C. When the first, second, and third confidence levels each contain multiple grades and those grades are all the same, the candidate confidence levels comprise the multiple grades: grade A, grade B, and grade C;
- the confidence level ranking queue is then ABC, and based on the law of large numbers the confidence level of the target candidate is determined to be grade B, which is taken as the confidence level of the question.
- the method further includes:
- for example, the first confidence levels are A, B, and D;
- the second confidence levels are A, B, and E;
- and the third confidence levels are A, B, and C. That is, the first, second, and third confidence levels each contain multiple grades and those grades are not all the same, but the first, second, and third confidence levels share the grades A and B. The shared grades A and B are then used as the candidate confidence grades, and the confidence grade of the question is finally determined to be grade B based on the law of large numbers.
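The shared-grade rule above can be sketched as follows (illustrative Python; taking the middle of the sorted shared grades is an assumed reading of the "law of large numbers" choice in the text):

```python
def question_confidence(first, second, third):
    """first/second/third: sets of candidate grades produced by the three
    salient features. Keep the grades common to all three, sort them, and
    return the middle one; None falls through to the neutral grade."""
    common = sorted(first & second & third)
    if not common:
        return None
    return common[len(common) // 2]
```

With {A, B, D}, {A, B, E} and {A, B, C}, the shared grades are A and B, and B is returned, matching the example above.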
- the method further includes:
- the neutral grade refers to the grade assigned when, after traversing all grades, none of them is matched.
- for example, the pre-built confidence determination model determines that the confidence level corresponding to the question's speech rate feature is grade A, that the confidence level corresponding to the question's intermittent duration is grade B, and that the confidence level corresponding to the question's duration is grade A. Because the question's speech rate characteristic, intermittent duration, and duration do not all share the same confidence level, the confidence level of the question belongs neither to grade A nor to grade B; that is, there is no grade at which the first, second, and third confidence levels coincide, so the confidence level of the question is determined to be the neutral grade.
- a question at the neutral grade is most likely to belong to the most general situation, namely grade C, so the neutral grade can be preset as grade C.
- S16 Use a pre-built reaction speed determination model to evaluate the speech rate feature and the intermittent duration, and determine the interviewer's reaction speed.
- S15 and S16 are executed in parallel.
- two threads can be started for synchronous execution: one thread uses the pre-built confidence determination model to evaluate the speech rate feature, intermittent duration, and duration, while the other thread uses the pre-built reaction speed determination model to evaluate the speech rate feature and the intermittent duration. Because the two threads execute in parallel, the efficiency of judging the interviewer's confidence and reaction speed is improved, the judgment time is shortened, and the efficiency of interview screening is improved.
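The parallel execution of S15 and S16 can be sketched with two threads (illustrative Python; the model functions are placeholders for the two judgment models):

```python
import threading

def judge_in_parallel(confidence_model, reaction_model, features):
    """Run the confidence and reaction speed judgments on two threads,
    as S15 and S16 are executed in parallel."""
    results = {}
    t1 = threading.Thread(
        target=lambda: results.__setitem__("confidence", confidence_model(features)))
    t2 = threading.Thread(
        target=lambda: results.__setitem__("reaction_speed", reaction_model(features)))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```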
- S17 Output the interview result of the interviewer according to the emotional stability, reaction speed, and confidence.
- in the interview process, after the interviewer's emotional stability, reaction speed, and confidence have been obtained by analyzing the voice of the interviewer's answers to the questions, interviewers who meet the interview requirements can be selected according to the focus of the interview position.
- the voice-based interviewer judgment method described in this application obtains the answer voice for each of the interviewer's questions, slices each answer voice to obtain multiple voice fragments, and extracts the volume characteristic, speaking rate characteristic, duration, and intermittent duration of each fragment. The emotional stability of the interviewer is determined from the volume characteristics, and the pre-built confidence determination model and reaction speed determination model are then applied to the speech rate characteristics, duration, and intermittent duration to determine the interviewer's confidence and reaction speed. Finally, the interview result of the interviewer is output according to the emotional stability, reaction speed, and confidence.
- Through in-depth analysis and mining of the human-computer interaction voice during the interview process, this application determines multiple characteristics of the interviewer, such as emotional stability, reaction speed, and self-confidence. These characteristics allow the interviewer to be evaluated objectively and comprehensively, making the result more precise and accurate and improving the efficiency and quality of interview judgment.
- this application can be applied in fields such as smart government affairs, so as to promote the development of smart cities.
- Fig. 2 is a structural diagram of a voice-based interviewer judging device provided in the second embodiment of the present application.
- the voice-based interviewer determination device 20 may include multiple functional modules composed of computer-readable instruction segments.
- the computer-readable instructions of each program segment in the voice-based interviewer determination device 20 may be stored in the memory of the terminal and executed by at least one processor to perform the interviewer judgment function (see FIG. 1 for details).
- the voice-based interviewer determination device 20 can be divided into multiple functional modules according to the functions it performs.
- the functional modules may include: an acquisition module 201, a construction module 202, a slicing module 203, a calculation module 204, a first determination module 205, a second determination module 206, a third determination module 207, and an output module 208.
- the module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.
- the obtaining module 201 is used to obtain the answer voices of multiple questions of the interviewer.
- before obtaining the answer voices to the interviewer's multiple questions, the apparatus further includes:
- the construction module 202 is used to construct a confidence degree judgment model and a reaction speed judgment model.
- the process of constructing the confidence level judgment model and the reaction speed judgment model includes:
- first salient features with a large degree of discrimination for confidence and second salient features with a large degree of discrimination for reaction speed are selected from the multiple features, wherein the first salient features include: the speech rate characteristic, the duration, and the intermittent duration, and the second salient features include: the speech rate characteristic and the intermittent duration;
- a reaction speed determination model is constructed based on the plurality of second salient characteristics, the plurality of reaction speed grades, and the characteristic range corresponding to each of the reaction speed grades.
- the self-confidence, emotional stability, and reaction speed of the sample speech for each question answered by multiple interviewers are labeled, and the four relevant features together with the corresponding labeling results are then used as the learning objects to establish a learning model. It was found that, in the data distribution of each relevant feature across different degrees of confidence, emotional stability, and reaction speed, the distributions for people of different levels are distinct and regular. The interviewer's confidence, emotional stability, and reaction speed can therefore be quantitatively evaluated through four relevant characteristics of the interviewer: the volume characteristic, the speech rate characteristic, the duration, and the intermittent duration.
- According to the four relevant features and the confidence labels of the sample speech, first box plots of each relevant feature at different confidence levels and second box plots of each relevant feature at different reaction speed levels are generated. From the first box plots, the first salient features with a relatively large degree of discrimination across confidence levels are identified: the speech rate characteristic, the duration, and the intermittent duration. From the second box plots, the second salient features with a relatively large degree of discrimination across reaction speed levels are identified: the speech rate characteristic and the intermittent duration. Finally, a confidence determination model is constructed from the three first salient features (speech rate, duration, and intermittent duration), and a reaction speed determination model is constructed from the two second salient features (speech rate and intermittent duration).
- the first box plots are generated from the distribution of the feature values of the first salient features at different confidence levels;
- the second box plots are generated from the distribution of the feature values of the second salient features at different reaction speed levels.
- when training with a salient feature, the range of feature values of that salient feature at each confidence/reaction speed level must be determined from the maximum and minimum values of the feature in the box plots of the different levels. After determining the feature value range corresponding to the salient feature at each confidence/reaction speed grade, it is necessary to check whether the ranges conform to extreme value consistency; for example, suppose a salient feature corresponds to five confidence/reaction speed grades;
- if the ranges do not conform, the feature value ranges need to be adjusted;
- the salient feature in the above example then corresponds, across the five confidence/reaction speed grades, to the feature value ranges [a1,b1], [a2,b2], [a3,b3], [a4,b4], [a5,b5].
- the slicing module 203 is used to slice the answer speech of each question to obtain multiple speech fragments.
- the interviewer's answer speech for each question is divided into multiple speech fragments.
- the answer voice of each question of the interviewer is divided into 28 voice fragments.
- the calculation module 204 is configured to calculate the volume characteristic, the speaking rate characteristic, the duration, and the intermittent duration of each question according to the multiple speech fragments.
- the volume feature refers to the loudness of the interviewer's voice when answering questions.
- the speaking rate feature refers to the speed at which the interviewer answers questions, that is, the amount of voice content per unit time.
- the duration refers to the length of time that the interviewer continuously speaks when answering questions.
- the intermittent duration refers to the length of time that the interviewer does not speak when answering questions.
- each voice fragment has four relevant features: the volume feature, the speaking rate feature, the duration, and the intermittent duration. Averaging each relevant feature over all voice fragments of the same question yields that feature for the question.
- Specifically, the volume characteristics of the multiple speech fragments of each question are averaged to obtain the mean volume characteristic of each question; the speech rate characteristics of the multiple speech fragments of each question are averaged to obtain the mean speech rate characteristic of each question; the durations of the multiple speech fragments of each question are averaged to obtain the mean duration of each question; and the intermittent durations of the multiple speech fragments of each question are averaged to obtain the mean intermittent duration of each question. That is, the volume feature, speech rate feature, duration, and intermittent duration obtained from the multiple speech fragments all refer to these mean values.
- the first determining module 205 is configured to determine the emotional stability of the interviewer according to the volume characteristics of each question.
- the loudness of a person's voice can reflect their emotional stability.
- the first determining module 205 determining the emotional stability of the interviewer according to the volume characteristics of each question includes:
- Correspondences between different volume feature amplitude values and emotional stability are preset. Once the interviewer's volume feature amplitude values are determined, the interviewer's emotional stability can be matched according to these correspondences.
- the maximum volume feature over all questions is max,
- the minimum volume feature is min,
- the average volume feature over all questions is avg,
- and the volume feature of each question is ai;
- the volume fluctuation range of each question is computed from these values, and the average volume fluctuation range is the mean over all questions.
- if the average volume fluctuation range is less than 20%, the interviewer's emotional stability is determined to be the first degree of stability, indicating that the interviewer's emotional stability is "high".
- if the average volume fluctuation range is between 20% and 30%, the interviewer's emotional stability is determined to be the second degree of stability, indicating that the interviewer's emotional stability is "medium".
- if the average volume fluctuation range is greater than 30%, the interviewer's emotional stability is determined to be the third degree of stability, indicating that the interviewer's emotional stability is "low".
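A minimal sketch of the three-band rule above. The exact fluctuation formula is not reproduced in the source; here the fluctuation of each question is assumed to be |ai - avg| / avg, averaged over all questions:

```python
def emotional_stability(volume_features):
    """Map per-question volume features to a stability label.

    Assumption: fluctuation of question i = |a_i - avg| / avg (the source
    defines the fluctuation via max, min, avg and a_i, but the exact formula
    is not reproduced here).
    """
    avg = sum(volume_features) / len(volume_features)
    mean_fluctuation = sum(abs(a - avg) / avg for a in volume_features) / len(volume_features)
    if mean_fluctuation < 0.20:
        return "high"    # first degree of stability
    elif mean_fluctuation <= 0.30:
        return "medium"  # second degree of stability
    return "low"         # third degree of stability
```

Whatever the exact fluctuation formula, the thresholding into the three degrees of stability works the same way.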
- the second determining module 206 is configured to use a pre-built confidence level determination model to evaluate the speaking rate feature, intermittent duration, and duration, and thereby determine the confidence level of the interviewer.
- the second determining module 206 using the pre-built confidence level determination model to evaluate the speaking rate feature, intermittent duration, and duration, and determining the interviewer's confidence level, includes:
- the average is rounded up to obtain the interviewer's confidence level judgment result.
- for example, the confidence levels of five questions are determined as follows: Question 1 - Level A, Question 2 - Level B, Question 3 - Level B, Question 4 - Level B, Question 5 - Level A; sorting the confidence levels of the 5 questions by question number gives the sequence ABBBA.
- the level at the center position of ABBBA is B, so the target confidence level is B, which serves as the final judgment of the interviewer's confidence during the interview.
- alternatively, the levels of all questions can be converted into numerical scores; the numerical results are averaged and rounded up to obtain an overall level.
- for example, if the average is 4.4, rounding up gives a score of 5, and the interviewer's confidence level judgment result is Level A.
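The two aggregation strategies described above (centre-position level and rounded-up numeric average) can be sketched as follows; the grade-to-score mapping (A=5, B=4, C=3) is an assumption consistent with the 4.4 → 5 → Level A example:

```python
import math

GRADE_TO_SCORE = {"A": 5, "B": 4, "C": 3}  # assumed mapping, A = best
SCORE_TO_GRADE = {v: k for k, v in GRADE_TO_SCORE.items()}

def overall_by_centre(grades):
    """Take the level at the centre position of the per-question levels."""
    return grades[len(grades) // 2]

def overall_by_average(grades):
    """Convert levels to scores, average, round up, and map back to a level."""
    score = math.ceil(sum(GRADE_TO_SCORE[g] for g in grades) / len(grades))
    return SCORE_TO_GRADE[score]

print(overall_by_centre(["A", "B", "B", "B", "A"]))   # centre of ABBBA -> B
print(overall_by_average(["A", "A", "B", "B", "B"]))  # mean 4.4, ceil -> 5 -> A
```

Both strategies reproduce the worked examples in the text: the centre of ABBBA is B, and a 4.4 average rounds up to 5, i.e. Level A.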
- using the pre-built confidence level determination model to evaluate the speaking rate feature, intermittent duration, and duration of each question, and determining the confidence level of each question, includes:
- taking the confidence level of the target candidate in the confidence level ranking queue as the confidence level of the question.
- each confidence level in any feature box plot determines a feature range (the range is bounded by the maximum and minimum of that level). Only when all the features of a question (the speaking rate feature, intermittent duration, and duration) are determined to be the same level is the confidence level of the question determined to be that level.
- for example, suppose the speaking rate feature of an answer is 3.4,
- the intermittent duration is 1.3,
- and the duration is 5.6.
- the speaking rate range of level B in the speaking rate box plot is [3.2, 4],
- the intermittent duration range of level B in the intermittent duration box plot is [0.8, 1.5], and the duration range of level B in the duration box plot is [5.3, 5.7]. Because the speaking rate feature, intermittent duration, and duration all satisfy the level B ranges, the confidence level of this question is judged to be level B on the first pass.
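The first-pass range check in the example above can be sketched as follows; the level B ranges are taken from the text, while the data structure and names are illustrative (real ranges would come from box plots fitted per level):

```python
# Per-level feature ranges; only level B is filled in, from the example above.
GRADE_RANGES = {
    "B": {
        "speech_rate": (3.2, 4.0),
        "pause_duration": (0.8, 1.5),
        "duration": (5.3, 5.7),
    },
    # other levels (A, C, ...) would be defined the same way
}

def first_pass_grade(features):
    """Return a level only if ALL three features fall inside its ranges."""
    for grade, ranges in GRADE_RANGES.items():
        if all(lo <= features[name] <= hi for name, (lo, hi) in ranges.items()):
            return grade
    return None  # no level matched; handled later as the neutral level

print(first_pass_grade({"speech_rate": 3.4, "pause_duration": 1.3, "duration": 5.6}))
```

With the example values 3.4, 1.3, and 5.6, all three features fall inside the level B ranges, so the first-pass grade is B.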
- for example, the first confidence level is A and B, the second confidence level is A and B, and the third confidence level is A and B;
- or the first confidence level is A, B, and C, the second confidence level is A, B, and C, and the third confidence level is A, B, and C. If the first, second, and third confidence levels each contain multiple levels and the three sets are identical, the candidate confidence levels are those multiple levels: level A, level B, and level C.
- in that case, the confidence level ranking queue is ABC. Based on the law of large numbers, the confidence level of the target candidate is determined to be level B, which is taken as the confidence level of the question.
- the device further includes:
- the judgment module is used to judge whether the multiple levels of the first confidence level, the second confidence level, and the third confidence level share any common level;
- the judgment module is also used to take the common levels as the candidate confidence levels if common levels exist.
- for example, the first confidence level is A, B, and D,
- the second confidence level is A, B, and E,
- and the third confidence level is A, B, and C. That is, the first, second, and third confidence levels each contain multiple levels and the three sets are not identical, but they share the common levels A and B. The common levels A and B are taken as the candidate confidence levels, and the confidence level of the question is finally determined to be level B based on the law of large numbers.
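The candidate-level resolution above can be sketched as: intersect the levels matched by the three features, then take the centre of the sorted common levels. Picking index `len // 2` reproduces both worked outcomes (ABC → B and AB → B); the set-based representation is an illustrative assumption:

```python
def resolve_confidence(first, second, third):
    """Resolve a question's confidence level from three sets of matched levels.

    `first`, `second`, `third` are the level sets matched by the speaking rate
    feature, intermittent duration, and duration respectively (an assumed
    representation of the candidates).
    """
    common = sorted(first & second & third)  # common levels, best-first
    if not common:
        return None  # no common level: falls through to the neutral level
    return common[len(common) // 2]

print(resolve_confidence({"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}))  # B
print(resolve_confidence({"A", "B", "D"}, {"A", "B", "E"}, {"A", "B", "C"}))  # B
```

When no common level exists at all, the function returns `None`, which corresponds to the neutral-level case handled next.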
- the third determining module 207 is further configured to determine the confidence level of the question as the neutral level.
- the neutral level refers to the level assigned when, after traversing all levels, none is satisfied.
- for example, the pre-built confidence level determination model determines that the level corresponding to the question's speaking rate feature is level A, the level corresponding to the question's intermittent duration is level B, and the level corresponding to the question's duration is level A. Because the question's speaking rate feature, intermittent duration, and duration do not all map to the same confidence level, the confidence level of the question belongs to neither level A nor level B; that is, the first, second, and third confidence levels are never simultaneously the same, so the confidence level of the question is determined to be the neutral level.
- a question at the neutral level most likely belongs to the most common case, i.e., level C, so the neutral level can be preset as level C.
- the third determining module 207 is further configured to use a pre-built reaction speed determination model to evaluate the speaking rate feature and intermittent duration, and determine the interviewer's reaction speed.
- the second determining module 206 and the third determining module 207 are executed in parallel.
- specifically, two threads can be started to execute simultaneously.
- one thread uses the pre-built confidence level determination model to evaluate the speaking rate feature, intermittent duration, and duration, while the other thread uses the pre-built reaction speed determination model to evaluate the speaking rate feature and intermittent duration. Because the two threads execute in parallel, the efficiency of judging the interviewer's confidence and reaction speed is improved, the judgment time is shortened, and the efficiency of interview screening is improved.
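A minimal sketch of the two-thread parallel execution described above, using Python's standard thread pool; `confidence_model` and `reaction_speed_model` are hypothetical callables standing in for the two pre-built models:

```python
from concurrent.futures import ThreadPoolExecutor

def judge_in_parallel(confidence_model, reaction_speed_model,
                      speech_rate, pause_duration, duration):
    """Run the confidence and reaction-speed judgments on two threads."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # thread 1: confidence from speaking rate, intermittent duration, duration
        conf = pool.submit(confidence_model, speech_rate, pause_duration, duration)
        # thread 2: reaction speed from speaking rate and intermittent duration
        speed = pool.submit(reaction_speed_model, speech_rate, pause_duration)
        return conf.result(), speed.result()
```

Submitting both judgments before collecting either result lets the two models run concurrently, which is the point of the parallel design described in the text.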
- the output module 208 is configured to output the interview result of the interviewer according to the emotional stability, reaction speed, and confidence.
- during the interview process, after the interviewer's answer voices have been analyzed to obtain the interviewer's emotional stability, reaction speed, and confidence level, interviewers who meet the interview requirements can be selected according to the focus of the interview position.
- the voice-based interviewer judgment device described in this application obtains the answer voice of each question of the interviewer, slices the answer voice of each question to obtain multiple voice fragments, and extracts the volume feature, speaking rate feature, duration, and intermittent duration of each voice fragment.
- it determines the emotional stability of the interviewer based on the volume features, then uses the pre-built confidence level determination model and reaction speed determination model to evaluate the speaking rate features, durations, and intermittent durations, determines the confidence level and reaction speed of the interviewer, and outputs the interview result of the interviewer according to the emotional stability, reaction speed, and confidence level.
- This application performs in-depth analysis and mining of the human-computer interaction voice during the interview process to determine multiple characteristics of the interviewer, such as emotional stability, reaction speed, and confidence. Through these characteristics, the interviewer can be evaluated objectively and comprehensively, making the result more precise and accurate and improving the efficiency and quality of interview judgment.
- this application can be applied in fields such as smart government affairs, so as to promote the development of smart cities.
- the terminal 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
- the structure of the terminal shown in FIG. 3 does not constitute a limitation of the embodiment of the present application. It may be a bus-type structure or a star structure, and the terminal 3 may also include more or fewer hardware or software components, or a different component arrangement.
- the terminal 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes but is not limited to microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices, etc.
- the terminal 3 may also include client equipment.
- the client equipment includes, but is not limited to, any electronic product that can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device, for example, personal computers, tablets, smart phones, digital cameras, etc.
- terminal 3 is only an example. If other existing or future electronic products can be adapted to this application, they should also be included in the protection scope of this application and included here by reference.
- the memory 31 is used to store computer-readable instructions and various data, such as a device installed in the terminal 3, and to realize high-speed, automatic access to programs or data during the operation of the terminal 3.
- the memory 31 includes volatile and non-volatile memory, for example: random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), and compact disc read-only memory (CD-ROM).
- the computer-readable storage medium may be non-volatile or volatile.
- the at least one processor 32 may be composed of integrated circuits; for example, it may be composed of a single packaged integrated circuit or of multiple integrated circuits with the same or different functions, including one or a combination of multiple central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips.
- the at least one processor 32 is the control core (Control Unit) of the terminal 3.
- It uses various interfaces and lines to connect the components of the entire terminal 3, and executes the various functions of the terminal 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31.
- the at least one communication bus 33 is configured to implement connection and communication between the memory 31 and the at least one processor 32 and the like.
- the terminal 3 may also include a power source (such as a battery) for supplying power to various components.
- the power source may be logically connected to the at least one processor 32 through a power management device, so that functions such as charging, discharging, and power consumption management are realized through the power management device.
- the power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
- the terminal 3 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
- the above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer readable storage medium.
- the above-mentioned software function module is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, a network device, etc.) or a processor to execute part of the method described in each embodiment of the present application.
- the at least one processor 32 can run the operating system of the terminal 3 and various installed applications, computer-readable instructions, etc., such as the above-mentioned modules.
- the memory 31 stores computer-readable instructions, and the at least one processor 32 can call the computer-readable instructions stored in the memory 31 to perform related functions.
- the various modules described in FIG. 2 are computer-readable instructions stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the various modules.
- the memory 31 stores multiple instructions, and the multiple instructions are executed by the at least one processor 32 to implement all or part of the steps in the method described in the present application.
- the disclosed device and method can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the modules is only a logical function division, and there may be other division methods in actual implementation.
- modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a voice-based interviewee determination method, a voice-based interviewee determination device (20), a terminal (3), and a storage medium. The voice-based interviewee determination method comprises the steps of: obtaining answer voices of an interviewee for a plurality of questions (S11); slicing the answer voice of each question to obtain a plurality of voice segments (S12); calculating a volume feature, a speaking rate feature, a duration, and an intermittent duration for each question according to the plurality of voice segments (S13); determining the emotional stability of the interviewee according to the volume feature of each question (S14); evaluating the speaking rate features, intermittent durations, and durations using a pre-built confidence level determination model to determine the confidence level of the interviewee (S15); evaluating the speaking rate features and intermittent durations using the pre-built confidence level determination model to determine the response speed of the interviewee (S16); and outputting an interview result of the interviewee according to the emotional stability, response speed, and confidence level (S17). With this voice-based interviewee determination method, the interviewee can be evaluated objectively and comprehensively, so that the evaluation result is more detailed and accurate.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910900813.9 | 2019-09-23 | ||
CN201910900813.9A CN110827796B (zh) | 2019-09-23 | 2019-09-23 | 基于语音的面试者判定方法、装置、终端及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021057146A1 true WO2021057146A1 (fr) | 2021-04-01 |
Family
ID=69548146
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/098891 WO2021057146A1 (fr) | 2019-09-23 | 2020-06-29 | Procédé et dispositif de détermination d'une personne interviewée se basant sur la voix, terminal, et support de stockage |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110827796B (fr) |
WO (1) | WO2021057146A1 (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827796B (zh) * | 2019-09-23 | 2024-05-24 | 平安科技(深圳)有限公司 | 基于语音的面试者判定方法、装置、终端及存储介质 |
CN112786054B (zh) * | 2021-02-25 | 2024-06-11 | 深圳壹账通智能科技有限公司 | 基于语音的智能面试评估方法、装置、设备及存储介质 |
US11824819B2 (en) | 2022-01-26 | 2023-11-21 | International Business Machines Corporation | Assertiveness module for developing mental model |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103634472A (zh) * | 2013-12-06 | 2014-03-12 | 惠州Tcl移动通信有限公司 | 根据通话语音判断用户心情及性格的方法、系统及手机 |
WO2016014321A1 (fr) * | 2014-07-21 | 2016-01-28 | Microsoft Technology Licensing, Llc | Reconnaissance d'émotions en temps réel à partir de signaux audio |
CN106663383A (zh) * | 2014-06-23 | 2017-05-10 | 因特维欧研发股份有限公司 | 分析受试者的方法和系统 |
WO2018093770A2 (fr) * | 2016-11-18 | 2018-05-24 | IPsoft Incorporated | Génération de comportements de communication d'agents anthropomorphiques virtuels sur la base de l'affect d'un utilisateur |
WO2018112134A2 (fr) * | 2016-12-15 | 2018-06-21 | Analytic Measures Inc. | Procédé et système informatiques automatisés permettant de mesurer l'énergie, l'attitude et les compétences interpersonnelles d'un utilisateur |
CN110211591A (zh) * | 2019-06-24 | 2019-09-06 | 卓尔智联(武汉)研究院有限公司 | 基于情感分类的面试数据分析方法、计算机装置及介质 |
CN110263326A (zh) * | 2019-05-21 | 2019-09-20 | 平安科技(深圳)有限公司 | 一种用户行为预测方法、预测装置、存储介质及终端设备 |
CN110827796A (zh) * | 2019-09-23 | 2020-02-21 | 平安科技(深圳)有限公司 | 基于语音的面试者判定方法、装置、终端及存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818798B (zh) * | 2017-10-20 | 2020-08-18 | 百度在线网络技术(北京)有限公司 | 客服服务质量评价方法、装置、设备及存储介质 |
CN109637520B (zh) * | 2018-10-16 | 2023-08-22 | 平安科技(深圳)有限公司 | 基于语音分析的敏感内容识别方法、装置、终端及介质 |
CN110135692A (zh) * | 2019-04-12 | 2019-08-16 | 平安普惠企业管理有限公司 | 智能评级控制方法、装置、计算机设备及存储介质 |
CN110135800A (zh) * | 2019-04-23 | 2019-08-16 | 南京葡萄诚信息科技有限公司 | 一种人工智能视频面试方法及系统 |
- 2019
- 2019-09-23 CN CN201910900813.9A patent/CN110827796B/zh active Active
- 2020
- 2020-06-29 WO PCT/CN2020/098891 patent/WO2021057146A1/fr active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103634472A (zh) * | 2013-12-06 | 2014-03-12 | 惠州Tcl移动通信有限公司 | 根据通话语音判断用户心情及性格的方法、系统及手机 |
CN106663383A (zh) * | 2014-06-23 | 2017-05-10 | 因特维欧研发股份有限公司 | 分析受试者的方法和系统 |
WO2016014321A1 (fr) * | 2014-07-21 | 2016-01-28 | Microsoft Technology Licensing, Llc | Reconnaissance d'émotions en temps réel à partir de signaux audio |
WO2018093770A2 (fr) * | 2016-11-18 | 2018-05-24 | IPsoft Incorporated | Génération de comportements de communication d'agents anthropomorphiques virtuels sur la base de l'affect d'un utilisateur |
WO2018112134A2 (fr) * | 2016-12-15 | 2018-06-21 | Analytic Measures Inc. | Procédé et système informatiques automatisés permettant de mesurer l'énergie, l'attitude et les compétences interpersonnelles d'un utilisateur |
CN110263326A (zh) * | 2019-05-21 | 2019-09-20 | 平安科技(深圳)有限公司 | 一种用户行为预测方法、预测装置、存储介质及终端设备 |
CN110211591A (zh) * | 2019-06-24 | 2019-09-06 | 卓尔智联(武汉)研究院有限公司 | 基于情感分类的面试数据分析方法、计算机装置及介质 |
CN110827796A (zh) * | 2019-09-23 | 2020-02-21 | 平安科技(深圳)有限公司 | 基于语音的面试者判定方法、装置、终端及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN110827796B (zh) | 2024-05-24 |
CN110827796A (zh) | 2020-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021057146A1 (fr) | Procédé et dispositif de détermination d'une personne interviewée se basant sur la voix, terminal, et support de stockage | |
TWI703458B (zh) | 資料處理模型構建方法、裝置、伺服器和用戶端 | |
CN110874716A (zh) | 面试测评方法、装置、电子设备及存储介质 | |
US11170770B2 (en) | Dynamic adjustment of response thresholds in a dialogue system | |
TW201734841A (zh) | 分布式環境下監督學習算法的基準測試方法和裝置 | |
JP2017016566A (ja) | 情報処理装置、情報処理方法及びプログラム | |
WO2023279692A1 (fr) | Procédé et appareil de traitement de données basés sur une plateforme questions-réponses, et dispositif associé | |
WO2022135496A1 (fr) | Procédé et dispositif de traitement de données d'interaction vocale | |
Bujacz et al. | Psychosocial working conditions among high-skilled workers: A latent transition analysis. | |
CN113011159A (zh) | 人工座席监听方法、装置、电子设备及存储介质 | |
CN114663223A (zh) | 基于人工智能的信用风险评估方法、装置及相关设备 | |
CN113256108A (zh) | 人力资源分配方法、装置、电子设备及存储介质 | |
CN113190372A (zh) | 多源数据的故障处理方法、装置、电子设备及存储介质 | |
CN114242109A (zh) | 基于情感识别的智能外呼方法、装置、电子设备及介质 | |
US20180114173A1 (en) | Cognitive service request dispatching | |
WO2020242449A1 (fr) | Détermination d'observations concernant certains sujets lors de réunions | |
US11475068B2 (en) | Automatic question answering method and apparatus, storage medium and server | |
CN113158690A (zh) | 对话机器人的测试方法和装置 | |
CN115422094B (zh) | 算法自动化测试方法、中心调度设备及可读存储介质 | |
CN116523188A (zh) | 一种企业创新能力的评价方法及装置 | |
WO2023272853A1 (fr) | Procédé et appareil d'appel de moteur sql à base d'ia, et dispositif et support | |
CN117151358A (zh) | 工单派发方法、装置、电子设备、存储介质和程序产品 | |
CN114925674A (zh) | 文件合规性检查方法、装置、电子设备及存储介质 | |
WO2020007349A1 (fr) | Méthode de sélection de stratégie d'inactivation intelligente et méthode de sélection de stratégie d'inactivation fondée sur des types d'inactivation multiples | |
CN111522943A (zh) | 逻辑节点的自动化测试方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20869434 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20869434 Country of ref document: EP Kind code of ref document: A1 |