WO2021057146A1 - Voice-based interviewee determination method and device, terminal, and storage medium - Google Patents

Voice-based interviewee determination method and device, terminal, and storage medium

Info

Publication number
WO2021057146A1
WO2021057146A1 (PCT/CN2020/098891)
Authority
WO
WIPO (PCT)
Prior art keywords
confidence
confidence level
duration
interviewee
question
Prior art date
Application number
PCT/CN2020/098891
Other languages
English (en)
Chinese (zh)
Inventor
黄竹梅
王志鹏
孙汀娟
周雅君
李恒
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021057146A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04: Segmentation; Word boundary detection
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Definitions

  • This application relates to the field of speech recognition technology, and in particular to a voice-based method, device, terminal, and storage medium for determining interviewees.
  • the first aspect of the present application provides a voice-based interviewee determination method, the method comprising:
  • outputting the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • the second aspect of the present application provides a voice-based interviewee determination device, the device comprising:
  • an acquisition module, used to acquire the interviewee's answer speech for multiple questions;
  • a slicing module, used to slice the answer speech of each question to obtain multiple speech fragments;
  • a calculation module, used to calculate the volume feature, speech rate feature, duration, and intermittent duration of each question from the multiple speech fragments;
  • a first determination module, configured to determine the emotional stability of the interviewee according to the volume feature of each question;
  • a second determination module, configured to use a pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, and determine the interviewee's confidence;
  • a third determination module, configured to use a pre-built reaction speed determination model to judge the speech rate feature and intermittent duration, and determine the interviewee's reaction speed;
  • an output module, used to output the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • a third aspect of the present application provides a terminal, the terminal comprising a processor configured to implement the following steps when executing computer-readable instructions stored in a memory:
  • outputting the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • a fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the following steps:
  • outputting the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • the voice-based interviewee determination method, device, terminal, and storage medium described in this application can be applied to fields such as smart government affairs, thereby promoting the construction of smart cities.
  • This application obtains the interviewee's answer speech for each question, slices the answer speech of each question to obtain multiple speech fragments, and extracts the volume feature, speech rate feature, duration, and intermittent duration of each speech fragment. It determines the interviewee's emotional stability from the volume feature, and then uses the pre-built confidence determination model and reaction speed determination model to judge the speech rate feature, duration, and intermittent duration, so as to determine the interviewee's confidence and reaction speed. Finally, the interview result of the interviewee is output according to the emotional stability, reaction speed, and confidence.
  • This application performs in-depth analysis and mining of the human-computer interaction speech produced during the interview to determine multiple characteristics of the interviewee, such as emotional stability, reaction speed, and confidence. Through these characteristics, the interviewee can be evaluated objectively and comprehensively, and the result is more refined and accurate, which improves the efficiency and quality of interview judgment.
  • Fig. 1 is a flowchart of the voice-based interviewee determination method provided in Embodiment 1 of the present application.
  • Fig. 2 is a structural diagram of the voice-based interviewee determination device provided in Embodiment 2 of the present application.
  • Fig. 3 is a schematic structural diagram of the terminal provided in Embodiment 3 of the present application.
  • Fig. 1 is a flowchart of the voice-based interviewee determination method provided in Embodiment 1 of the present application.
  • the voice-based interviewee determination method can be applied to a terminal.
  • the voice-based interviewee determination function provided by the method of this application can be directly integrated on the terminal, or run on the terminal in the form of a Software Development Kit (SDK).
  • the voice-based interviewee determination method specifically includes the following steps. According to different needs, the order of the steps in the flowchart can be changed, and some of the steps can be omitted.
  • before obtaining the interviewee's answer speech for the multiple questions, the method further includes:
  • the process of constructing the confidence determination model and the reaction speed determination model includes:
  • selecting, from the multiple features, first salient features with a large degree of discrimination between confidence levels and second salient features with a large degree of discrimination between reaction speed levels, wherein the first salient features include the speech rate feature, the duration, and the intermittent duration, and the second salient features include the speech rate feature and the intermittent duration;
  • constructing a confidence determination model based on the multiple first salient features, the multiple confidence levels, and the feature range corresponding to each confidence level, and constructing a reaction speed determination model based on the multiple second salient features, the multiple reaction speed levels, and the feature range corresponding to each reaction speed level.
  • the confidence, emotional stability, and reaction speed of the sample speech with which multiple interviewees answered each question are labeled, and the four relevant features together with the corresponding labels are used as learning objects to establish a learning model. It was found that, from the data distribution of each relevant feature across different degrees of confidence / emotional stability / reaction speed, people with different degrees of confidence / emotional stability / reaction speed show clear and regular differences in data distribution. The interviewee's confidence, emotional stability, and reaction speed can therefore be quantitatively evaluated through four relevant features of the interviewee: the volume feature, the speech rate feature, the duration, and the intermittent duration.
  • according to the four relevant features and the confidence levels of the sample speech, first box plots of each relevant feature at different confidence levels and second box plots of each relevant feature at different reaction speed levels are generated. From the first box plots, the first salient features with a large degree of discrimination between confidence levels are identified: the speech rate feature, the duration, and the intermittent duration. From the second box plots, the second salient features with a large degree of discrimination between reaction speed levels are identified: the speech rate feature and the intermittent duration. Finally, a confidence determination model is constructed from the three first salient features (speech rate, duration, and intermittent duration), and a reaction speed determination model is constructed from the two second salient features (speech rate and intermittent duration).
  • the first box plots are generated from the distribution of the feature values of the first salient features at different confidence levels;
  • the second box plots are generated from the distribution of the feature values of the second salient features at different reaction speed levels.
  • when training on a salient feature, the feature value range of that salient feature at each confidence / reaction speed level is determined from the maximum and minimum values of the feature in the box plots of the different levels. After the feature value range of the salient feature at each confidence / reaction speed level has been determined, it must be checked whether the ranges satisfy extreme value consistency. For example, if a salient feature corresponds to five confidence / reaction speed levels whose feature value ranges are [a1,b1], [a2,b2], [a3,b3], [a4,b4], and [a5,b5], the extreme values should vary consistently from level to level; where a range violates this consistency, the feature value range needs to be adjusted.
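  • As an illustration of this construction, the following Python sketch derives per-level feature ranges from the box-plot extremes of labeled sample speech and checks one plausible reading of the extreme value consistency test. It assumes levels A-E and labeled sample records; all function and field names are hypothetical, not taken from this application.

```python
from collections import defaultdict

def level_ranges(samples, feature):
    """samples: list of dicts like {"level": "A", "speech_rate": 3.4, ...}.
    Returns {level: (min, max)} for one salient feature, i.e. the box-plot
    extremes used here as the feature value range of each level."""
    by_level = defaultdict(list)
    for s in samples:
        by_level[s["level"]].append(s[feature])
    return {lvl: (min(v), max(v)) for lvl, v in by_level.items()}

def is_extreme_value_consistent(ranges, levels="ABCDE"):
    """Check that the lower extremes vary monotonically across levels,
    one plausible interpretation of 'extreme value consistency'."""
    lows = [ranges[lvl][0] for lvl in levels if lvl in ranges]
    increasing = all(a <= b for a, b in zip(lows, lows[1:]))
    decreasing = all(a >= b for a, b in zip(lows, lows[1:]))
    return increasing or decreasing
```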
  • S12 Slice the answer speech of each question to obtain multiple speech fragments.
  • the interviewee's answer speech for each question is divided into multiple speech fragments.
  • for example, the interviewee's answer speech for each question may be divided into 28 speech fragments.
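  • The slicing strategy itself is not prescribed here; a minimal sketch, assuming fixed one-second fragments of a WAV recording and using only the Python standard library, could look as follows.

```python
import wave

def slice_answer(path, fragment_seconds=1.0):
    """Slice one answer recording into fixed-length fragments.
    The 1-second window is an assumption, not taken from this application."""
    with wave.open(path, "rb") as w:
        rate, nframes = w.getframerate(), w.getnframes()
        frames_per_fragment = int(rate * fragment_seconds)
        fragments = []
        for start in range(0, nframes, frames_per_fragment):
            w.setpos(start)
            # read at most one fragment's worth of frames
            fragments.append(w.readframes(min(frames_per_fragment, nframes - start)))
    return fragments  # raw PCM byte strings, one per fragment
```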
  • S13 Calculate the volume feature, speech rate feature, duration, and intermittent duration of each question from the multiple speech fragments.
  • the volume feature refers to the loudness of the interviewee's voice when answering a question.
  • the speech rate feature refers to how fast the interviewee answers a question, i.e., the amount of speech content per unit time.
  • the duration refers to the length of time the interviewee speaks continuously when answering a question.
  • the intermittent duration refers to the length of time the interviewee does not speak when answering a question.
  • Each speech fragment has four relevant features: the volume feature, the speech rate feature, the duration, and the intermittent duration. Averaging each relevant feature over all speech fragments of the same question gives the mean of that feature for the question.
  • Specifically, the volume features of the multiple speech fragments of each question are averaged to obtain the mean volume feature of the question; the speech rate features are averaged to obtain the mean speech rate feature; the durations are averaged to obtain the mean duration; and the intermittent durations are averaged to obtain the mean intermittent duration. That is, the volume feature, speech rate feature, duration, and intermittent duration obtained from the multiple speech fragments all refer to these mean values.
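  • A sketch of this averaging step, assuming per-fragment feature dictionaries (the feature extraction itself is out of scope, and the field names are illustrative):

```python
from statistics import mean

FEATURES = ("volume", "speech_rate", "duration", "intermittent_duration")

def question_features(fragment_features):
    """fragment_features: list of dicts, one per fragment of the same question,
    e.g. {"volume": 0.6, "speech_rate": 3.4, "duration": 5.6,
          "intermittent_duration": 1.3}. Returns the per-question means."""
    return {f: mean(frag[f] for frag in fragment_features) for f in FEATURES}
```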
  • the loudness of a person's voice can reflect his or her emotional stability.
  • determining the emotional stability of the interviewee according to the volume feature of each question includes:
  • correspondences between different volume fluctuation amplitudes and degrees of emotional stability are preset. Once the interviewee's volume fluctuation amplitude is determined, the interviewee's emotional stability can be matched according to the correspondence.
  • assume the maximum volume feature over all questions is max, the minimum volume feature is min, the average volume feature over all questions is avg, and the volume feature of question i is ai; the volume fluctuation range of each question can then be computed from these quantities, for example as |ai - avg| / (max - min).
  • if the average volume fluctuation range is less than 20%, the interviewee's emotional stability is determined to be the first degree of stability, indicating that the interviewee's emotional stability is "high";
  • if the average volume fluctuation range is between 20% and 30%, the interviewee's emotional stability is determined to be the second degree of stability, indicating that the interviewee's emotional stability is "medium";
  • if the average volume fluctuation range is greater than 30%, the interviewee's emotional stability is determined to be the third degree of stability, indicating that the interviewee's emotional stability is "low".
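  • The mapping from volume features to a stability grade can be sketched as follows. The fluctuation formula is an assumption (the published text elides the exact expression); the 20% and 30% thresholds are the ones stated above.

```python
def emotional_stability(volumes):
    """volumes: per-question mean volume features."""
    mx, mn = max(volumes), min(volumes)
    avg = sum(volumes) / len(volumes)
    if mx == mn:                        # no fluctuation at all
        return "high"
    # assumed fluctuation: deviation from the mean, normalized by the spread
    fluctuations = [abs(a - avg) / (mx - mn) for a in volumes]
    avg_fluctuation = sum(fluctuations) / len(fluctuations)
    if avg_fluctuation < 0.20:
        return "high"    # first degree of stability
    if avg_fluctuation <= 0.30:
        return "medium"  # second degree of stability
    return "low"         # third degree of stability
```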
  • S15 Use the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, and determine the interviewee's confidence.
  • using the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, and determining the interviewee's confidence includes:
  • the average is rounded up to obtain the interviewee's confidence judgment result.
  • for example, the confidence levels of five questions are determined as follows: question 1 - confidence level A, question 2 - confidence level B, question 3 - confidence level B, question 4 - confidence level B, question 5 - confidence level A. Sorting the confidence levels of the five questions by question number gives ABBBA; the center position of ABBBA is B, so the target confidence level is B, which is taken as the final judgment of the interviewee's confidence during the interview.
  • alternatively, the levels of all questions can be converted into numerical scores, and the numerical results averaged and rounded up (toward the larger value) to obtain a personal level.
  • for example, if the average is 4.4, the score after rounding up is 5, and the interviewee's confidence judgment result is level A.
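  • Both aggregation strategies described above can be sketched as follows, assuming levels A-E map to scores 5-1 (the exact score mapping is an assumption):

```python
import math

SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def center_grade(levels):
    """e.g. ["A","B","B","B","A"]: sorted AABBB, center position -> "B"."""
    ordered = sorted(levels)
    return ordered[len(ordered) // 2]

def rounded_up_grade(levels):
    """e.g. ABBBA -> scores 5,4,4,4,5 -> mean 4.4 -> rounded up to 5 -> "A"."""
    avg = sum(SCORE[lvl] for lvl in levels) / len(levels)
    score = math.ceil(avg)  # "rounded up (toward the larger value)"
    return next(lvl for lvl, s in SCORE.items() if s == score)
```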
  • using the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration of each question, and determining the confidence level of each question includes:
  • the target candidate confidence level in the confidence level ranking queue is taken as the confidence level of the question.
  • each confidence level in the box plot of any feature determines a feature range (the range being the maximum and minimum values of that level). Only when all the features of a question (the speech rate feature, the intermittent duration, and the duration) are judged to be of the same level is the confidence level of the question determined to be that level.
  • for example, suppose the speech rate feature of an answer is 3.4, the intermittent duration is 1.3, and the duration is 5.6, and suppose the speech rate range of level B in the speech rate box plot is [3.2, 4], the intermittent duration range of level B in the intermittent duration box plot is [0.8, 1.5], and the duration range of level B in the duration box plot is [5.3, 5.7]. Because the speech rate feature, intermittent duration, and duration all fall within the level B ranges, the confidence level of this question is initially judged to be level B.
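  • A sketch of this first-pass check, using the example ranges quoted above (the range table and function names are illustrative):

```python
RANGES = {  # level -> {feature: (low, high)}, example values from the text
    "B": {"speech_rate": (3.2, 4.0),
          "intermittent_duration": (0.8, 1.5),
          "duration": (5.3, 5.7)},
}

def level_if_all_match(features, level):
    """Assign the level only if every salient feature falls in its range."""
    bounds = RANGES[level]
    ok = all(bounds[f][0] <= features[f] <= bounds[f][1] for f in bounds)
    return level if ok else None

print(level_if_all_match(
    {"speech_rate": 3.4, "intermittent_duration": 1.3, "duration": 5.6}, "B"))
# -> "B": all three features satisfy the level B ranges
```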
  • for example, the first confidence level may be A and B, the second confidence level A and B, and the third confidence level A and B; that is, the first, second, and third confidence levels each contain multiple levels and are all the same.
  • similarly, the first confidence level may be A, B, and C, the second confidence level A, B, and C, and the third confidence level A, B, and C. If the first, second, and third confidence levels each contain multiple levels and those levels are all the same, the candidate confidence levels are those multiple levels: level A, level B, and level C.
  • the confidence level ranking queue is then ABC, and based on the law of large numbers the target candidate confidence level is determined to be level B, which is taken as the confidence level of the question.
  • the method further includes:
  • for example, the first confidence level is A, B, and D, the second confidence level is A, B, and E, and the third confidence level is A, B, and C. That is, the first, second, and third confidence levels each contain multiple levels and are not all the same, but the first, second, and third confidence levels share the same levels A and B. The shared levels A and B are then used as the candidate confidence levels, and the confidence level of the question is finally determined, based on the law of large numbers, to be level B.
  • the method further includes:
  • the neutral level refers to the level assigned when, after traversing all the levels, none of them is satisfied.
  • for example, the pre-built confidence determination model determines that the confidence level corresponding to the question's speech rate feature is level A, that the confidence level corresponding to the question's intermittent duration is level B, and that the confidence level corresponding to the question's duration is level A. Because the question's speech rate feature, intermittent duration, and duration do not all belong to the same confidence level, the confidence level of the question belongs neither to level A nor to level B; that is, there is no case in which the first, second, and third confidence levels are simultaneously the same, so the confidence level of the question is determined to be the neutral level.
  • a question at the neutral level most likely belongs to the most general situation, i.e., level C, so the neutral level can be preset as level C.
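  • The candidate-level and neutral-level logic can be sketched as follows, where the first, second, and third confidence levels are modeled as the level sets matched by the speech rate feature, intermittent duration, and duration respectively (function and variable names are illustrative):

```python
def question_confidence(first, second, third, neutral="C"):
    """first/second/third: sets of levels matched by the three features.
    Shared levels become candidates; the middle of the ranked candidate
    queue is chosen; an empty intersection yields the neutral level."""
    candidates = sorted(set(first) & set(second) & set(third))
    if not candidates:
        return neutral                       # e.g. {A} vs {B}: no shared level
    return candidates[len(candidates) // 2]  # middle of the ranking queue

print(question_confidence({"A", "B", "D"}, {"A", "B", "E"}, {"A", "B", "C"}))
# -> "B": shared levels A and B, middle of the queue AB is B
```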
  • S16 Use the pre-built reaction speed determination model to judge the speech rate feature and the intermittent duration, and determine the interviewee's reaction speed.
  • S15 and S16 can be executed in parallel.
  • two threads can be started for simultaneous execution.
  • one thread uses the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, while the other thread uses the pre-built reaction speed determination model to judge the speech rate feature and intermittent duration. Since the two threads execute in parallel, this improves the efficiency of judging the interviewee's confidence and reaction speed, shortens the judgment time, and improves the efficiency of interview screening.
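  • A minimal sketch of this parallel execution with two worker threads; the two judge_* functions are placeholders for the confidence and reaction speed determination models, not implementations from this application.

```python
from concurrent.futures import ThreadPoolExecutor

def judge_confidence(speech_rate, intermittent, duration):
    return "B"  # placeholder for the confidence determination model

def judge_reaction_speed(speech_rate, intermittent):
    return "A"  # placeholder for the reaction speed determination model

with ThreadPoolExecutor(max_workers=2) as pool:
    conf = pool.submit(judge_confidence, 3.4, 1.3, 5.6)
    speed = pool.submit(judge_reaction_speed, 3.4, 1.3)
    print(conf.result(), speed.result())  # the two judgments run concurrently
```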
  • S17 Output the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • during the interview process, once the interviewee's emotional stability, reaction speed, and confidence have been obtained by analyzing the speech with which the interviewee answered the questions, interviewees who meet the interview requirements can be selected according to the focus of the position being interviewed for.
  • the voice-based interviewee determination method described in this application obtains the interviewee's answer speech for each question, slices the answer speech of each question to obtain multiple speech fragments, and extracts the volume feature, speech rate feature, duration, and intermittent duration of each speech fragment. It determines the interviewee's emotional stability from the volume feature, then uses the pre-built confidence determination model and reaction speed determination model to judge the speech rate feature, duration, and intermittent duration to determine the interviewee's confidence and reaction speed, and outputs the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • This application performs in-depth analysis and mining of the human-computer interaction speech produced during the interview to determine multiple characteristics of the interviewee, such as emotional stability, reaction speed, and confidence. Through these characteristics, the interviewee can be evaluated objectively and comprehensively, and the result is more refined and accurate, which improves the efficiency and quality of interview judgment.
  • this application can be applied in fields such as smart government affairs, so as to promote the development of smart cities.
  • Fig. 2 is a structural diagram of the voice-based interviewee determination device provided in Embodiment 2 of the present application.
  • the voice-based interviewee determination device 20 may include multiple functional modules composed of computer-readable instruction segments.
  • the computer-readable instructions of each program segment in the voice-based interviewee determination device 20 may be stored in the memory of the terminal and executed by the at least one processor to perform the voice-based interviewee determination function (described in detail with reference to FIG. 1).
  • the voice-based interviewee determination device 20 can be divided into multiple functional modules according to the functions it performs.
  • the functional modules may include: an acquisition module 201, a construction module 202, a slicing module 203, a calculation module 204, a first determination module 205, a second determination module 206, a third determination module 207, and an output module 208.
  • a module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor, can complete a fixed function, and are stored in a memory. The functions of each module will be described in detail in the following embodiments.
  • the acquisition module 201 is used to acquire the interviewee's answer speech for multiple questions.
  • before the acquisition of the interviewee's answer speech for the multiple questions, the device further includes:
  • the construction module 202 is used to construct the confidence determination model and the reaction speed determination model.
  • the process of constructing the confidence determination model and the reaction speed determination model includes:
  • selecting, from the multiple features, first salient features with a large degree of discrimination between confidence levels and second salient features with a large degree of discrimination between reaction speed levels, wherein the first salient features include the speech rate feature, the duration, and the intermittent duration, and the second salient features include the speech rate feature and the intermittent duration;
  • constructing a confidence determination model based on the multiple first salient features, the multiple confidence levels, and the feature range corresponding to each confidence level, and constructing a reaction speed determination model based on the multiple second salient features, the multiple reaction speed levels, and the feature range corresponding to each reaction speed level.
  • the confidence, emotional stability, and reaction speed of the sample speech with which multiple interviewees answered each question are labeled, and the four relevant features together with the corresponding labels are used as learning objects to establish a learning model. It was found that, from the data distribution of each relevant feature across different degrees of confidence / emotional stability / reaction speed, people with different degrees of confidence / emotional stability / reaction speed show clear and regular differences in data distribution. The interviewee's confidence, emotional stability, and reaction speed can therefore be quantitatively evaluated through four relevant features of the interviewee: the volume feature, the speech rate feature, the duration, and the intermittent duration.
  • according to the four relevant features and the confidence levels of the sample speech, first box plots of each relevant feature at different confidence levels and second box plots of each relevant feature at different reaction speed levels are generated. From the first box plots, the first salient features with a large degree of discrimination between confidence levels are identified: the speech rate feature, the duration, and the intermittent duration. From the second box plots, the second salient features with a large degree of discrimination between reaction speed levels are identified: the speech rate feature and the intermittent duration. Finally, a confidence determination model is constructed from the three first salient features (speech rate, duration, and intermittent duration), and a reaction speed determination model is constructed from the two second salient features (speech rate and intermittent duration).
  • the first box plots are generated from the distribution of the feature values of the first salient features at different confidence levels;
  • the second box plots are generated from the distribution of the feature values of the second salient features at different reaction speed levels.
  • when training on a salient feature, the feature value range of that salient feature at each confidence / reaction speed level is determined from the maximum and minimum values of the feature in the box plots of the different levels. After the feature value range of the salient feature at each confidence / reaction speed level has been determined, it must be checked whether the ranges satisfy extreme value consistency. For example, if a salient feature corresponds to five confidence / reaction speed levels whose feature value ranges are [a1,b1], [a2,b2], [a3,b3], [a4,b4], and [a5,b5], the extreme values should vary consistently from level to level; where a range violates this consistency, the feature value range needs to be adjusted.
  • the slicing module 203 is used to slice the answer speech of each question to obtain multiple speech fragments.
  • the interviewee's answer speech for each question is divided into multiple speech fragments.
  • for example, the interviewee's answer speech for each question may be divided into 28 speech fragments.
  • the calculation module 204 is configured to calculate the volume feature, speech rate feature, duration, and intermittent duration of each question from the multiple speech fragments.
  • the volume feature refers to the loudness of the interviewee's voice when answering a question.
  • the speech rate feature refers to how fast the interviewee answers a question, i.e., the amount of speech content per unit time.
  • the duration refers to the length of time the interviewee speaks continuously when answering a question.
  • the intermittent duration refers to the length of time the interviewee does not speak when answering a question.
  • Each speech fragment has four relevant features: the volume feature, the speech rate feature, the duration, and the intermittent duration. Averaging each relevant feature over all speech fragments of the same question gives the mean of that feature for the question.
  • Specifically, the volume features of the multiple speech fragments of each question are averaged to obtain the mean volume feature of the question; the speech rate features are averaged to obtain the mean speech rate feature; the durations are averaged to obtain the mean duration; and the intermittent durations are averaged to obtain the mean intermittent duration. That is, the volume feature, speech rate feature, duration, and intermittent duration obtained from the multiple speech fragments all refer to these mean values.
  • the first determination module 205 is configured to determine the emotional stability of the interviewee according to the volume feature of each question.
  • the loudness of a person's voice can reflect his or her emotional stability.
  • the first determination module 205 determining the emotional stability of the interviewee according to the volume feature of each question includes:
  • correspondences between different volume fluctuation amplitudes and degrees of emotional stability are preset. Once the interviewee's volume fluctuation amplitude is determined, the interviewee's emotional stability can be matched according to the correspondence.
  • assume the maximum volume feature over all questions is max, the minimum volume feature is min, the average volume feature over all questions is avg, and the volume feature of question i is ai; the volume fluctuation range of each question can then be computed from these quantities, for example as |ai - avg| / (max - min).
  • if the average volume fluctuation range is less than 20%, the interviewee's emotional stability is determined to be the first degree of stability, indicating that the interviewee's emotional stability is "high";
  • if the average volume fluctuation range is between 20% and 30%, the interviewee's emotional stability is determined to be the second degree of stability, indicating that the interviewee's emotional stability is "medium";
  • if the average volume fluctuation range is greater than 30%, the interviewee's emotional stability is determined to be the third degree of stability, indicating that the interviewee's emotional stability is "low".
  • the second determination module 206 is configured to use the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, and determine the interviewee's confidence.
  • the second determination module 206 using the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, and determining the interviewee's confidence includes:
  • the average is rounded up to obtain the interviewee's confidence judgment result.
  • for example, the confidence levels of five questions are determined as follows: question 1 - confidence level A, question 2 - confidence level B, question 3 - confidence level B, question 4 - confidence level B, question 5 - confidence level A. Sorting the confidence levels of the five questions by question number gives ABBBA; the center position of ABBBA is B, so the target confidence level is B, which is taken as the final judgment of the interviewee's confidence during the interview.
  • alternatively, the levels of all questions can be converted into numerical scores, and the numerical results averaged and rounded up (toward the larger value) to obtain a personal level.
  • for example, if the average is 4.4, the score after rounding up is 5, and the interviewee's confidence judgment result is level A.
  • using the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration of each question, and determining the confidence level of each question includes:
  • the target candidate confidence level in the confidence level ranking queue is taken as the confidence level of the question.
  • each confidence level in the box plot of any feature determines a feature range (the range being the maximum and minimum values of that level). Only when all the features of a question (the speech rate feature, the intermittent duration, and the duration) are judged to be of the same level is the confidence level of the question determined to be that level.
  • for example, suppose the speech rate feature of an answer is 3.4, the intermittent duration is 1.3, and the duration is 5.6, and suppose the speech rate range of level B in the speech rate box plot is [3.2, 4], the intermittent duration range of level B in the intermittent duration box plot is [0.8, 1.5], and the duration range of level B in the duration box plot is [5.3, 5.7]. Because the speech rate feature, intermittent duration, and duration all fall within the level B ranges, the confidence level of this question is initially judged to be level B.
  • for example, the first confidence level may be A and B, the second confidence level A and B, and the third confidence level A and B; that is, the first, second, and third confidence levels each contain multiple levels and are all the same.
  • similarly, the first confidence level may be A, B, and C, the second confidence level A, B, and C, and the third confidence level A, B, and C. If the first, second, and third confidence levels each contain multiple levels and those levels are all the same, the candidate confidence levels are those multiple levels: level A, level B, and level C.
  • the confidence level ranking queue is then ABC, and based on the law of large numbers the target candidate confidence level is determined to be level B, which is taken as the confidence level of the question.
  • the device further includes:
  • a judgment module, used to judge whether the multiple levels of the first confidence level, the second confidence level, and the third confidence level contain the same levels;
  • the judgment module is also used to determine the shared levels as the candidate confidence levels if such levels exist.
  • for example, the first confidence level is A, B, and D, the second confidence level is A, B, and E, and the third confidence level is A, B, and C. That is, the first, second, and third confidence levels each contain multiple levels and are not all the same, but the first, second, and third confidence levels share the same levels A and B. The shared levels A and B are then used as the candidate confidence levels, and the confidence level of the question is finally determined, based on the law of large numbers, to be level B.
  • the third determination module 207 is further configured to determine that the confidence level of the question is the neutral level.
  • the neutral level refers to the level assigned when, after traversing all the levels, none of them is satisfied.
  • for example, the pre-built confidence determination model determines that the confidence level corresponding to the question's speech rate feature is level A, that the confidence level corresponding to the question's intermittent duration is level B, and that the confidence level corresponding to the question's duration is level A. Because the question's speech rate feature, intermittent duration, and duration do not all belong to the same confidence level, the confidence level of the question belongs neither to level A nor to level B; that is, there is no case in which the first, second, and third confidence levels are simultaneously the same, so the confidence level of the question is determined to be the neutral level.
  • a question at the neutral level most likely belongs to the most general situation, i.e., level C, so the neutral level can be preset as level C.
  • the third determination module 207 is further configured to use the pre-built reaction speed determination model to judge the speech rate feature and intermittent duration, and determine the interviewee's reaction speed.
  • the second determination module 206 and the third determination module 207 can be executed in parallel.
  • two threads can be started for simultaneous execution.
  • one thread uses the pre-built confidence determination model to judge the speech rate feature, intermittent duration, and duration, while the other thread uses the pre-built reaction speed determination model to judge the speech rate feature and intermittent duration. Since the two threads execute in parallel, this improves the efficiency of judging the interviewee's confidence and reaction speed, shortens the judgment time, and improves the efficiency of interview screening.
  • the output module 208 is configured to output the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • during the interview process, once the interviewee's emotional stability, reaction speed, and confidence have been obtained by analyzing the speech with which the interviewee answered the questions, interviewees who meet the interview requirements can be selected according to the focus of the position being interviewed for.
  • the voice-based interviewee determination device described in this application obtains the interviewee's answer speech for each question, slices the answer speech of each question to obtain multiple speech fragments, and extracts the volume feature, speech rate feature, duration, and intermittent duration of each speech fragment. It determines the interviewee's emotional stability from the volume feature, then uses the pre-built confidence determination model and reaction speed determination model to judge the speech rate feature, duration, and intermittent duration to determine the interviewee's confidence and reaction speed, and outputs the interview result of the interviewee according to the emotional stability, reaction speed, and confidence.
  • This application performs in-depth analysis and mining of the human-computer interaction speech produced during the interview to determine multiple characteristics of the interviewee, such as emotional stability, reaction speed, and confidence. Through these characteristics, the interviewee can be evaluated objectively and comprehensively, and the result is more refined and accurate, which improves the efficiency and quality of interview judgment.
  • this application can be applied in fields such as smart government affairs, so as to promote the development of smart cities.
  • the terminal 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
  • the structure of the terminal shown in FIG. 3 does not constitute a limitation on the embodiments of the present application; it may be a bus-type structure or a star structure, and the terminal 3 may also include more or less hardware or software than shown, or a different arrangement of components.
  • the terminal 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, field-programmable gate arrays, digital signal processors, embedded devices, and the like.
  • the terminal 3 may also include client equipment.
  • the client equipment includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device, for example, personal computers, tablets, smartphones, digital cameras, and the like.
  • the terminal 3 is only an example; other existing or future electronic products that can be adapted to this application should also be included in the protection scope of this application and are included here by reference.
  • the memory 31 is used to store computer-readable instructions and various data, such as the devices installed in the terminal 3, and to realize high-speed, automatic access to programs or data during the operation of the terminal 3.
  • the memory 31 includes volatile and non-volatile memory, for example: random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, or magnetic tape storage.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the at least one processor 32 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple integrated circuits with the same or different functions packaged together, including one or more central processing units (CPUs), microprocessors, digital signal processing chips, graphics processors, and combinations of various control chips.
  • the at least one processor 32 is the control core (Control Unit) of the terminal 3.
  • It uses various interfaces and lines to connect the components of the entire terminal 3, and executes the various functions of the terminal 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31.
  • the at least one communication bus 33 is configured to implement connection and communication between the memory 31 and the at least one processor 32 and the like.
  • the terminal 3 may also include a power source (such as a battery) for supplying power to various components.
  • the power source may be logically connected to the at least one processor 32 through a power management device, so as to realize functions such as charging, discharging, and power consumption management through the power management device.
  • the power source may also include one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and other such components.
  • the terminal 3 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the above-mentioned integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium.
  • the above-mentioned software functional module is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, a network device, etc.) or a processor to execute part of the method described in each embodiment of the present application.
  • the at least one processor 32 can execute the operating system of the terminal 3 as well as various installed applications, computer-readable instructions, and the like, such as the above-mentioned modules.
  • the memory 31 stores computer-readable instructions, and the at least one processor 32 can call the computer-readable instructions stored in the memory 31 to perform related functions.
  • the various modules described in FIG. 2 are computer-readable instructions stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the various modules.
  • the memory 31 stores multiple instructions, and the multiple instructions are executed by the at least one processor 32 to implement all or part of the steps in the method described in the present application.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division, and there may be other division methods in actual implementation.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a voice-based interviewee determination method, a voice-based interviewee determination device (20), a terminal (3), and a storage medium. The voice-based interviewee determination method comprises the steps of: obtaining an interviewee's answer speech for a plurality of questions (S11); slicing the answer speech of each question to obtain a plurality of speech fragments (S12); calculating the volume feature, speech rate feature, duration, and intermittent duration of each question from the plurality of speech fragments (S13); determining the interviewee's emotional stability according to the volume feature of each question (S14); judging the speech rate features, intermittent durations, and durations using a pre-built confidence determination model to determine the interviewee's confidence (S15); judging the speech rate features and intermittent durations using a pre-built reaction speed determination model to determine the interviewee's reaction speed (S16); and outputting an interview result of the interviewee according to the emotional stability, reaction speed, and confidence (S17). With the voice-based interviewee determination method, the interviewee can be evaluated objectively and comprehensively, so that the evaluation result is more refined and accurate.
PCT/CN2020/098891 2019-09-23 2020-06-29 Voice-based interviewee determination method and device, terminal, and storage medium WO2021057146A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910900813.9 2019-09-23
CN201910900813.9A CN110827796B (zh) 2019-09-23 2019-09-23 Voice-based interviewee determination method, device, terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2021057146A1 (fr) 2021-04-01

Family

ID=69548146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098891 WO2021057146A1 (fr) 2020-06-29 Voice-based interviewee determination method and device, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN110827796B (fr)
WO (1) WO2021057146A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827796B (zh) * 2019-09-23 2024-05-24 平安科技(深圳)有限公司 Voice-based interviewee determination method, device, terminal, and storage medium
CN112786054B (zh) * 2021-02-25 2024-06-11 深圳壹账通智能科技有限公司 Voice-based intelligent interview evaluation method, apparatus, device, and storage medium
US11824819B2 (en) 2022-01-26 2023-11-21 International Business Machines Corporation Assertiveness module for developing mental model


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818798B (zh) * 2017-10-20 2020-08-18 百度在线网络技术(北京)有限公司 Customer service quality evaluation method, apparatus, device, and storage medium
CN109637520B (zh) * 2018-10-16 2023-08-22 平安科技(深圳)有限公司 Sensitive content identification method, apparatus, terminal, and medium based on speech analysis
CN110135692A (zh) * 2019-04-12 2019-08-16 平安普惠企业管理有限公司 Intelligent rating control method, apparatus, computer device, and storage medium
CN110135800A (zh) * 2019-04-23 2019-08-16 南京葡萄诚信息科技有限公司 Artificial intelligence video interview method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103634472A (zh) * 2013-12-06 2014-03-12 惠州Tcl移动通信有限公司 Method, system, and mobile phone for judging a user's mood and personality from call speech
CN106663383A (zh) * 2014-06-23 2017-05-10 因特维欧研发股份有限公司 Method and system for analyzing a subject
WO2016014321A1 (fr) * 2014-07-21 2016-01-28 Microsoft Technology Licensing, Llc Real-time emotion recognition from audio signals
WO2018093770A2 (fr) * 2016-11-18 2018-05-24 IPsoft Incorporated Generation of communicative behaviors for virtual anthropomorphic agents based on a user's affect
WO2018112134A2 (fr) * 2016-12-15 2018-06-21 Analytic Measures Inc. Automated computer method and system for measuring a user's energy, attitude, and interpersonal skills
CN110263326A (zh) * 2019-05-21 2019-09-20 平安科技(深圳)有限公司 User behavior prediction method, prediction device, storage medium, and terminal device
CN110211591A (zh) * 2019-06-24 2019-09-06 卓尔智联(武汉)研究院有限公司 Interview data analysis method based on emotion classification, computer device, and medium
CN110827796A (zh) * 2019-09-23 2020-02-21 平安科技(深圳)有限公司 Voice-based interviewee determination method, device, terminal, and storage medium

Also Published As

Publication number Publication date
CN110827796B (zh) 2024-05-24
CN110827796A (zh) 2020-02-21

Similar Documents

Publication Publication Date Title
WO2021057146A1 (fr) Procédé et dispositif de détermination d'une personne interviewée se basant sur la voix, terminal, et support de stockage
TWI703458B (zh) 資料處理模型構建方法、裝置、伺服器和用戶端
CN110874716A (zh) 面试测评方法、装置、电子设备及存储介质
US11170770B2 (en) Dynamic adjustment of response thresholds in a dialogue system
TW201734841A (zh) 分布式環境下監督學習算法的基準測試方法和裝置
JP2017016566A (ja) 情報処理装置、情報処理方法及びプログラム
WO2023279692A1 (fr) Procédé et appareil de traitement de données basés sur une plateforme questions-réponses, et dispositif associé
WO2022135496A1 (fr) Procédé et dispositif de traitement de données d'interaction vocale
Bujacz et al. Psychosocial working conditions among high-skilled workers: A latent transition analysis.
CN113011159A (zh) 人工座席监听方法、装置、电子设备及存储介质
CN114663223A (zh) 基于人工智能的信用风险评估方法、装置及相关设备
CN113256108A (zh) 人力资源分配方法、装置、电子设备及存储介质
CN113190372A (zh) 多源数据的故障处理方法、装置、电子设备及存储介质
CN114242109A (zh) 基于情感识别的智能外呼方法、装置、电子设备及介质
US20180114173A1 (en) Cognitive service request dispatching
WO2020242449A1 (fr) Détermination d'observations concernant certains sujets lors de réunions
US11475068B2 (en) Automatic question answering method and apparatus, storage medium and server
CN113158690A (zh) 对话机器人的测试方法和装置
CN115422094B (zh) 算法自动化测试方法、中心调度设备及可读存储介质
CN116523188A (zh) 一种企业创新能力的评价方法及装置
WO2023272853A1 (fr) Procédé et appareil d'appel de moteur sql à base d'ia, et dispositif et support
CN117151358A (zh) 工单派发方法、装置、电子设备、存储介质和程序产品
CN114925674A (zh) 文件合规性检查方法、装置、电子设备及存储介质
WO2020007349A1 (fr) Méthode de sélection de stratégie d'inactivation intelligente et méthode de sélection de stratégie d'inactivation fondée sur des types d'inactivation multiples
CN111522943A (zh) 逻辑节点的自动化测试方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20869434; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20869434; Country of ref document: EP; Kind code of ref document: A1)