WO2022179048A1 - Voice-based intelligent interview evaluation method, device, equipment and storage medium - Google Patents

Voice-based intelligent interview evaluation method, device, equipment and storage medium

Info

Publication number
WO2022179048A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
feature
calibration
voice
detected
Prior art date
Application number
PCT/CN2021/109701
Other languages
English (en)
French (fr)
Inventor
赵沁
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2022179048A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal

Definitions

  • the present application relates to the field of intelligent decision-making of artificial intelligence, and in particular, to a voice-based intelligent interview evaluation method, device, equipment and storage medium.
  • the inventor realized that the above method collects, analyzes and evaluates the interviewee's answer text and speech text, and must integrate information other than the voice to derive the interviewee's qualities and professional characteristics. This leads to a large amount of computation, many parameters and weak interpretability, which makes remote interview evaluation inefficient.
  • the present application provides a voice-based intelligent interview evaluation method, device, equipment and storage medium, which are used to improve the efficiency of remote interview evaluation.
  • a first aspect of the present application provides a voice-based intelligent interview evaluation method, including:
  • the detection feature value and the calibration feature value are compared and analyzed to obtain an analysis result of the interviewee's status, and an evaluation report is generated according to the analysis result of the interviewee's status.
  • a second aspect of the present application provides a voice-based intelligent interview evaluation device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • the detection feature value and the calibration feature value are compared and analyzed to obtain an analysis result of the interviewee's status, and an evaluation report is generated according to the analysis result of the interviewee's status.
  • a third aspect of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on a computer, the computer is caused to perform the following steps:
  • the detection feature value and the calibration feature value are compared and analyzed to obtain an analysis result of the interviewee's status, and an evaluation report is generated according to the analysis result of the interviewee's status.
  • a fourth aspect of the present application provides a voice-based intelligent interview evaluation device, including:
  • an endpoint detection module, used to obtain the to-be-processed voice signal of the remote interviewee, perform endpoint detection on it to obtain valid speech segments, and divide the valid speech segments into speech segments to be calibrated and speech segments to be detected according to a preset calibration period;
  • a feature extraction module configured to extract the voice features of the to-be-calibrated voice paragraphs and the to-be-detected voice paragraphs, respectively, to obtain the calibrated voice features and the detected voice features;
  • a calculation module for calculating the statistical values of the calibration voice feature and the detection voice feature respectively, to obtain the calibration feature value and the detection feature value;
  • an analysis and generation module, configured to compare and analyze the detection feature value and the calibration feature value to obtain an analysis result of the interviewee's status, and generate an evaluation report according to the analysis result of the interviewee's status.
  • in the technical solution provided by the present application, the to-be-processed voice signal of the remote interviewee is obtained, endpoint detection is performed on it to obtain valid speech segments, and the valid speech segments are divided into speech segments to be calibrated and speech segments to be detected according to a preset calibration period; speech features are extracted from the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features; statistical values of the calibration speech features and the detection speech features are calculated, respectively, to obtain a calibration feature value and a detection feature value; and the detection feature value and the calibration feature value are compared and analyzed to obtain an analysis result of the interviewee's status, from which an evaluation report is generated.
  • the intermediate features of the to-be-processed voice signal of the remote interviewee can be computed quickly and effectively. The amount of computation is small, the parameters are few and robustness is strong; because the approach is based on statistical signal processing, it is highly interpretable, has a clear physical meaning, needs few prior assumptions and is flexible to use, which improves the efficiency of remote interview evaluation.
  • FIG. 1 is a schematic diagram of an embodiment of a voice-based intelligent interview evaluation method in the embodiment of the application;
  • FIG. 2 is a schematic diagram of another embodiment of the voice-based intelligent interview evaluation method in the embodiment of the application;
  • FIG. 3 is a schematic diagram of an embodiment of a voice-based intelligent interview evaluation device in an embodiment of the application;
  • FIG. 4 is a schematic diagram of another embodiment of the voice-based intelligent interview evaluation device in the embodiment of the application;
  • FIG. 5 is a schematic diagram of an embodiment of a voice-based intelligent interview evaluation device in an embodiment of the present application.
  • the embodiments of the present application provide a voice-based intelligent interview evaluation method, device, equipment and storage medium, which improve the efficiency of remote interview evaluation.
  • an embodiment of the voice-based intelligent interview evaluation method in the embodiment of the present application includes:
  • the execution subject of the present application may be a voice-based intelligent interview evaluation device, or may be a terminal or a server, which is not specifically limited here.
  • the embodiments of the present application are described with the server as the execution subject.
  • the server can receive the interviewee's voice signal sent by a microphone or other recording equipment and perform noise reduction and enhancement processing on it to obtain the to-be-processed voice signal of the remote interviewee;
  • the server can also extract the preprocessed voice signal of the remote interviewee from a preset database, or receive the to-be-processed voice signal of the remote interviewee sent by a processing terminal.
  • the server invokes a preset voice activity detection (VAD) algorithm to detect the endpoints of the to-be-processed voice signal of the remote interviewee, and splits the signal at those endpoints to obtain valid speech segments.
  • the server divides the valid speech segments into speech segments to be calibrated and speech segments to be detected according to the preset calibration period. For example, if the preset calibration period is the first 20 seconds of the speech signal, the valid speech segments within the first 20 seconds (the first M seconds) are assigned to the speech segments to be calibrated, and the valid speech segments after the 20th second (from the (M+1)-th second onward) are assigned to the speech segments to be detected.
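  • the endpoint detection and calibration split can be sketched as follows. This is only an illustration under stated assumptions: the source says a preset VAD algorithm is used without naming it, so a simple energy-threshold detector stands in for it here, and the function names `detect_segments` and `split_by_calibration_period`, the threshold, and the rule of cutting a segment that straddles the calibration boundary are all hypothetical.

```python
def detect_segments(frame_energies, frame_sec, threshold):
    """Return (start_sec, end_sec) runs of consecutive frames above threshold.

    Stand-in for the unspecified VAD algorithm (assumption)."""
    segments = []
    start = None
    for i, energy in enumerate(frame_energies):
        if energy > threshold and start is None:
            start = i                      # speech begins at this frame
        elif energy <= threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None                   # speech ended before this frame
    if start is not None:                  # signal ends while still in speech
        segments.append((start * frame_sec, len(frame_energies) * frame_sec))
    return segments


def split_by_calibration_period(segments, period_sec):
    """Assign segments before period_sec to calibration, the rest to detection.

    A segment that straddles the boundary is cut at the boundary (assumption)."""
    calibration, detection = [], []
    for start, end in segments:
        if end <= period_sec:
            calibration.append((start, end))
        elif start >= period_sec:
            detection.append((start, end))
        else:
            calibration.append((start, period_sec))
            detection.append((period_sec, end))
    return calibration, detection
```

  • with a 20-second calibration period, `split_by_calibration_period(segments, 20.0)` returns the two lists that the later feature extraction steps operate on.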
  • the extracted speech features include, but are not limited to, volume features, intonation features, drag-sound (prolonged syllable) features, speech-rate features and fluency features.
  • the server can extract the speech features of the speech segments to be calibrated and the speech segments to be detected through a preset speech feature model, respectively, to obtain the calibration speech features and the detection speech features. The speech feature model is constructed by connecting the network structures that correspond to the volume features, intonation features, drag-sound features, speech-rate features and fluency features, and can be used to extract each of these features.
  • the server can also divide the speech segments to be calibrated and the speech segments to be detected into frames, where adjacent frames may overlap. It computes the volume of the framed segments frame by frame to obtain the calibration volume feature and the detection volume feature, and computes the intonation of the framed segments frame by frame to obtain the calibration intonation feature and the detection intonation feature. It performs envelope detection and extraction on the speech segments to be calibrated and the speech segments to be detected to obtain the calibration signal envelope and the detection signal envelope, and from these envelopes computes the calibration drag-sound feature and calibration speech-rate feature, and the detection drag-sound feature and detection speech-rate feature. Finally, it counts the pauses in the speech segments to be calibrated and the speech segments to be detected based on a preset time length to obtain the calibration fluency feature and the detection fluency feature.
  • after obtaining the calibration speech features and the detection speech features, the server computes a calibration feature vector for the calibration speech features and a detection feature vector for the detection speech features. Through a preset statistical algorithm it computes the maximum, mean, standard deviation and quantiles of the calibration feature vector, and writes the calibration speech features and these statistics into a preset Excel table to obtain the calibration feature value; the detection feature value is obtained in the same way. The statistical values include, but are not limited to, the maximum, mean, standard deviation and quantiles.
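  • a minimal sketch of the statistics step, assuming a feature is available as a plain list of per-frame numbers. The source does not specify the quantile levels, the interpolation rule, or whether the standard deviation is the population or sample form; the choices below (quartiles, linear interpolation, population standard deviation) and the names `quantile` and `feature_statistics` are illustrative assumptions.

```python
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def feature_statistics(values, qs=(0.25, 0.5, 0.75)):
    """Summarise one speech feature as max, mean, std and quantiles."""
    ordered = sorted(values)
    return {
        "max": ordered[-1],
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),   # population form (assumption)
        "quantiles": {q: quantile(ordered, q) for q in qs},
    }
```

  • calling `feature_statistics` once on a calibration feature and once on the matching detection feature gives the two sets of values that the comparison step works with.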
  • the server can compare and analyze the detection feature value and the calibration feature value through a preset nonlinear model to obtain the analysis result of the interviewee's status; the server can also compare and analyze the detection feature value and the calibration feature value through a preset comparison analysis strategy to obtain the analysis result of the interviewee's status.
  • the analysis result of the interviewee's status includes, but is not limited to, an emotional orientation result, a confidence level result, a hesitation level result, a concentration result and a personality trait result.
  • the comparison analysis strategy includes the division conditions for the emotional orientation result, confidence level result, hesitation level result, concentration result and personality trait result. Comparing the volume detection value with the volume calibration value, and the intonation detection value with the intonation calibration value, yields the emotional orientation result. Comparing the drag-sound and speech-rate detection values with their calibration values yields the confidence level result and the hesitation level result. Comparing the fluency detection value with the fluency calibration value yields the concentration result and the personality trait result.
  • key values are generated for the emotional orientation result, confidence level result, hesitation level result, concentration result and personality trait result, and a preset evaluation result hash table is matched by key-value pair to obtain the corresponding analysis result of the interviewee's status.
  • the evaluation result hash table contains the grade scores, grade descriptions and evaluation results that correspond to the emotional orientation result, confidence level result, hesitation level result, concentration result and personality trait result. For example, the grade scores are 1-3, 4-5, 6-8 and 9-10; the corresponding grade descriptions are very poor, poor, good and excellent; and the corresponding evaluation results are not hired, not hired, not hired but added to the repository, and hired.
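  • the grade lookup in the example can be sketched as a dictionary keyed by integer score. The score bands, descriptions and outcomes come from the example in the text; the names `GRADE_BANDS`, `EVALUATION_TABLE` and `lookup_evaluation`, and the choice of keying by integer score, are assumptions made for illustration.

```python
# Score bands from the example: (inclusive range, grade description, outcome).
GRADE_BANDS = [
    ((1, 3), "very poor", "not hired"),
    ((4, 5), "poor", "not hired"),
    ((6, 8), "good", "not hired but added to the repository"),
    ((9, 10), "excellent", "hired"),
]

# Expand the bands into a hash table keyed by integer score (assumed layout).
EVALUATION_TABLE = {
    score: (description, outcome)
    for (low, high), description, outcome in GRADE_BANDS
    for score in range(low, high + 1)
}


def lookup_evaluation(score):
    """Return (grade description, evaluation result), or None if out of range."""
    return EVALUATION_TABLE.get(score)
```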
  • the server matches the corresponding target evaluation report modules from the preset evaluation report modules according to the analysis result of the interviewee's status, combines the target evaluation report modules into an evaluation report template, and writes the analysis result of the interviewee's status into the template to obtain the evaluation report.
  • the emotional orientation result, the confidence level result, the hesitation level result, the concentration result and the personality trait result may each include, but are not limited to, a score and/or a degree descriptor.
  • in this embodiment, the intermediate features of the to-be-processed voice signal of the remote interviewee can be computed quickly and effectively. The amount of computation is small, the parameters are few and robustness is strong; because the approach is based on statistical signal processing, it is highly interpretable, has a clear physical meaning, needs few prior assumptions and is flexible to use, which improves the efficiency of remote interview evaluation.
  • referring to FIG. 2, another embodiment of the voice-based intelligent interview evaluation method in the embodiment of the present application includes:
  • 201. Acquire the to-be-processed voice signal of the remote interviewee, perform endpoint detection on it to obtain valid speech segments, and divide the valid speech segments into speech segments to be calibrated and speech segments to be detected according to a preset calibration period.
  • the server obtains an initial remote interview voice signal and performs voiceprint recognition and voiceprint feature extraction on it to obtain a voiceprint feature set; matches the voiceprint feature set against preset interviewer voiceprint feature information to obtain matching voiceprint features, and obtains the target voiceprint features from the voiceprint feature set according to the matching voiceprint features; extracts the interviewee's voice signal corresponding to the target voiceprint features from the initial remote interview voice signal; and performs noise reduction and signal enhancement on the interviewee's voice signal to obtain the to-be-processed voice signal of the remote interviewee.
  • the server obtains the initial remote interview voice signal by receiving it from a preset terminal or mobile device, and performs voiceprint recognition and voiceprint feature extraction on it to obtain the voiceprint feature set.
  • the server generates a comparison knowledge graph of the voiceprint feature set and a reference knowledge graph of the preset interviewer voiceprint feature information, and performs random walks on the comparison knowledge graph and the reference knowledge graph, respectively, to obtain a comparison voiceprint sequence and a reference voiceprint sequence. It computes the similarity between the comparison voiceprint sequence and the reference voiceprint sequence and judges whether the similarity is greater than a preset threshold to determine the matching voiceprint features.
  • the matching voiceprint features are deleted from the voiceprint feature set to obtain the target voiceprint features, which are the interviewee's voiceprint features.
  • the server extracts the interviewee's voice signal corresponding to the target voiceprint features from the initial remote interview voice signal and performs noise reduction and signal enhancement on it, which improves the signal-to-noise ratio and quality of the to-be-processed voice signal of the remote interviewee.
  • the server performs frame-by-frame processing on the speech segments to be calibrated and the speech segments to be detected, and recognizes and extracts time-domain energy features to obtain the calibration volume feature and the detection volume feature; recognizes and extracts features of the speech segments to be calibrated and the speech segments to be detected based on pitch period information and pitch frequency information to obtain the calibration intonation feature and the detection intonation feature; and, according to a preset observation window length, slides over the speech segments to be calibrated and the speech segments to be detected in turn and checks them for pauses.
  • the server can divide the speech segments to be calibrated and the speech segments to be detected into frames, and adjacent frames may overlap. Through a preset fast Fourier transform (FFT) algorithm, each frame of the speech segments to be calibrated and each frame of the speech segments to be detected is transformed to obtain the processed speech segments to be calibrated and the processed speech segments to be detected, from which the calibration volume feature and the detection volume feature are obtained;
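  • a minimal sketch of overlapping framing and a per-frame volume measure. It uses the time-domain energy route mentioned in the text rather than the FFT route, and the frame length, hop size, decibel scaling and the names `split_frames` and `frame_volume_db` are illustrative assumptions, not values given in the source.

```python
import math


def split_frames(samples, frame_len, hop):
    """Cut a signal into fixed-length frames; hop < frame_len gives overlap."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_volume_db(frame, eps=1e-12):
    """Mean-square energy of one frame in decibels (eps avoids log of zero)."""
    energy = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(energy + eps)


def volume_feature(samples, frame_len=400, hop=200):
    """Per-frame volume track for one speech segment."""
    return [frame_volume_db(f) for f in split_frames(samples, frame_len, hop)]
```

  • running `volume_feature` on the speech segments to be calibrated and on the speech segments to be detected gives the two volume tracks whose statistics are compared later.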
  • the server computes, frame by frame, the fundamental frequency information (including pitch period information and pitch frequency information) of the framed speech segments to be calibrated and the framed speech segments to be detected through a preset algorithm based on the short-time autocorrelation method and the short-time average magnitude difference; computes the formants of the framed speech segments to be calibrated and the framed speech segments to be detected through a preset vocal tract model and acoustic tube model; and determines the fundamental frequency and formants of the framed speech segments to be calibrated as the calibration intonation feature, and those of the framed speech segments to be detected as the detection intonation feature.
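  • the short-time autocorrelation part of the pitch estimate can be sketched as below. This covers only the autocorrelation method, not the average magnitude difference or formant steps; the search range of 60 to 400 Hz and the names `autocorr_pitch` and `sine_frame` are assumptions chosen for illustration.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the pitch of one frame from its autocorrelation peak.

    Returns the frequency in Hz, or None when no positive peak is found."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    if best_lag is None:
        return None
    return sample_rate / best_lag   # pitch period in samples -> frequency


def sine_frame(freq, sample_rate, n):
    """Synthetic test frame: a pure tone of the given frequency."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```

  • for example, `autocorr_pitch(sine_frame(200.0, 8000, 400), 8000)` picks the 40-sample lag, which corresponds to the 200 Hz tone.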
  • the server performs envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain the calibration drag-sound feature and calibration speech-rate feature of the calibration speech features, and the detection drag-sound feature and detection speech-rate feature of the detection speech features.
  • the server slides over the speech segments to be calibrated and the speech segments to be detected in turn to obtain the sliding calibration speech and the sliding detection speech. It computes the duration between two adjacent endpoints in the sliding calibration speech to obtain the calibration duration, and the duration between two adjacent endpoints in the sliding detection speech to obtain the detection duration. It then judges whether the calibration duration is greater than the preset duration: if so, a pause is determined; if not, no pause is determined. The number of pauses is counted to obtain the calibration fluency feature. In the same way, it judges whether the detection duration is greater than the preset duration and counts the pauses to obtain the detection fluency feature.
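  • a minimal sketch of the pause count, assuming the valid speech segments are available as time-ordered (start, end) pairs in seconds and that the gap between one segment's end and the next segment's start is the duration being tested. The function name `count_pauses` and the parameter `min_pause_sec` are hypothetical.

```python
def count_pauses(segments, min_pause_sec):
    """Count gaps between consecutive speech segments longer than the preset.

    segments: list of (start_sec, end_sec), sorted by start time."""
    pauses = 0
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start - prev_end > min_pause_sec:
            pauses += 1        # gap exceeds the preset duration: a pause
    return pauses
```

  • the returned count serves as the fluency feature for whichever set of segments is passed in, calibration or detection.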
  • the server extracts the calibration signal envelope of the speech segments to be calibrated and the detection signal envelope of the speech segments to be detected; performs peak-valley calculation on each envelope to obtain the number of calibration syllables and the calibration syllable length, and the number of detection syllables and the detection syllable length; determines the number of drag sounds from the number of calibration syllables and the calibration syllable length to obtain the calibration drag-sound feature, and from the number of detection syllables and the detection syllable length to obtain the detection drag-sound feature; computes the calibration segment duration of the speech segments to be calibrated and the detection segment duration of the speech segments to be detected; and computes the calibration speech-rate feature from the number of calibration syllables and the calibration segment duration, and the detection speech-rate feature from the number of detection syllables and the detection segment duration.
  • the server performs envelope detection on the speech segments to be calibrated and the speech segments to be detected through a preset amplitude demodulation algorithm and extracts the detected envelope information to obtain the calibration signal envelope and the detection signal envelope. It locates each peak in the calibration signal envelope and the two valleys adjacent to that peak to obtain the number of calibration syllables NS1, and computes the duration between the two valleys to obtain the calibration syllable length. Similarly, the number of detection syllables NS2 and the detection syllable length are obtained.
  • the server judges whether the number of calibration syllables is less than the preset number and/or the calibration syllable length is less than the preset length. If so, the syllable is determined to be a drag sound; if not, it is determined not to be a drag sound. The judgment continues with the remaining syllables until the last syllable has been checked, and the drag sounds are counted to obtain the calibration drag-sound feature. The detection drag-sound feature is obtained in the same way.
  • the calibration segment duration T1 and the detection segment duration T2 are obtained. From NS1 and T1 the calibration speech rate can be obtained, and from NS2 and T2 the detection speech rate.
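  • the envelope, syllable count and speech rate steps can be sketched as follows. The source names only "a preset amplitude demodulation algorithm"; rectification followed by a moving average is used here as a simple stand-in. Treating speech rate as syllables per second (NS divided by T) is an assumption, since the text does not give the formula, and all function names are hypothetical.

```python
def moving_average_envelope(samples, window):
    """Amplitude envelope: rectify, then smooth with a centred moving average."""
    rectified = [abs(x) for x in samples]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def count_syllable_peaks(envelope, min_height):
    """Count local maxima of the envelope that reach min_height.

    Each peak (bounded by its two neighbouring valleys) is taken as one syllable."""
    count = 0
    for i in range(1, len(envelope) - 1):
        if envelope[i] >= min_height and envelope[i - 1] < envelope[i] >= envelope[i + 1]:
            count += 1
    return count


def speech_rate(syllable_count, duration_sec):
    """Syllables per second, e.g. NS1 / T1 (assumed definition)."""
    return syllable_count / duration_sec
```

  • applying the same three functions to the speech segments to be detected gives NS2 and the detection speech rate, so the two rates can be compared on equal terms.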
  • the execution process of step 203 is similar to that of the foregoing step 103, and details are not repeated here.
  • the server multiplies the volume calibration value and the intonation calibration value in the calibration feature value by a preset first multiple to obtain the first volume calibration level value and the first intonation calibration level value, and multiplies the volume calibration value and the intonation calibration value by a preset second multiple to obtain the second volume calibration level value and the second intonation calibration level value. It determines multiple volume calibration intervals from the first and second volume calibration level values to obtain the volume level range values, and determines multiple intonation calibration intervals from the first and second intonation calibration level values to obtain the intonation level range values.
  • the preset first multiple is α, the preset second multiple is β, and the volume calibration value and the intonation calibration value are Q and W respectively. The first volume calibration level value αQ and the second volume calibration level value βQ, and the first intonation calibration level value αW and the second intonation calibration level value βW, are calculated. The volume level range values are [-∞, αQ], [αQ, βQ] and [βQ, +∞], and the intonation level range values are [-∞, αW], [αW, βW] and [βW, +∞].
  • the server judges the volume detection value against the volume level range values, and the intonation detection value against the intonation level range values, to obtain the emotional orientation result. The preset emotional orientation judgment strategy is as follows: if the volume detection value lies in the volume level range [-∞, αQ] and the intonation detection value lies in the intonation level range [-∞, αW], the emotional orientation is determined to be the first level; if the volume detection value lies in [αQ, βQ] and the intonation detection value lies in [αW, βW], the emotional orientation is determined to be the second level; if the volume detection value lies in [βQ, +∞] and the intonation detection value lies in [βW, +∞], the emotional orientation is determined to be the third level.
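  • a minimal sketch of the level ranges and the emotional orientation rule. It assumes positive calibration values and a first multiple smaller than the second; the concrete multiples, the tie-break at shared interval boundaries (lower level wins) and the `None` result when volume and intonation fall in different levels are assumptions, because the text only defines the three cases where both agree. All names are hypothetical.

```python
import math


def level_ranges(calibration_value, first_multiple, second_multiple):
    """Three closed intervals built from the two scaled calibration values."""
    low = first_multiple * calibration_value
    high = second_multiple * calibration_value
    return [(-math.inf, low), (low, high), (high, math.inf)]


def level_of(value, ranges):
    """1-based index of the first interval containing value."""
    for level, (lo, hi) in enumerate(ranges, start=1):
        if lo <= value <= hi:
            return level
    return None


def emotion_level(volume_detected, intonation_detected, volume_ranges, intonation_ranges):
    """Level 1, 2 or 3 when both features agree; None otherwise (assumption)."""
    volume_level = level_of(volume_detected, volume_ranges)
    intonation_level = level_of(intonation_detected, intonation_ranges)
    return volume_level if volume_level == intonation_level else None
```

  • for instance, with multiples 0.5 and 1.5 and a volume calibration value of 10, the volume ranges are split at 5 and 15.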
  • the server judges whether the drag-sound detection value is less than or equal to the preset drag-sound range value: if so, the drag-sound detection value is determined to be small; if not, it is determined to be large. It likewise judges whether the speech-rate detection value is less than or equal to the preset speech-rate range value: if so, the speech-rate detection value is determined to be small; if not, it is determined to be large. If the drag-sound detection value is large and the speech-rate detection value is small, the degree of hesitation is determined to be high and the degree of confidence low. If the drag-sound detection value is small and the speech-rate detection value is large, the degree of hesitation is determined to be low and the degree of confidence high.
  • a preset confidence and hesitation score decision tree is then searched to retrieve the corresponding confidence and hesitation scores, so as to obtain the confidence level result and the hesitation level result.
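  • the two-threshold rule above can be sketched as a small function. Only the two combinations stated in the text are decided; returning `None` for the other two combinations is an assumption, as is the function name `hesitation_and_confidence` and the string labels. The score decision tree is not modelled here.

```python
def hesitation_and_confidence(drag_value, rate_value, drag_limit, rate_limit):
    """Apply the drag-sound / speech-rate rule described in the text."""
    drag_small = drag_value <= drag_limit
    rate_small = rate_value <= rate_limit
    if not drag_small and rate_small:
        # Many drag sounds and slow speech.
        return {"hesitation": "high", "confidence": "low"}
    if drag_small and not rate_small:
        # Few drag sounds and fast speech.
        return {"hesitation": "low", "confidence": "high"}
    return None   # combination not covered by the stated rule (assumption)
```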
  • in addition to comparing the fluency detection value with the fluency calibration value, the server also compares the fluency detection value with a preset number of pauses. That is, the server judges whether the fluency detection value is less than the fluency calibration value and also less than the preset number of pauses. If so, the concentration is determined to be high and the personality trait enthusiastic and extroverted; if not, the concentration is determined to be low and the personality trait steady and introverted.
  • a preset fluency score decision tree is then searched to obtain the corresponding concentration score and personality trait score, so as to obtain the concentration result (high or low concentration, plus the concentration score) and the personality trait result (enthusiastic and extroverted or steady and introverted, plus the personality trait score).
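  • a minimal sketch of the fluency rule, assuming the condition reads "the fluency detection value is below both the fluency calibration value and the preset number of pauses". That reading, the function name `concentration_and_personality` and the returned labels are assumptions; the score decision tree is not modelled.

```python
def concentration_and_personality(fluency_detected, fluency_calibration, preset_pauses):
    """Return (concentration, personality trait) from pause counts.

    Fewer pauses than both references is read as high concentration."""
    if fluency_detected < fluency_calibration and fluency_detected < preset_pauses:
        return ("high", "enthusiastic and extroverted")
    return ("low", "steady and introverted")
```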
  • the server classifies the emotional orientation result, confidence level result, hesitation level result, concentration result and personality trait result to obtain classification information. It generates visual charts from these results and the classification information to obtain the analysis result of the interviewee's status, and writes the analysis result of the interviewee's status into an evaluation text template to obtain the evaluation report.
  • the server generates a visual chart from the emotional orientation result, confidence level result, hesitation level result, concentration result and personality trait result to obtain the analysis result of the interviewee's status, and generates an evaluation report from it. The server then obtains optimization information for the evaluation report and adjusts the execution process that produces the analysis result of the interviewee's status according to the optimization information.
  • after obtaining the evaluation report, the server sends it to the interviewer's terminal. The interviewer's terminal either analyzes the evaluation report according to a preset optimization adjustment strategy to obtain the optimization information, or receives optimization information that the interviewer enters for the evaluation report on the terminal's display interface. After obtaining the optimization information, the interviewer's terminal sends it to the server. The optimization information may include, but is not limited to, a score for the evaluation report and optimization suggestions.
  • after receiving the optimization information, the server adjusts the execution process that produces the analysis result of the interviewee's status accordingly: it adds or removes algorithms or models used in that process and adjusts their network structures and the objects they are applied to. This continuously optimizes the execution process and improves the accuracy of the analysis result of the interviewee's status.
  • in this embodiment, the intermediate features of the to-be-processed voice signal of the remote interviewee can be computed quickly and effectively. The amount of computation is small, the parameters are few and robustness is strong; because the approach is based on statistical signal processing, it is highly interpretable, has a clear physical meaning, needs few prior assumptions and is flexible to use, which improves the efficiency of remote interview evaluation.
  • The voice-based intelligent interview evaluation method in the embodiments of the present application has been described above; the voice-based intelligent interview evaluation device is described below. Referring to FIG. 3, one embodiment of the voice-based intelligent interview evaluation device in the embodiments of the present application includes:
  • the endpoint detection module 301 is used to obtain the voice signal of the remote interviewee to be processed, perform endpoint detection on that voice signal to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
  • the feature extraction module 302 is used for extracting the speech features of the speech segment to be calibrated and the speech segment to be detected, respectively, to obtain the calibration speech feature and the detected speech feature;
  • the calculation module 303 is used to calculate the statistical value of the calibration voice feature and the detected voice feature respectively, and obtain the calibration feature value and the detection feature value;
  • the analysis and generation module 304 is configured to compare and analyze the detection feature value and the calibration feature value to obtain an analysis result of the interviewee's status, and generate an evaluation report according to that analysis result.
  • each module in the above-mentioned voice-based intelligent interview evaluation device corresponds to each step in the above-mentioned embodiment of the above-mentioned voice-based intelligent interview evaluation method, and the functions and implementation process thereof will not be repeated here.
  • In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee's voice signal are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.
  • Referring to FIG. 4, another embodiment of the voice-based intelligent interview evaluation device in the embodiments of the present application includes:
  • the endpoint detection module 301 is used to obtain the voice signal of the remote interviewee to be processed, perform endpoint detection on that voice signal to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
  • the feature extraction module 302 is used for extracting the speech features of the speech segment to be calibrated and the speech segment to be detected, respectively, to obtain the calibration speech feature and the detected speech feature;
  • the calculation module 303 is used to calculate the statistical value of the calibration voice feature and the detected voice feature respectively, and obtain the calibration feature value and the detection feature value;
  • the analysis and generation module 304 is used to compare and analyze the detection feature value and the calibration feature value to obtain an analysis result of the interviewee's status, and generate an evaluation report according to that analysis result;
  • analysis and generation module 304 specifically includes:
  • the obtaining unit 3041 is used to obtain the volume level range value based on the volume calibration value in the calibration feature values, and the intonation level range value based on the intonation calibration value in the detection feature values; the calibration feature values include a volume calibration value, an intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values include a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value;
  • the first comparative analysis unit 3042 is used to compare and analyze the volume detection value and the volume level range value, and compare and analyze the intonation detection value and the intonation level range value to obtain an emotional orientation result;
  • the second comparative analysis unit 3043 is configured to compare and analyze the drawl detection value against the preset drawl range value, and the speech rate detection value against the preset speech rate range value, to obtain the confidence level result and the hesitation level result; the preset drawl range value includes the drawl calibration value and/or a preset drawl value, and the preset speech rate range value includes the speech rate calibration value and/or a preset speech rate value;
  • the third comparative analysis unit 3044 is configured to perform comparative analysis on the fluency detection value and the fluency calibration value to obtain the concentration result and the character trait result;
  • the generating unit 3045 is configured to generate a visual chart from the emotional orientation, confidence, hesitation, concentration and personality trait results to obtain the interviewee status analysis result, and to generate an evaluation report according to the interviewee status analysis result.
  • the feature extraction module 302 can also be specifically used for:
  • the first identification and extraction unit 3021 is used to perform frame-by-frame processing respectively on the speech paragraphs to be calibrated and the speech paragraphs to be detected, and based on the identification and extraction of time-domain energy features, to obtain the calibration volume feature and the detection volume feature;
  • the second identification and extraction unit 3022 is used for identifying and extracting the speech segment to be calibrated and the speech segment to be detected based on the pitch period information and the pitch frequency information, respectively, to obtain the calibration intonation feature and the detection intonation feature;
  • the first calculation unit 3023 is used to perform envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, to obtain the calibration drawl feature and calibration speech rate feature, and the detection drawl feature and detection speech rate feature;
  • the second calculation unit 3024 is used to slide over the speech segments to be calibrated and the speech segments to be detected with a preset observation window length and count the pauses, to obtain the calibration fluency feature and the detection fluency feature;
  • the determining unit 3025 is used to determine the calibration volume, intonation, drawl, speech rate and fluency features as the calibration speech features, and to determine the detection volume, intonation, drawl, speech rate and fluency features as the detection speech features.
  • the first computing unit 3023 can also be specifically used for:
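  • extracting the calibration signal envelope of the speech segments to be calibrated and the detection signal envelope of the speech segments to be detected;
  • performing peak-valley calculation on the calibration signal envelope and the detection signal envelope, respectively, to obtain the calibration syllable count and calibration syllable length, and the detection syllable count and detection syllable length;
  • determining the number of drawls from the calibration syllable count and calibration syllable length to obtain the calibration drawl feature, and determining the number of drawls from the detection syllable count and detection syllable length to obtain the detection drawl feature;
  • calculating the calibration segment duration of the speech segments to be calibrated and the detection segment duration of the speech segments to be detected;
  • calculating the calibration speech rate feature from the calibration syllable count and the calibration segment duration, and calculating the detection speech rate feature from the detection syllable count and the detection segment duration.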
  • the obtaining unit 3041 can also be specifically used for:
  • using a preset first multiple, calculating the level multiples of the volume calibration value in the calibration feature values and of the intonation calibration value in the detection feature values to obtain the first volume calibration level value and the first intonation calibration level value, and, using a preset second multiple, calculating the level multiples of the volume calibration value and the intonation calibration value to obtain the second volume calibration level value and the second intonation calibration level value;
  • determining multiple volume calibration intervals from the first and second volume calibration level values to obtain the volume level range value, and determining multiple intonation calibration intervals from the first and second intonation calibration level values to obtain the intonation level range value.
  • the endpoint detection module 301 can also be specifically used for:
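  • obtaining an initial remote interview voice signal, and performing voiceprint recognition and voiceprint feature extraction on it to obtain a voiceprint feature set;
  • matching the voiceprint feature set against preset interviewer voiceprint feature information to obtain matching voiceprint features, and obtaining target voiceprint features from the voiceprint feature set according to the matching voiceprint features;
  • extracting, from the initial remote interview voice signal, the interviewee voice signal corresponding to the target voiceprint features;
  • performing noise reduction and signal enhancement on the interviewee voice signal to obtain the remote interviewee voice signal to be processed.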
  • the voice-based intelligent interview evaluation device further includes:
  • the adjustment module 305 is configured to obtain optimization information based on the evaluation report, and adjust the execution process of the interviewee's status analysis result according to the optimization information.
  • each module and each unit in the above-mentioned voice-based intelligent interview evaluation device corresponds to each step in the above-mentioned embodiment of the above-mentioned voice-based intelligent interview evaluation method, and the functions and implementation process thereof will not be repeated here.
  • In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee's voice signal are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.
  • FIG. 5 is a schematic structural diagram of a voice-based intelligent interview evaluation device provided by an embodiment of the present application. The voice-based intelligent interview evaluation device 500 may vary greatly with configuration or performance, and may include one or more central processing units (CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) that store application programs 533 or data 532.
  • the memory 520 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the voice-based intelligent interview assessment device 500 .
  • the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the voice-based intelligent interview assessment device 500 .
  • the voice-based intelligent interview evaluation device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, for example Windows Server, Mac OS X, Unix, Linux or FreeBSD.
  • Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the voice-based intelligent interview evaluation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the present application also provides a voice-based intelligent interview evaluation device, comprising a memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory so that the voice-based intelligent interview evaluation device executes the steps of the above voice-based intelligent interview evaluation method.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium may also be a volatile computer-readable storage medium.
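  • instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform the following steps:
  • obtaining the voice signal of the remote interviewee to be processed, performing endpoint detection on it to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
  • performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain the calibration speech features and the detection speech features;
  • calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain the calibration feature values and the detection feature values;
  • comparing and analyzing the detection feature values against the calibration feature values to obtain the interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.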
  • the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of a blockchain node, and the like.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain, essentially a decentralized database, is a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the integrated unit if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

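The present application relates to the field of artificial intelligence and provides a voice-based intelligent interview evaluation method, apparatus, device and storage medium for improving the efficiency of remote interview evaluation. The voice-based intelligent interview evaluation method includes: performing endpoint detection on a remote interviewee voice signal to be processed to obtain valid speech segments, and dividing the valid speech segments into speech segments to be calibrated and speech segments to be detected; extracting calibration speech features from the speech segments to be calibrated and detection speech features from the speech segments to be detected; calculating calibration feature values of the calibration speech features and detection feature values of the detection speech features; and comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report of the interviewee status analysis result. In addition, the present application also relates to blockchain technology: the remote interviewee voice signal to be processed may be stored in a blockchain.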

Description

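Voice-based intelligent interview evaluation method, apparatus, device and storage medium
This application claims priority to Chinese patent application No. 202110209019.7, filed with the China Patent Office on February 25, 2021 and entitled "Voice-based intelligent interview evaluation method, apparatus, device and storage medium", the entire contents of which are incorporated herein by reference.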
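Technical Field
The present application relates to the field of intelligent decision-making in artificial intelligence, and in particular to a voice-based intelligent interview evaluation method, apparatus, device and storage medium.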
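Background
With the development of network technology and hardware, data collection and transmission have become increasingly convenient, and the two parties to an interview can conduct it remotely on computers and similar devices. To remove the subjectivity and time cost of human judgment when assessing an interviewee's performance in a remote interview, intelligent processing technologies such as speech technology, machine learning and natural language processing have been used to collect, analyze and evaluate the various kinds of information produced during the interview.
However, the inventor realized that the above approach collects, analyzes and evaluates the interviewee's answer text and speech text, and must also integrate information other than speech in order to derive the interviewee's qualities and professional traits. This leads to heavy computation, many parameters and weak interpretability, and therefore to low efficiency in remote interview evaluation.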
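Summary
The present application provides a voice-based intelligent interview evaluation method, apparatus, device and storage medium for improving the efficiency of remote interview evaluation.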
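A first aspect of the present application provides a voice-based intelligent interview evaluation method, including:
acquiring a remote interviewee voice signal to be processed, performing endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.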
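A second aspect of the present application provides a voice-based intelligent interview evaluation device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring a remote interviewee voice signal to be processed, performing endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.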
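A third aspect of the present application provides a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the following steps:
acquiring a remote interviewee voice signal to be processed, performing endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.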
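A fourth aspect of the present application provides a voice-based intelligent interview evaluation apparatus, including:
an endpoint detection module, configured to acquire a remote interviewee voice signal to be processed, perform endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
a feature extraction module, configured to perform speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
a calculation module, configured to calculate statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
an analysis and generation module, configured to compare and analyze the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result.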
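In the technical solution provided by the present application, a remote interviewee voice signal to be processed is acquired and endpoint detection is performed on it to obtain valid speech segments, which are divided, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected; speech features are extracted from the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features; statistical values of the calibration speech features and the detection speech features are calculated to obtain calibration feature values and detection feature values; and the detection feature values are compared and analyzed against the calibration feature values to obtain an interviewee status analysis result, from which an evaluation report is generated. In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee voice signal to be processed are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.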
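Brief Description of the Drawings
FIG. 1 is a schematic diagram of one embodiment of the voice-based intelligent interview evaluation method in the embodiments of the present application;
FIG. 2 is a schematic diagram of another embodiment of the voice-based intelligent interview evaluation method in the embodiments of the present application;
FIG. 3 is a schematic diagram of one embodiment of the voice-based intelligent interview evaluation apparatus in the embodiments of the present application;
FIG. 4 is a schematic diagram of another embodiment of the voice-based intelligent interview evaluation apparatus in the embodiments of the present application;
FIG. 5 is a schematic diagram of one embodiment of the voice-based intelligent interview evaluation device in the embodiments of the present application.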
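Detailed Description
The embodiments of the present application provide a voice-based intelligent interview evaluation method, apparatus, device and storage medium that improve the efficiency of remote interview evaluation.
The terms "first", "second", "third", "fourth" and the like (if any) in the specification, claims and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.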
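For ease of understanding, the specific flow of an embodiment of the present application is described below. Referring to FIG. 1, one embodiment of the voice-based intelligent interview evaluation method in the embodiments of the present application includes:
101. Acquire a remote interviewee voice signal to be processed, perform endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected.
It can be understood that the execution subject of the present application may be a voice-based intelligent interview evaluation apparatus, or may be a terminal or a server, which is not specifically limited here. The embodiments of the present application are described with a server as the execution subject.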
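During a remote interview, the interviewee's sound signal can be collected in real time through a microphone or other recording device, that is, the time-domain waveform signal x(n) = [x1(t), x2(t), …, xN(t)], where N is the number of sampling points and t is the sampling instant. The server may receive the interviewee's sound signal sent by the microphone or other recording device and perform noise reduction and enhancement on it to obtain the remote interviewee voice signal to be processed; the server may also extract a pre-processed remote interviewee voice signal from a preset database, or receive the remote interviewee voice signal to be processed sent by a processing terminal.
The server invokes a preset voice activity detection (VAD) algorithm to detect the endpoints of the remote interviewee voice signal to be processed, and segments the signal at those endpoints to obtain valid speech segments. The server then divides the valid speech segments into speech segments to be calibrated and speech segments to be detected according to the preset calibration period. For example, if the preset calibration period is the first 20 seconds of the voice signal, the valid speech segments within the first 20 seconds (the first M segments) are taken as the speech segments to be calibrated, and the valid speech segments after the 20th second (from the (M+1)-th segment onward) are taken as the speech segments to be detected.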
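The division by calibration period can be sketched as follows. This is a minimal illustration, assuming the VAD step has already produced (start, end) times in seconds; the function name, the tuple representation and the rule for a segment that straddles the boundary are assumptions, not details given in the application.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one valid speech segment

def split_by_calibration_period(
    segments: List[Segment], calibration_s: float = 20.0
) -> Tuple[List[Segment], List[Segment]]:
    """Split VAD segments into calibration and detection groups."""
    calibration: List[Segment] = []
    detection: List[Segment] = []
    for start, end in segments:
        if end <= calibration_s:
            calibration.append((start, end))
        elif start >= calibration_s:
            detection.append((start, end))
        else:
            # Assumption: a segment straddling the boundary is cut at the boundary.
            calibration.append((start, calibration_s))
            detection.append((calibration_s, end))
    return calibration, detection
```

With a 20-second calibration period, every segment ending before the 20th second goes to the calibration group and everything after it to the detection group.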
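102. Perform speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features.
The extracted speech features include, but are not limited to, volume, intonation, drawl (drawn-out sounds), speech rate and fluency features. The server may extract speech features from the speech segments to be calibrated and the speech segments to be detected through a preset speech feature model to obtain the calibration speech features and the detection speech features. The speech feature model is constructed by connecting, according to a preset connection relationship, the network structures corresponding to the volume, intonation, drawl, speech rate and fluency features, and can be used to extract those five features.
The server may frame the speech segments to be calibrated and the speech segments to be detected (frames may overlap), calculate the volume feature of the framed segments frame by frame to obtain the calibration volume feature and the detection volume feature, and calculate the intonation feature of the framed segments frame by frame to obtain the calibration intonation feature and the detection intonation feature. Using a preset amplitude demodulation algorithm, the server detects and extracts the envelopes of the speech segments to be calibrated and to be detected to obtain a calibration signal envelope and a detection signal envelope, from which it calculates the calibration drawl feature and calibration speech rate feature, and the detection drawl feature and detection speech rate feature. The server counts the pauses within a preset time length in the speech segments to be calibrated and to be detected to obtain the calibration fluency feature and the detection fluency feature. Together these give the calibration speech features and the detection speech features.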
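103. Calculate statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values.
After obtaining the calibration speech features and the detection speech features, the server calculates the calibration feature vector of the calibration speech features and the detection feature vector of the detection speech features. Using a preset statistical algorithm, it calculates the extreme values, mean, standard deviation and quantiles of the calibration feature vector, and writes the calibration speech features and calibration feature values into a preset Excel spreadsheet to obtain the calibration feature values; the detection feature values are obtained in the same way. The statistical values include, but are not limited to, extreme values, mean, standard deviation and quantiles.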
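The statistics named above can be computed for one feature vector with the Python standard library. This is only a sketch: the function name, the choice of quartiles as the quantiles, the population standard deviation and the inclusive quantile method are all assumptions made for illustration.

```python
import statistics
from typing import Dict, List

def feature_statistics(values: List[float]) -> Dict[str, float]:
    """Extreme values, mean, standard deviation and quartiles of one feature vector."""
    # Assumption: quartiles with the "inclusive" method; the text does not fix either.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation (assumption)
        "q1": q1,
        "median": median,
        "q3": q3,
    }
```

The same function would be applied once to each calibration feature vector and once to each detection feature vector, so that the two sets of values are directly comparable.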
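104. Compare and analyze the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result.
For example, the server may compare and analyze the detection feature values against the calibration feature values through a preset nonlinear model to obtain the interviewee status analysis result. The server may also perform the comparison according to a preset comparative analysis strategy. The interviewee status analysis result includes, but is not limited to, an emotional orientation result, a confidence level result, a hesitation level result, a concentration result and a personality trait result, and the comparative analysis strategy includes the classification conditions for these results. Comparing the volume detection value with the volume calibration value, and the intonation detection value with the intonation calibration value, yields the emotional orientation result; comparing the drawl detection value with the drawl calibration value, and the speech rate detection value with the speech rate calibration value, yields the confidence level result and the hesitation level result; comparing the fluency detection value with the fluency calibration value yields the personality trait result. Keys are generated for the emotional orientation, confidence level, hesitation level, concentration and personality trait results, and key-value matching is performed against a preset evaluation result hash table to obtain the corresponding interviewee status analysis result. The evaluation result hash table contains the grade score, grade description and evaluation outcome corresponding to each of those results. For example, the grade scores are 1-3, 4-5, 6-8 and 9-10; the corresponding grade descriptions are very poor, poor, good and excellent; and the corresponding evaluation outcomes are not hired, not hired, not hired but placed in the reserve pool, and hired.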
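The grade lookup in the example can be sketched as a small table. The ranges, descriptions and outcomes come from the example above; the table layout, the names `GRADE_TABLE` and `lookup_grade`, and the use of a range scan in place of hash-key matching are assumptions made for illustration.

```python
from typing import Optional, Tuple

# (inclusive low, inclusive high, grade description, evaluation outcome),
# taken from the example ranges in the text.
GRADE_TABLE = [
    (1, 3, "very poor", "not hired"),
    (4, 5, "poor", "not hired"),
    (6, 8, "good", "not hired, placed in reserve pool"),
    (9, 10, "excellent", "hired"),
]

def lookup_grade(score: int) -> Optional[Tuple[str, str]]:
    """Return (description, outcome) for a grade score, or None if out of range."""
    for low, high, description, outcome in GRADE_TABLE:
        if low <= score <= high:
            return description, outcome
    return None
```

A score of 7, for instance, falls in the 6-8 band and maps to "good" with the outcome "not hired, placed in reserve pool".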
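The server matches the corresponding target evaluation report modules from the preset evaluation report modules according to the interviewee status analysis result, combines the target evaluation report modules into an evaluation report template, and writes the interviewee status analysis result into the template to obtain the evaluation report. The emotional orientation, confidence level, hesitation level, concentration and personality trait results may each include, but are not limited to, a score and/or a descriptive degree word.
In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee voice signal to be processed are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.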
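Referring to FIG. 2, another embodiment of the voice-based intelligent interview evaluation method in the embodiments of the present application includes:
201. Acquire a remote interviewee voice signal to be processed, perform endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected.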
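Specifically, the server obtains an initial remote interview voice signal and performs voiceprint recognition and voiceprint feature extraction on it to obtain a voiceprint feature set; matches the voiceprint feature set against preset interviewer voiceprint feature information to obtain matching voiceprint features, and obtains target voiceprint features from the voiceprint feature set according to the matching voiceprint features; extracts, from the initial remote interview voice signal, the interviewee voice signal corresponding to the target voiceprint features; and performs noise reduction and signal enhancement on the interviewee voice signal to obtain the remote interviewee voice signal to be processed.
The server obtains the initial remote interview voice signal by receiving the remote interview voice signal sent by a preset terminal or mobile device, and performs voiceprint recognition and voiceprint feature extraction on it to obtain a voiceprint feature set. The server may generate a comparison knowledge graph of the voiceprint feature set and a reference knowledge graph of the preset interviewer voiceprint feature information, perform random walks on the comparison knowledge graph and the reference knowledge graph respectively to obtain a comparison voiceprint sequence and a reference voiceprint sequence, calculate the similarity between the two sequences, and determine whether the similarity is greater than a preset threshold. If so, the voiceprint features whose similarity exceeds the preset threshold are determined to be matching voiceprint features (that is, the interviewer's voiceprint features), and the matching voiceprint features are deleted from the voiceprint feature set to obtain the target voiceprint features. If not, it is determined that no matching voiceprint feature exists, that is, all voiceprint features belong to the interviewee, and the voiceprint feature set is taken as the target voiceprint features. After obtaining the target voiceprint features, the server extracts the interviewee voice signal corresponding to the target voiceprint features from the initial remote interview voice signal and performs noise reduction and signal enhancement on it, so as to improve the signal-to-noise ratio and quality of the remote interviewee voice signal to be processed.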
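The threshold test that separates the interviewer's voiceprints from the interviewee's can be illustrated with a much simpler similarity measure. The sketch below deliberately substitutes plain cosine similarity between fixed-length voiceprint vectors for the knowledge-graph random-walk comparison described above; the vector representation, the function names and the 0.8 threshold are assumptions.

```python
import math
from typing import List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two voiceprint vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def interviewee_voiceprints(
    voiceprints: List[Sequence[float]],
    interviewer_print: Sequence[float],
    threshold: float = 0.8,  # hypothetical preset threshold
) -> List[Sequence[float]]:
    """Drop voiceprints that match the interviewer; what remains is the target set."""
    return [v for v in voiceprints
            if cosine_similarity(v, interviewer_print) <= threshold]
```

If no voiceprint exceeds the threshold, the whole set is returned unchanged, which mirrors the case where all voiceprint features belong to the interviewee.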
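202. Perform speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features.
Specifically, the server frames the speech segments to be calibrated and the speech segments to be detected and then identifies and extracts time-domain energy features to obtain the calibration volume feature and the detection volume feature; identifies and extracts features based on pitch period information and pitch frequency information to obtain the calibration intonation feature and the detection intonation feature; performs envelope extraction, peak-valley calculation and speech rate calculation in turn to obtain the calibration drawl feature and calibration speech rate feature, and the detection drawl feature and detection speech rate feature; slides over the segments with a preset observation window length and counts the pauses to obtain the calibration fluency feature and the detection fluency feature; and determines the calibration volume, intonation, drawl, speech rate and fluency features as the calibration speech features, and the detection volume, intonation, drawl, speech rate and fluency features as the detection speech features.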
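The server may frame the speech segments to be calibrated and the speech segments to be detected (frames may overlap), apply the fast Fourier transform (FFT) to each frame of the speech segments to be calibrated and each frame of the speech segments to be detected to obtain processed speech segments to be calibrated and processed speech segments to be detected, calculate the time-domain energy of the processed segments, and extract features from that energy to obtain the calibration volume feature and the detection volume feature.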
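Framing with overlap and per-frame energy can be sketched as below. The sketch computes the mean squared amplitude of each frame directly in the time domain and leaves out the FFT step mentioned above; the function names, the frame length and the hop size are assumptions.

```python
from typing import List, Sequence

def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Cut the signal into frames; a hop smaller than frame_len gives overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def volume_feature(samples: Sequence[float], frame_len: int = 4, hop: int = 2) -> List[float]:
    """Per-frame energy sequence used here as a stand-in for the volume feature."""
    return [short_time_energy(f) for f in frame_signal(samples, frame_len, hop)]
```

In practice the frame length and hop would be chosen in milliseconds (for example 25 ms frames with a 10 ms hop) and converted to sample counts; those figures are common conventions rather than values given in the application.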
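Using a preset algorithm based on the short-time autocorrelation method and the short-time average magnitude difference, the server calculates, frame by frame, the fundamental frequency information (including pitch period information and pitch frequency information) of the framed speech segments to be calibrated and to be detected. Using a preset vocal tract model and acoustic tube model, it calculates, frame by frame, the formants of the framed speech segments to be calibrated and to be detected. The fundamental frequency and formants of the framed speech segments to be calibrated are taken as the calibration intonation feature, and those of the framed speech segments to be detected as the detection intonation feature.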
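The short-time autocorrelation part of the pitch estimate can be sketched as follows. This is a simplified illustration that covers only the autocorrelation search (not the average magnitude difference function or the formant calculation); the function name and the default search range are assumptions.

```python
import math
from typing import Sequence

def autocorr_pitch(frame: Sequence[float], sample_rate: int,
                   f_min: float = 50.0, f_max: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) of one frame by short-time autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        overlap = len(frame) - lag
        # Normalise by the overlap length so short lags are not favoured.
        score = sum(frame[n] * frame[n + lag] for n in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    # The pitch period is best_lag samples; the pitch frequency is its reciprocal.
    return sample_rate / best_lag if best_lag else 0.0
```

A wide search range lets multiples of the true period score equally well, so a practical implementation would narrow `f_min` and `f_max` to the expected voice range or add a rule that prefers the shortest strong lag.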
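The server performs envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, to obtain the calibration drawl feature and calibration speech rate feature among the calibration speech features, and the detection drawl feature and detection speech rate feature among the detection speech features.
The server slides over the speech segments to be calibrated and the speech segments to be detected with the preset observation window length to obtain sliding calibration speech and sliding detection speech. It calculates the duration between adjacent endpoints in the sliding calibration speech to obtain the calibration duration, and the duration between adjacent endpoints in the sliding detection speech to obtain the detection duration. If the calibration duration is greater than a preset duration, a pause is determined; otherwise no pause is determined; the number of pauses is counted to obtain the calibration fluency feature. Likewise, if the detection duration is greater than the preset duration, a pause is determined; otherwise no pause is determined; the number of pauses is counted to obtain the detection fluency feature.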
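Pause counting from segment endpoints can be sketched as below. The sketch assumes the endpoints are available as (start, end) times in seconds and counts every gap longer than a preset duration; it omits the sliding observation window, and the function name and the 0.5-second default are assumptions.

```python
from typing import List, Tuple

def count_pauses(segments: List[Tuple[float, float]], min_pause_s: float = 0.5) -> int:
    """Count gaps between consecutive speech segments that exceed min_pause_s."""
    pauses = 0
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        # The gap runs from the end of one segment to the start of the next.
        if next_start - prev_end > min_pause_s:
            pauses += 1
    return pauses
```

Running it once on the calibration segments and once on the detection segments gives the two pause counts that the later fluency comparison uses.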
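Specifically, the server extracts the calibration signal envelope of the speech segments to be calibrated and the detection signal envelope of the speech segments to be detected; performs peak-valley calculation on the calibration signal envelope and the detection signal envelope, respectively, to obtain the calibration syllable count and calibration syllable length, and the detection syllable count and detection syllable length; determines the number of drawls from the calibration syllable count and calibration syllable length to obtain the calibration drawl feature, and from the detection syllable count and detection syllable length to obtain the detection drawl feature; calculates the calibration segment duration of the speech segments to be calibrated and the detection segment duration of the speech segments to be detected; and calculates the calibration speech rate feature from the calibration syllable count and calibration segment duration, and the detection speech rate feature from the detection syllable count and detection segment duration.
For example, using a preset amplitude demodulation algorithm, the server performs envelope detection on the speech segments to be calibrated and the speech segments to be detected and extracts the detected envelope information to obtain the calibration signal envelope and the detection signal envelope. It finds each peak in the calibration signal envelope and the two valleys adjacent to that peak to obtain the calibration syllable count NS1, and calculates the duration between the two valleys to obtain the calibration syllable length; the detection syllable count NS2 and the detection syllable length are obtained in the same way. It then determines whether the calibration syllable count is less than a preset count and/or the calibration syllable length is less than a preset length. If so, the syllable is judged to be a drawl; if not, it is judged not to be a drawl. In either case the remaining syllables are judged in turn until the last syllable has been examined, and the number of drawls is counted to obtain the calibration drawl feature; the detection drawl feature is obtained in the same way. With the calibration segment duration T1 and the detection segment duration T2, the calibration speech rate feature is S1 = NS1/T1 and the detection speech rate feature is S2 = NS2/T2.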
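The peak counting and the speech rate formula S = NS/T can be sketched as follows. The sketch assumes the envelope is already available as a list of amplitude values and treats each local maximum above a threshold as one syllable; the threshold and the function names are assumptions, and the drawl judgment is not included.

```python
from typing import Sequence

def count_syllables(envelope: Sequence[float], threshold: float) -> int:
    """Count local maxima of the envelope above threshold, one per syllable."""
    count = 0
    for i in range(1, len(envelope) - 1):
        is_peak = envelope[i - 1] < envelope[i] >= envelope[i + 1]
        if is_peak and envelope[i] > threshold:
            count += 1
    return count

def speech_rate(num_syllables: int, duration_s: float) -> float:
    """Speech rate S = NS / T in syllables per second."""
    return num_syllables / duration_s
```

Applying both functions to the calibration envelope gives NS1 and S1, and applying them to the detection envelope gives NS2 and S2.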
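203. Calculate statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values.
Step 203 is performed in a manner similar to step 103 above and is not described again here.
204. Obtain a volume level range value based on the volume calibration value in the calibration feature values, and an intonation level range value based on the intonation calibration value in the detection feature values. The calibration feature values include a volume calibration value, an intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values include a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value.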
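Specifically, using a preset first multiple, the server calculates the level multiples of the volume calibration value in the calibration feature values and of the intonation calibration value in the detection feature values to obtain a first volume calibration level value and a first intonation calibration level value; using a preset second multiple, it calculates the level multiples of the volume calibration value and the intonation calibration value to obtain a second volume calibration level value and a second intonation calibration level value. It determines multiple volume calibration intervals from the first and second volume calibration level values to obtain the volume level range value, and determines multiple intonation calibration intervals from the first and second intonation calibration level values to obtain the intonation level range value.
For example, let the preset first multiple be α and the preset second multiple be β, and let the volume calibration value and intonation calibration value be Q and W respectively. The first volume calibration level value αQ, the second volume calibration level value βQ, the first intonation calibration level value αW and the second intonation calibration level value βW are calculated; the volume level range values are then [-∞, αQ], [αQ, βQ] and [βQ, +∞], and the intonation level range values are [-∞, αW], [αW, βW] and [βW, +∞].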
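The three level ranges built from α, β and a calibration value can be sketched directly. The function name and the tuple representation of an interval are assumptions; the sketch also assumes α < β and a positive calibration value so that the intervals are ordered.

```python
import math
from typing import List, Tuple

def level_ranges(calibration_value: float, alpha: float, beta: float) -> List[Tuple[float, float]]:
    """Return the ranges [-inf, aQ], [aQ, bQ], [bQ, +inf] for a calibration value Q."""
    low = alpha * calibration_value   # first calibration level value
    high = beta * calibration_value   # second calibration level value
    return [(-math.inf, low), (low, high), (high, math.inf)]
```

Calling it with the volume calibration value Q gives the volume level range values, and calling it with the intonation calibration value W gives the intonation level range values.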
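205. Compare and analyze the volume detection value against the volume level range value, and the intonation detection value against the intonation level range value, to obtain the emotional orientation result.
According to a preset emotional orientation judgment strategy, the server judges the volume detection value against the volume level range value and the intonation detection value against the intonation level range value to obtain the emotional orientation result. The preset strategy is as follows: if the volume detection value falls within the volume level range [-∞, αQ] and the intonation detection value falls within the intonation level range [-∞, αW], the emotional orientation is judged to be the first level; if the volume detection value falls within [αQ, βQ] and the intonation detection value falls within [αW, βW], the emotional orientation is judged to be the second level; if the volume detection value falls within [βQ, +∞] and the intonation detection value falls within [βW, +∞], the emotional orientation is judged to be the third level. This gives the emotional orientation result.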
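The three-level judgment can be sketched as follows. The text only defines the cases in which volume and intonation fall in the same level, so returning None for mixed cases is an assumption, as are the function names and the handling of values exactly on a boundary.

```python
from typing import Optional

def level_of(value: float, low: float, high: float) -> int:
    """Map a detection value to level 1, 2 or 3 given the two level boundaries."""
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3

def emotion_level(vol_det: float, tone_det: float, q: float, w: float,
                  alpha: float, beta: float) -> Optional[int]:
    """Emotional orientation level when volume and intonation agree, else None."""
    vol_level = level_of(vol_det, alpha * q, beta * q)
    tone_level = level_of(tone_det, alpha * w, beta * w)
    # Assumption: cases where the two levels differ are left undecided.
    return vol_level if vol_level == tone_level else None
```

Here `q` and `w` are the volume and intonation calibration values Q and W, and `alpha` and `beta` are the preset first and second multiples.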
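206. Compare and analyze the drawl detection value against the preset drawl range value, and the speech rate detection value against the preset speech rate range value, to obtain the confidence level result and the hesitation level result. The preset drawl range value includes the drawl calibration value and/or a preset drawl value, and the preset speech rate range value includes the speech rate calibration value and/or a preset speech rate value.
For example, it is determined whether the drawl detection value is less than or equal to the preset drawl range value: if so, the drawl detection value is judged to be small; if not, it is judged to be large. It is determined whether the speech rate detection value is less than or equal to the preset speech rate range value: if so, the speech rate detection value is judged to be small; if not, it is judged to be large. If the drawl detection value is large and the speech rate detection value is small, the hesitation level is judged to be high and the confidence level low; if the drawl detection value is small and the speech rate detection value is large, the hesitation level is judged to be low and the confidence level high. A preset confidence-hesitation score decision tree is then searched according to the drawl detection value and the speech rate detection value to obtain the corresponding confidence score and hesitation score, which gives the confidence level result and the hesitation level result.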
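The two-threshold judgment can be sketched as follows. The text defines only two of the four combinations, so the "undetermined" outcome for the other two is an assumption, as are the function name and the dictionary output; the score decision tree is not modelled.

```python
from typing import Dict

def confidence_and_hesitation(drawl_det: float, rate_det: float,
                              drawl_ref: float, rate_ref: float) -> Dict[str, str]:
    """Judge hesitation and confidence from drawl and speech rate detection values."""
    drawl_large = drawl_det > drawl_ref   # "small" means less than or equal to the reference
    rate_large = rate_det > rate_ref
    if drawl_large and not rate_large:
        return {"hesitation": "high", "confidence": "low"}
    if not drawl_large and rate_large:
        return {"hesitation": "low", "confidence": "high"}
    # Assumption: combinations the text does not cover are left undetermined.
    return {"hesitation": "undetermined", "confidence": "undetermined"}
```

The reference values can be the calibration values, preset values, or both, as the step above allows.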
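207. Compare and analyze the fluency detection value against the fluency calibration value to obtain the concentration result and the personality trait result.
For example, in addition to comparing the fluency detection value against the fluency calibration value, the server compares the fluency detection value against a preset number of pauses. That is, the server determines whether the fluency detection value is less than the fluency calibration value and also less than the preset number of pauses. If so, concentration is judged to be high and the personality trait to be enthusiastic and extroverted; if not, concentration is judged to be low and the personality trait to be steady and introverted. A preset fluency score decision tree is then searched according to the fluency detection value and the number of pauses to obtain the corresponding concentration score and personality trait score, which gives the concentration result (high or low concentration, plus the concentration score) and the personality trait result (enthusiastic and extroverted or steady and introverted, plus the personality trait score).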
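The fluency judgment reduces to one combined condition, sketched below. The function name and the tuple output are assumptions, and the score decision tree is not modelled.

```python
from typing import Tuple

def focus_and_personality(fluency_det: float, fluency_cal: float,
                          preset_pauses: float) -> Tuple[str, str]:
    """Return (concentration, personality trait) from the pause-count comparison."""
    # Fewer pauses than both the calibration value and the preset count.
    if fluency_det < fluency_cal and fluency_det < preset_pauses:
        return ("high", "enthusiastic and extroverted")
    return ("low", "steady and introverted")
```

Since the fluency values here are pause counts, a smaller detection value means more fluent speech than in the calibration period.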
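208. Generate a visual chart from the emotional orientation, confidence level, hesitation level, concentration and personality trait results to obtain the interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result.
Using a preset linear discriminant analysis algorithm, the server classifies the emotional orientation, confidence level, hesitation level, concentration and personality trait results to obtain classification information, which may include, but is not limited to, the performance type. It generates a visual chart from those five results and the classification information to obtain the interviewee status analysis result, and writes the interviewee status analysis result into an evaluation text template to obtain the evaluation report.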
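Specifically, after generating the visual chart from the emotional orientation, confidence level, hesitation level, concentration and personality trait results, obtaining the interviewee status analysis result and generating the evaluation report, the server also obtains optimization information based on the evaluation report and adjusts the execution process of the interviewee status analysis according to the optimization information.
After obtaining the evaluation report, the server sends it to the interviewer's terminal. The terminal either analyzes the report according to a preset optimization adjustment strategy to obtain optimization information, or the interviewer enters optimization information based on the report on the terminal's display interface. The interviewer's terminal then sends the optimization information to the server. The optimization information may include, but is not limited to, a score for the evaluation report and optimization suggestions. After receiving the optimization information, the server adjusts the execution process of the interviewee status analysis accordingly: it adds or removes algorithms or models used in the analysis, adjusts network structures, and adjusts the objects to which they are applied. This continuously optimizes the execution process and improves the accuracy of the interviewee status analysis result.
In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee voice signal to be processed are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.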
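The voice-based intelligent interview evaluation method in the embodiments of the present application has been described above; the voice-based intelligent interview evaluation apparatus is described below. Referring to FIG. 3, one embodiment of the voice-based intelligent interview evaluation apparatus in the embodiments of the present application includes:
an endpoint detection module 301, configured to acquire a remote interviewee voice signal to be processed, perform endpoint detection on it to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
a feature extraction module 302, configured to perform speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
a calculation module 303, configured to calculate statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
an analysis and generation module 304, configured to compare and analyze the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result.
The functions of the modules in the above voice-based intelligent interview evaluation apparatus correspond to the steps in the above embodiments of the voice-based intelligent interview evaluation method; their functions and implementation are not repeated here.
In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee voice signal to be processed are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.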
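Referring to FIG. 4, another embodiment of the voice-based intelligent interview evaluation apparatus in the embodiments of the present application includes:
an endpoint detection module 301, configured to acquire a remote interviewee voice signal to be processed, perform endpoint detection on it to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
a feature extraction module 302, configured to perform speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
a calculation module 303, configured to calculate statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
an analysis and generation module 304, configured to compare and analyze the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result;
wherein the analysis and generation module 304 specifically includes:
an obtaining unit 3041, configured to obtain a volume level range value based on the volume calibration value in the calibration feature values, and an intonation level range value based on the intonation calibration value in the detection feature values, the calibration feature values including a volume calibration value, an intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values including a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value;
a first comparative analysis unit 3042, configured to compare and analyze the volume detection value against the volume level range value, and the intonation detection value against the intonation level range value, to obtain the emotional orientation result;
a second comparative analysis unit 3043, configured to compare and analyze the drawl detection value against the preset drawl range value, and the speech rate detection value against the preset speech rate range value, to obtain the confidence level result and the hesitation level result, the preset drawl range value including the drawl calibration value and/or a preset drawl value, and the preset speech rate range value including the speech rate calibration value and/or a preset speech rate value;
a third comparative analysis unit 3044, configured to compare and analyze the fluency detection value against the fluency calibration value to obtain the concentration result and the personality trait result;
a generating unit 3045, configured to generate a visual chart from the emotional orientation, confidence level, hesitation level, concentration and personality trait results to obtain the interviewee status analysis result, and generate an evaluation report according to the interviewee status analysis result.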
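Optionally, the feature extraction module 302 may specifically include:
a first identification and extraction unit 3021, configured to frame the speech segments to be calibrated and the speech segments to be detected and then identify and extract time-domain energy features, to obtain the calibration volume feature and the detection volume feature;
a second identification and extraction unit 3022, configured to identify and extract features of the speech segments to be calibrated and the speech segments to be detected based on pitch period information and pitch frequency information, to obtain the calibration intonation feature and the detection intonation feature;
a first calculation unit 3023, configured to perform envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, to obtain the calibration drawl feature and calibration speech rate feature, and the detection drawl feature and detection speech rate feature;
a second calculation unit 3024, configured to slide over the speech segments to be calibrated and the speech segments to be detected with a preset observation window length and count the pauses, to obtain the calibration fluency feature and the detection fluency feature;
a determining unit 3025, configured to determine the calibration volume, intonation, drawl, speech rate and fluency features as the calibration speech features, and the detection volume, intonation, drawl, speech rate and fluency features as the detection speech features.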
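Optionally, the first calculation unit 3023 may be specifically configured to:
extract the calibration signal envelope of the speech segments to be calibrated and the detection signal envelope of the speech segments to be detected;
perform peak-valley calculation on the calibration signal envelope and the detection signal envelope, respectively, to obtain the calibration syllable count and calibration syllable length, and the detection syllable count and detection syllable length;
determine the number of drawls from the calibration syllable count and calibration syllable length to obtain the calibration drawl feature, and determine the number of drawls from the detection syllable count and detection syllable length to obtain the detection drawl feature;
calculate the calibration segment duration of the speech segments to be calibrated and the detection segment duration of the speech segments to be detected;
calculate the calibration speech rate feature from the calibration syllable count and the calibration segment duration, and calculate the detection speech rate feature from the detection syllable count and the detection segment duration.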
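Optionally, the obtaining unit 3041 may be specifically configured to:
using a preset first multiple, calculate the level multiples of the volume calibration value in the calibration feature values and of the intonation calibration value in the detection feature values to obtain a first volume calibration level value and a first intonation calibration level value, and, using a preset second multiple, calculate the level multiples of the volume calibration value and the intonation calibration value to obtain a second volume calibration level value and a second intonation calibration level value;
determine multiple volume calibration intervals from the first and second volume calibration level values to obtain the volume level range value, and determine multiple intonation calibration intervals from the first and second intonation calibration level values to obtain the intonation level range value.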
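Optionally, the endpoint detection module 301 may be specifically configured to:
obtain an initial remote interview voice signal, and perform voiceprint recognition and voiceprint feature extraction on it to obtain a voiceprint feature set;
match the voiceprint feature set against preset interviewer voiceprint feature information to obtain matching voiceprint features, and obtain target voiceprint features from the voiceprint feature set according to the matching voiceprint features;
extract, from the initial remote interview voice signal, the interviewee voice signal corresponding to the target voiceprint features;
perform noise reduction and signal enhancement on the interviewee voice signal to obtain the remote interviewee voice signal to be processed.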
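Optionally, the voice-based intelligent interview evaluation apparatus further includes:
an adjustment module 305, configured to obtain optimization information based on the evaluation report, and adjust the execution process of the interviewee status analysis according to the optimization information.
The functions of the modules and units in the above voice-based intelligent interview evaluation apparatus correspond to the steps in the above embodiments of the voice-based intelligent interview evaluation method; their functions and implementation are not repeated here.
In the embodiments of the present application, the calibration feature values and detection feature values of the remote interviewee voice signal to be processed are calculated and compared. Only the voice signal is required to calculate its intermediate features quickly and effectively. The approach involves little computation and few parameters, is robust, is based on statistical signal processing, is highly interpretable with clear physical meaning, requires few prior assumptions, and is flexible to use, which improves the efficiency of remote interview evaluation.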
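FIG. 3 and FIG. 4 above describe the voice-based intelligent interview evaluation apparatus in the embodiments of the present application in detail from the perspective of modular functional entities; the voice-based intelligent interview evaluation device in the embodiments of the present application is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a voice-based intelligent interview evaluation device provided by an embodiment of the present application. The voice-based intelligent interview evaluation device 500 may vary greatly with configuration or performance, and may include one or more central processing units (CPU) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the voice-based intelligent interview evaluation device 500. Furthermore, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the voice-based intelligent interview evaluation device 500, the series of instruction operations in the storage medium 530.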
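The voice-based intelligent interview evaluation device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the voice-based intelligent interview evaluation device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.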
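The present application also provides a voice-based intelligent interview evaluation device, including a memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line; the at least one processor invokes the instructions in the memory so that the voice-based intelligent interview evaluation device executes the steps of the above voice-based intelligent interview evaluation method.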
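The present application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform the following steps:
acquiring a remote interviewee voice signal to be processed, performing endpoint detection on it to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.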
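Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of a blockchain node, and the like.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.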
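Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.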

Claims (20)

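  1. A voice-based intelligent interview evaluation method, wherein the voice-based intelligent interview evaluation method includes:
    acquiring a remote interviewee voice signal to be processed, performing endpoint detection on the remote interviewee voice signal to be processed to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
    performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features;
    calculating statistical values of the calibration speech features and the detection speech features, respectively, to obtain calibration feature values and detection feature values;
    comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.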
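  2. The voice-based intelligent interview evaluation method according to claim 1, wherein the performing speech feature extraction on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain calibration speech features and detection speech features includes:
    framing the speech segments to be calibrated and the speech segments to be detected, respectively, and then identifying and extracting time-domain energy features, to obtain a calibration volume feature and a detection volume feature;
    identifying and extracting features of the speech segments to be calibrated and the speech segments to be detected, respectively, based on pitch period information and pitch frequency information, to obtain a calibration intonation feature and a detection intonation feature;
    performing envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain a calibration drawl feature and a calibration speech rate feature, and a detection drawl feature and a detection speech rate feature;
    sliding over the speech segments to be calibrated and the speech segments to be detected, respectively, with a preset observation window length and counting the pauses, to obtain a calibration fluency feature and a detection fluency feature;
    determining the calibration volume feature, the calibration intonation feature, the calibration drawl feature, the calibration speech rate feature and the calibration fluency feature as the calibration speech features, and determining the detection volume feature, the detection intonation feature, the detection drawl feature, the detection speech rate feature and the detection fluency feature as the detection speech features.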
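  3. The voice-based intelligent interview evaluation method according to claim 2, wherein the performing envelope extraction, peak-valley calculation and speech rate calculation in turn on the speech segments to be calibrated and the speech segments to be detected, respectively, to obtain a calibration drawl feature and a calibration speech rate feature, and a detection drawl feature and a detection speech rate feature includes:
    extracting a calibration signal envelope of the speech segments to be calibrated and a detection signal envelope of the speech segments to be detected;
    performing peak-valley calculation on the calibration signal envelope and the detection signal envelope, respectively, to obtain a calibration syllable count and a calibration syllable length, and a detection syllable count and a detection syllable length;
    determining the number of drawls from the calibration syllable count and the calibration syllable length to obtain the calibration drawl feature, and determining the number of drawls from the detection syllable count and the detection syllable length to obtain the detection drawl feature;
    calculating a calibration segment duration of the speech segments to be calibrated and a detection segment duration of the speech segments to be detected;
    calculating the calibration speech rate feature from the calibration syllable count and the calibration segment duration, and calculating the detection speech rate feature from the detection syllable count and the detection segment duration.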
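  4. The voice-based intelligent interview evaluation method according to claim 1, wherein the comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result includes:
    obtaining a volume level range value based on the volume calibration value in the calibration feature values, and an intonation level range value based on the intonation calibration value in the detection feature values, the calibration feature values including a volume calibration value, an intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values including a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value;
    comparing and analyzing the volume detection value against the volume level range value, and the intonation detection value against the intonation level range value, to obtain an emotional orientation result;
    comparing and analyzing the drawl detection value against a preset drawl range value, and the speech rate detection value against a preset speech rate range value, to obtain a confidence level result and a hesitation level result, the preset drawl range value including the drawl calibration value and/or a preset drawl value, and the preset speech rate range value including the speech rate calibration value and/or a preset speech rate value;
    comparing and analyzing the fluency detection value against the fluency calibration value to obtain a concentration result and a personality trait result;
    generating a visual chart from the emotional orientation result, the confidence level result, the hesitation level result, the concentration result and the personality trait result to obtain the interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.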
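  5. The voice-based intelligent interview evaluation method according to claim 4, wherein the acquiring volume level range values based on a volume calibration value among the calibration feature values, and intonation level range values based on an intonation calibration value among the detection feature values, comprises:
    calculating, by a preset first multiple, level multiples of the volume calibration value among the calibration feature values and of the intonation calibration value among the detection feature values respectively, to obtain a first volume calibration level value and a first intonation calibration level value, and calculating, by a preset second multiple, level multiples of the volume calibration value and of the intonation calibration value respectively, to obtain a second volume calibration level value and a second intonation calibration level value;
    determining a plurality of volume calibration intervals according to the first volume calibration level value and the second volume calibration level value to obtain the volume level range values, and determining a plurality of intonation calibration intervals according to the first intonation calibration level value and the second intonation calibration level value to obtain the intonation level range values.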
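One possible reading of claim 5 is that the two multiples scale the calibration value into two thresholds, and the thresholds cut the value axis into level intervals. The sketch below follows that reading. It is an assumption, not the claimed procedure: the multiples 1.5 and 2.0, the three half-open intervals and the function names are all hypothetical.

```python
def level_ranges(calibration_value, first_multiple, second_multiple):
    """Build three half-open intervals from two scaled calibration values."""
    first_level = calibration_value * first_multiple
    second_level = calibration_value * second_multiple
    low, high = sorted((first_level, second_level))
    return [(0.0, low), (low, high), (high, float("inf"))]


def level_of(value, ranges):
    """Index of the interval that contains `value` (0 is the lowest level)."""
    for level, (lower, upper) in enumerate(ranges):
        if lower <= value < upper:
            return level
    raise ValueError("value is outside all ranges")


# Hypothetical volume calibration value of 2.0 with multiples 1.5 and 2.0.
volume_ranges = level_ranges(2.0, 1.5, 2.0)
quiet_level = level_of(2.5, volume_ranges)
raised_level = level_of(3.0, volume_ranges)
loud_level = level_of(10.0, volume_ranges)
```

Because the thresholds are derived from the speaker's own calibration value, the same detection value can fall into different levels for different interviewees. The same two functions could be reused for the intonation calibration value.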
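  6. The voice-based intelligent interview evaluation method according to claim 1, wherein the acquiring a to-be-processed remote interviewee voice signal comprises:
    acquiring an initial remote interview voice signal, and performing voiceprint recognition and voiceprint feature extraction on the initial remote interview voice signal to obtain a voiceprint feature set;
    matching the voiceprint feature set against preset interviewer voiceprint feature information to obtain matched voiceprint features, and obtaining target voiceprint features from the voiceprint feature set according to the matched voiceprint features;
    extracting, from the initial remote interview voice signal, an interviewee voice signal corresponding to the target voiceprint features;
    performing noise reduction and signal enhancement on the interviewee voice signal to obtain the to-be-processed remote interviewee voice signal.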
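  7. The voice-based intelligent interview evaluation method according to any one of claims 1 to 6, wherein, after the comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result, the method further comprises:
    acquiring optimization information based on the evaluation report, and adjusting, according to the optimization information, the execution process of the interviewee status analysis result.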
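  8. A voice-based intelligent interview evaluation device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring a to-be-processed remote interviewee voice signal, performing endpoint detection on the to-be-processed remote interviewee voice signal to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
    performing speech feature extraction on the speech segments to be calibrated and on the speech segments to be detected respectively, to obtain calibration speech features and detection speech features;
    calculating statistical values of the calibration speech features and of the detection speech features respectively, to obtain calibration feature values and detection feature values;
    comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.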
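  9. The voice-based intelligent interview evaluation device according to claim 8, wherein the processor, when executing the computer program, further implements the following steps:
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, framing followed by recognition and extraction based on time-domain energy features, to obtain a calibration volume feature and a detection volume feature;
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, recognition and extraction based on pitch period information and pitch frequency information, to obtain a calibration intonation feature and a detection intonation feature;
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, envelope extraction, peak-valley calculation and speech rate calculation in sequence, to obtain a calibration drawl feature and a calibration speech rate feature, as well as a detection drawl feature and a detection speech rate feature;
    performing, according to a preset observation window length, sliding and pause-count calculation in sequence on the speech segments to be calibrated and on the speech segments to be detected respectively, to obtain a calibration fluency feature and a detection fluency feature;
    determining the calibration volume feature, the calibration intonation feature, the calibration drawl feature, the calibration speech rate feature and the calibration fluency feature as the calibration speech features, and determining the detection volume feature, the detection intonation feature, the detection drawl feature, the detection speech rate feature and the detection fluency feature as the detection speech features.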
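  10. The voice-based intelligent interview evaluation device according to claim 9, wherein the processor, when executing the computer program, further implements the following steps:
    extracting a calibration signal envelope of the speech segments to be calibrated and a detection signal envelope of the speech segments to be detected;
    performing peak-valley calculation on the calibration signal envelope and on the detection signal envelope respectively, to obtain a calibration syllable count and a calibration syllable length, as well as a detection syllable count and a detection syllable length;
    determining a drawl count according to the calibration syllable count and the calibration syllable length to obtain the calibration drawl feature, and determining a drawl count according to the detection syllable count and the detection syllable length to obtain the detection drawl feature;
    calculating a calibration segment duration of the speech segments to be calibrated and a detection segment duration of the speech segments to be detected;
    calculating the calibration speech rate feature according to the calibration syllable count and the calibration segment duration, and calculating the detection speech rate feature according to the detection syllable count and the detection segment duration.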
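  11. The voice-based intelligent interview evaluation device according to claim 8, wherein the processor, when executing the computer program, further implements the following steps:
    acquiring volume level range values based on a volume calibration value among the calibration feature values, and intonation level range values based on an intonation calibration value among the detection feature values, wherein the calibration feature values comprise the volume calibration value, the intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values comprise a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value;
    comparing and analyzing the volume detection value against the volume level range values, and the intonation detection value against the intonation level range values, to obtain an emotional orientation result;
    comparing and analyzing the drawl detection value against a preset drawl range value, and the speech rate detection value against a preset speech rate range value, to obtain a confidence level result and a hesitation level result, wherein the preset drawl range value comprises the drawl calibration value and/or a preset drawl value, and the preset speech rate range value comprises the speech rate calibration value and/or a preset speech rate value;
    comparing and analyzing the fluency detection value against the fluency calibration value to obtain a concentration result and a personality trait result;
    generating a visual chart according to the emotional orientation result, the confidence level result, the hesitation level result, the concentration result and the personality trait result to obtain the interviewee status analysis result, and generating the evaluation report according to the interviewee status analysis result.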
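  12. The voice-based intelligent interview evaluation device according to claim 11, wherein the processor, when executing the computer program, further implements the following steps:
    calculating, by a preset first multiple, level multiples of the volume calibration value among the calibration feature values and of the intonation calibration value among the detection feature values respectively, to obtain a first volume calibration level value and a first intonation calibration level value, and calculating, by a preset second multiple, level multiples of the volume calibration value and of the intonation calibration value respectively, to obtain a second volume calibration level value and a second intonation calibration level value;
    determining a plurality of volume calibration intervals according to the first volume calibration level value and the second volume calibration level value to obtain the volume level range values, and determining a plurality of intonation calibration intervals according to the first intonation calibration level value and the second intonation calibration level value to obtain the intonation level range values.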
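  13. The voice-based intelligent interview evaluation device according to claim 8, wherein the processor, when executing the computer program, further implements the following steps:
    acquiring an initial remote interview voice signal, and performing voiceprint recognition and voiceprint feature extraction on the initial remote interview voice signal to obtain a voiceprint feature set;
    matching the voiceprint feature set against preset interviewer voiceprint feature information to obtain matched voiceprint features, and obtaining target voiceprint features from the voiceprint feature set according to the matched voiceprint features;
    extracting, from the initial remote interview voice signal, an interviewee voice signal corresponding to the target voiceprint features;
    performing noise reduction and signal enhancement on the interviewee voice signal to obtain the to-be-processed remote interviewee voice signal.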
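  14. The voice-based intelligent interview evaluation device according to any one of claims 8 to 13, wherein the processor, when executing the computer program, further implements the following steps:
    acquiring optimization information based on the evaluation report, and adjusting, according to the optimization information, the execution process of the interviewee status analysis result.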
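  15. A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the following steps:
    acquiring a to-be-processed remote interviewee voice signal, performing endpoint detection on the to-be-processed remote interviewee voice signal to obtain valid speech segments, and dividing the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
    performing speech feature extraction on the speech segments to be calibrated and on the speech segments to be detected respectively, to obtain calibration speech features and detection speech features;
    calculating statistical values of the calibration speech features and of the detection speech features respectively, to obtain calibration feature values and detection feature values;
    comparing and analyzing the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and generating an evaluation report according to the interviewee status analysis result.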
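  16. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on the computer, further cause the computer to perform the following steps:
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, framing followed by recognition and extraction based on time-domain energy features, to obtain a calibration volume feature and a detection volume feature;
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, recognition and extraction based on pitch period information and pitch frequency information, to obtain a calibration intonation feature and a detection intonation feature;
    performing, on the speech segments to be calibrated and on the speech segments to be detected respectively, envelope extraction, peak-valley calculation and speech rate calculation in sequence, to obtain a calibration drawl feature and a calibration speech rate feature, as well as a detection drawl feature and a detection speech rate feature;
    performing, according to a preset observation window length, sliding and pause-count calculation in sequence on the speech segments to be calibrated and on the speech segments to be detected respectively, to obtain a calibration fluency feature and a detection fluency feature;
    determining the calibration volume feature, the calibration intonation feature, the calibration drawl feature, the calibration speech rate feature and the calibration fluency feature as the calibration speech features, and determining the detection volume feature, the detection intonation feature, the detection drawl feature, the detection speech rate feature and the detection fluency feature as the detection speech features.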
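  17. The computer-readable storage medium according to claim 16, wherein the computer instructions, when run on the computer, further cause the computer to perform the following steps:
    extracting a calibration signal envelope of the speech segments to be calibrated and a detection signal envelope of the speech segments to be detected;
    performing peak-valley calculation on the calibration signal envelope and on the detection signal envelope respectively, to obtain a calibration syllable count and a calibration syllable length, as well as a detection syllable count and a detection syllable length;
    determining a drawl count according to the calibration syllable count and the calibration syllable length to obtain the calibration drawl feature, and determining a drawl count according to the detection syllable count and the detection syllable length to obtain the detection drawl feature;
    calculating a calibration segment duration of the speech segments to be calibrated and a detection segment duration of the speech segments to be detected;
    calculating the calibration speech rate feature according to the calibration syllable count and the calibration segment duration, and calculating the detection speech rate feature according to the detection syllable count and the detection segment duration.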
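  18. The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on the computer, further cause the computer to perform the following steps:
    acquiring volume level range values based on a volume calibration value among the calibration feature values, and intonation level range values based on an intonation calibration value among the detection feature values, wherein the calibration feature values comprise the volume calibration value, the intonation calibration value, a drawl calibration value, a speech rate calibration value and a fluency calibration value, and the detection feature values comprise a volume detection value, an intonation detection value, a drawl detection value, a speech rate detection value and a fluency detection value;
    comparing and analyzing the volume detection value against the volume level range values, and the intonation detection value against the intonation level range values, to obtain an emotional orientation result;
    comparing and analyzing the drawl detection value against a preset drawl range value, and the speech rate detection value against a preset speech rate range value, to obtain a confidence level result and a hesitation level result, wherein the preset drawl range value comprises the drawl calibration value and/or a preset drawl value, and the preset speech rate range value comprises the speech rate calibration value and/or a preset speech rate value;
    comparing and analyzing the fluency detection value against the fluency calibration value to obtain a concentration result and a personality trait result;
    generating a visual chart according to the emotional orientation result, the confidence level result, the hesitation level result, the concentration result and the personality trait result to obtain the interviewee status analysis result, and generating the evaluation report according to the interviewee status analysis result.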
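  19. The computer-readable storage medium according to claim 18, wherein the computer instructions, when run on the computer, further cause the computer to perform the following steps:
    calculating, by a preset first multiple, level multiples of the volume calibration value among the calibration feature values and of the intonation calibration value among the detection feature values respectively, to obtain a first volume calibration level value and a first intonation calibration level value, and calculating, by a preset second multiple, level multiples of the volume calibration value and of the intonation calibration value respectively, to obtain a second volume calibration level value and a second intonation calibration level value;
    determining a plurality of volume calibration intervals according to the first volume calibration level value and the second volume calibration level value to obtain the volume level range values, and determining a plurality of intonation calibration intervals according to the first intonation calibration level value and the second intonation calibration level value to obtain the intonation level range values.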
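  20. A voice-based intelligent interview evaluation apparatus, wherein the voice-based intelligent interview evaluation apparatus comprises:
    an endpoint detection module, configured to acquire a to-be-processed remote interviewee voice signal, perform endpoint detection on the to-be-processed remote interviewee voice signal to obtain valid speech segments, and divide the valid speech segments, according to a preset calibration period, into speech segments to be calibrated and speech segments to be detected;
    a feature extraction module, configured to perform speech feature extraction on the speech segments to be calibrated and on the speech segments to be detected respectively, to obtain calibration speech features and detection speech features;
    a calculation module, configured to calculate statistical values of the calibration speech features and of the detection speech features respectively, to obtain calibration feature values and detection feature values;
    an analysis and generation module, configured to compare and analyze the detection feature values against the calibration feature values to obtain an interviewee status analysis result, and to generate an evaluation report according to the interviewee status analysis result.
PCT/CN2021/109701 2021-02-25 2021-07-30 Voice-based intelligent interview evaluation method, apparatus, device and storage medium WO2022179048A1 (zh)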

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110209019.7A CN112786054A (zh) 2021-02-25 2021-02-25 Voice-based intelligent interview evaluation method, apparatus, device and storage medium
CN202110209019.7 2021-02-25

Publications (1)

Publication Number Publication Date
WO2022179048A1 (zh) 2022-09-01

Family

ID=75761863

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109701 WO2022179048A1 (zh) 2021-02-25 2021-07-30 Voice-based intelligent interview evaluation method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN112786054A (zh)
WO (1) WO2022179048A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117116280A (zh) * 2023-08-08 2023-11-24 无锡爱视智能科技有限责任公司 Artificial-intelligence-based intelligent voice data management system and method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112786054A (zh) * 2021-02-25 2021-05-11 深圳壹账通智能科技有限公司 Voice-based intelligent interview evaluation method, apparatus, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
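KR20060017340A (ko) * 2004-08-20 2006-02-23 동아시테크주식회사 Online foreign-language interview learning and evaluation system, and interview learning and evaluation method using the system
CN110827796A (zh) * 2019-09-23 2020-02-21 平安科技(深圳)有限公司 Voice-based interviewee determination method, apparatus, terminal and storage medium
CN111126553A (zh) * 2019-12-25 2020-05-08 平安银行股份有限公司 Intelligent robot interview method, device, storage medium and apparatus
CN111554324A (zh) * 2020-04-01 2020-08-18 深圳壹账通智能科技有限公司 Intelligent language fluency recognition method, apparatus, electronic device and storage medium
CN112786054A (zh) * 2021-02-25 2021-05-11 深圳壹账通智能科技有限公司 Voice-based intelligent interview evaluation method, apparatus, device and storage medium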

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
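CN103440864A (zh) * 2013-07-31 2013-12-11 湖南大学 Voice-based personality trait prediction method
CN106663383B (zh) * 2014-06-23 2020-04-28 因特维欧研发股份有限公司 Method and system for analyzing a subject
KR101779358B1 (ko) * 2016-11-30 2017-09-18 동서대학교산학협력단 Method for controlling a smartphone-based speech recognition application
CN109637520B (zh) * 2018-10-16 2023-08-22 平安科技(深圳)有限公司 Sensitive content recognition method, apparatus, terminal and medium based on voice analysis
CN110070332A (zh) * 2019-03-13 2019-07-30 平安城市建设科技(深圳)有限公司 Artificial-intelligence-based interview method, apparatus, device and readable storage medium
CN111862946B (zh) * 2019-05-17 2024-04-19 北京嘀嘀无限科技发展有限公司 Order processing method, apparatus, electronic device and storage medium
CN110378228A (zh) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Face-to-face review video data processing method, apparatus, computer device and storage medium
CN110211591B (zh) * 2019-06-24 2021-12-21 卓尔智联(武汉)研究院有限公司 Interview data analysis method based on emotion classification, computer apparatus and medium
CN110688499A (zh) * 2019-08-13 2020-01-14 深圳壹账通智能科技有限公司 Data processing method, apparatus, computer device and storage medium
CN111222837A (zh) * 2019-10-12 2020-06-02 中国平安财产保险股份有限公司 Intelligent interview method, system, device and computer storage medium
CN110867193A (zh) * 2019-11-26 2020-03-06 广东外语外贸大学 Paragraph-level spoken English scoring method and system
CN111429899A (zh) * 2020-02-27 2020-07-17 深圳壹账通智能科技有限公司 Artificial-intelligence-based voice response processing method, apparatus, device and medium
CN111681681A (zh) * 2020-05-22 2020-09-18 深圳壹账通智能科技有限公司 Speech emotion recognition method, apparatus, electronic device and storage medium
CN112000776A (zh) * 2020-08-27 2020-11-27 中国平安财产保险股份有限公司 Topic matching method, apparatus, device and storage medium based on voice semantics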

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
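CN117116280A (zh) * 2023-08-08 2023-11-24 无锡爱视智能科技有限责任公司 Artificial-intelligence-based intelligent voice data management system and method
CN117116280B (zh) * 2023-08-08 2024-04-09 无锡爱视智能科技有限责任公司 Artificial-intelligence-based intelligent voice data management system and method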

Also Published As

Publication number Publication date
CN112786054A (zh) 2021-05-11

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21927477; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.10.2023)