CN115796653A - Interview speech evaluation method and system - Google Patents

Interview speech evaluation method and system

Info

Publication number
CN115796653A
CN115796653A (application CN202211438104.1A)
Authority
CN
China
Prior art keywords
interviewer
speech
interview
voice
pronunciation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211438104.1A
Other languages
Chinese (zh)
Inventor
徐赞
陈启实
陈晚云
尹建树
许宁
李联凯
余文涛
岳子力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202211438104.1A priority Critical patent/CN115796653A/en
Publication of CN115796653A publication Critical patent/CN115796653A/en
Pending legal-status Critical Current

Abstract

The application provides an interview speech evaluation method and system. The interview speech evaluation method comprises the following steps: carrying out hash comparison between the pronunciation in the interviewer's speech and the original text through a speech recognition system; scoring the interviewer's pronunciation according to the comparison result; comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G); obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample; comparing the average deviation degree with a set speech rate; and determining the interviewer's speech rate score. The interviewer's voice and intonation are analyzed separately: once the interviewer's audio is obtained, the word-segmented audio can be matched for similarity against the standard phonemes. The interviewer can thereby obtain an accurate assessment of the speaking effect.

Description

Interview speech evaluation method and system
Technical Field
The application relates to the technical field of speech processing, in particular to an interview speech evaluation method and system.
Background
During an interview, an interviewer who speaks with a confident, smiling expression leaves a good impression and improves the success rate of the interview, while anxious, irritable, or angry expressions leave a bad impression and reduce it. In practice, interviewers often mispronounce or stumble over words due to tension and inexperience, leaving a poor impression and causing the interview to fail. To improve the success rate, an interviewer simulates the interview process and trains his or her own speech; however, such manual speech training is highly subjective and, lacking an effective reference or defined evaluation standard, has a limited training effect.
Disclosure of Invention
The application provides an interview speech evaluation method for improving the accuracy with which interview speech is judged.
In a first aspect, an interview speech evaluation method is provided, which includes the following steps:
collecting the speech of an interviewer through a voice recognition system;
carrying out hash comparison between the pronunciation in the interviewer's speech and the original text through a speech recognition system; scoring the interviewer's pronunciation according to the comparison result;
comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G);
obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample;
comparing the average deviation degree with a set speech rate; and determining the interviewer's speech rate score.
In this technical scheme, the interviewer's voice and intonation are analyzed separately. After the interviewer's audio is obtained, the reader's word-segmented audio can be matched for similarity against the standard phonemes; once all the words have been matched, the corresponding score is obtained. For the speech rate, a balanced judgment is made from the number of phonemes and the time of the content read: whether the speech rate of each specific sentence is close to the standard speech rate is judged sentence by sentence, rather than over the whole duration. The interviewer can thereby obtain an accurate assessment of the speaking effect.
In a specific embodiment, comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G) specifically comprises:
determining the offset according to a formula:
(Offset formula published only as image BDA0003946372670000021; not reproducible from the text.)
in a specific embodiment, the average deviation degree (S) is obtained from the speech rates of the different sentences (P) in the sample; specifically:
obtaining the average deviation degree according to a formula:
(Average-deviation formula published only as image BDA0003946372670000022; not reproducible from the text.)
in a specific embodiment, the set speech rate is:
the text is dubbed by a speech professional, and the professional's intonation is analyzed to obtain a waveform diagram; intonation features are extracted from the waveform diagram to obtain the set speech rate.
In a specific embodiment, the pronunciation in the speech of the interviewer is hashed with the original text by the speech recognition system; scoring the interviewer's pronunciation according to the comparison result; the method specifically comprises the following steps:
converting the interviewer's pronunciation into a waveform, matching the waveform against a professional's waveform diagram, and scoring the interviewer's pronunciation according to the matching result.
In a particular embodiment or step, the interviewer's speech is collected by a speech recognition system; the method specifically comprises the following steps:
after the interviewer's pronunciation is collected, feeding the voice into the neural network model for processing to obtain the user's voice map.
In a particular embodiment, the method further comprises:
classifying different disciplines in different industries;
carrying out a secondary classification by male and female voices, and ensuring that each subject and each sex have at least 30 excellent samples and general samples;
analyzing the waveforms of the excellent samples and the general samples, and forming a model map of the tone after hash matching and training of a convolutional neural network;
matching the user's voice map against the model maps to determine whether it is closer to the excellent or the general map;
and corresponding scoring is performed according to the different degrees of closeness to the model.
In a second aspect, there is provided an interview utterance evaluation system, including:
a voice recognition system: collecting the speech of an interviewer;
evaluation system: carrying out hash comparison between the pronunciation in the interviewer's speech collected by the speech recognition system and the original text; scoring the interviewer's pronunciation according to the comparison result; comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G); obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample; comparing the average deviation degree with the set speech rate; and determining the interviewer's speech rate score.
In this technical scheme, the interviewer's voice and intonation are analyzed separately. After the interviewer's audio is obtained, the reader's word-segmented audio can be matched for similarity against the standard phonemes; once all the words have been matched, the corresponding score is obtained. For the speech rate, a balanced judgment is made from the number of phonemes and the time of the content read, judged sentence by sentence against the standard speech rate rather than over the whole duration. The interviewer can thereby obtain an accurate assessment of the speaking effect.
In a particular embodiment, the evaluation system is particularly adapted to: determining the offset according to a formula:
(Offset formula published only as image BDA0003946372670000031; not reproducible from the text.)
in a particular embodiment, the evaluation system is used in particular for: obtaining the average deviation degree according to a formula:
(Average-deviation formula published only as image BDA0003946372670000032; not reproducible from the text.)
in a third aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements a method for performing the first aspect and any one of the possible designs of the first aspect when executing the program.
In a fourth aspect, a non-transitory computer-readable storage medium is provided, which stores computer instructions for causing the computer to perform the first aspect and any one of the possible design methods of the first aspect.
In a fifth aspect, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the possible designs of the first aspect and the first aspect of the present application.
In addition, for the technical effects of any of the possible designs of the third to fifth aspects, reference may be made to the effects of the corresponding designs in the method portion; they are not repeated here.
Drawings
Fig. 1 is a flowchart of an interview utterance evaluation method in the prior art;
fig. 2 is a block diagram illustrating a structure of an interview speech evaluation system according to an embodiment of the present disclosure;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The present application is described in further detail below with reference to the figures and examples. The features and advantages of the present application will become more apparent from the description.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
In addition, the technical features related to the different embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.
To facilitate understanding of the interview speech evaluation method provided by the embodiments of the present application, its application scenario is described first. The method is used to evaluate an interviewer's speaking effect and to improve it through training. Since no standard reference training method exists, the embodiments of the present application provide an interview speech evaluation method, which is described in detail below with reference to specific embodiments.
In the embodiment of the application, an interview speaking standard is constructed (comprising a standard speech rate and intonation), the interviewer's interview voice is collected and features are extracted, a virtual listener is constructed through an algorithm, and the virtual listener analyzes the pronunciation, the speech rate, and the intonation. For pronunciation analysis, the original sample is word-segmented and the speech is segmented accordingly; the interviewer's segments are matched word by word against the sample's speech waveforms, and pronunciation is scored by similarity. Because male and female pronunciations differ, pronunciation accuracy is judged in the form of phonemes. For intonation, accurate samples are collected, maps are built from the changes in the sound waveform, different maps are analyzed after sampling by sex and by classified sample, similarity is judged against the map of the interviewer's voice, and a score is assigned according to the judged similarity. For speech rate, the number and time of the phonemes read by the interviewer are compared with the standard phonemes, the accuracy of the speech rate is judged, and an offset is given. According to the different requirements of different interview directions on speech rate and intonation, a score can be given after applying certain weights. The interviewer's speech is thus evaluated accurately and objectively, improving the interviewer's interviewing ability. The details are described below.
Referring to fig. 1, fig. 1 shows a flowchart of an interview speech evaluation method provided by an embodiment of the present application. The interview speech evaluation method provided by the embodiment of the application comprises the following steps of:
step 001: collecting the speech of an interviewer through a voice recognition system;
specifically, pronunciation here mainly refers to the accuracy of pronunciation: a student's self-introduction must be pronounced accurately enough that listeners can understand the meaning and make out each word. A Kaldi speech recognition system is therefore trained; external equipment can be freely added, and voice can be recorded through the microphone of any connected device.
In addition, after the interviewer's pronunciation is collected, the system can feed the voice into the neural network model for processing to obtain the user's voice map.
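As a non-authoritative illustration of this collection step, the following minimal sketch records audio with the third-party `sounddevice` package and derives a log-mel spectrogram as a stand-in for the "voice map"; the publication does not disclose the map's actual form, and the Kaldi recognizer itself is configured separately.

```python
# A minimal sketch of speech collection, assuming a microphone reachable via
# the `sounddevice` package and a log-mel spectrogram as the "voice map".
import numpy as np
import sounddevice as sd
import librosa

SAMPLE_RATE = 16000  # 16 kHz is a common rate for Kaldi acoustic models

def record_speech(seconds: float) -> np.ndarray:
    """Record mono audio from the default microphone."""
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording is finished
    return audio.squeeze()

def voice_map(audio: np.ndarray) -> np.ndarray:
    """Compute a log-mel spectrogram to serve as the user's voice map."""
    mel = librosa.feature.melspectrogram(y=audio, sr=SAMPLE_RATE, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)
```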
Step 002: carrying out hash comparison between the pronunciation in the interviewer's speech and the original text through the speech recognition system; scoring the interviewer's pronunciation according to the comparison result;
specifically, the Kaldi speech recognition system simulates a listener who is present: through the listener's understanding and judgment it forms a machine-listener text, which is hash-compared with the original text to judge pronunciation accuracy.
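The publication does not disclose the hash scheme itself; the sketch below illustrates one plausible reading, hashing the recognized transcript and the original text token by token and scoring by the fraction of matching hashes. The helper names and the choice of MD5 are assumptions.

```python
# Token-level hash comparison between the recognized transcript and the
# original text; MD5 and the scoring rule are illustrative assumptions.
import hashlib

def token_hash(token: str) -> str:
    return hashlib.md5(token.encode("utf-8")).hexdigest()

def pronunciation_score(recognized: list[str], original: list[str]) -> float:
    """Fraction of position-aligned tokens whose hashes match."""
    matches = sum(token_hash(a) == token_hash(b)
                  for a, b in zip(recognized, original))
    return matches / max(len(original), 1)
```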
In the specific judgment, the interviewer's pronunciation is converted into a waveform, the waveform is matched against a professional's waveform diagram, and the interviewer's pronunciation is scored according to the matching result.
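The matching metric is likewise unspecified; as one possibility, the sketch below scores the match between the interviewer's waveform and the professional's waveform by normalized cross-correlation.

```python
# Normalized cross-correlation of the two waveforms; the metric itself is an
# assumption, since the publication only states that the waveforms are matched.
import numpy as np

def waveform_similarity(user: np.ndarray, reference: np.ndarray) -> float:
    """Similarity in [0, 1]; 1 means identical up to amplitude scaling."""
    n = min(len(user), len(reference))
    u = user[:n] - user[:n].mean()
    r = reference[:n] - reference[:n].mean()
    denom = float(np.linalg.norm(u) * np.linalg.norm(r))
    return abs(float(np.dot(u, r))) / denom if denom > 0.0 else 0.0
```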
Step 003: comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G);
specifically, for speech rate, the best reading speed per hundred characters is found by extracting the length and word count of interview recordings for analysis and comparison; different students are invited to read at this pace, and the best speed is determined. Because interview wording is rarely fixed, scoring is unified through sentence-by-sentence phoneme comparison.
The comparison of the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G), specifically comprises determining the offset according to a formula:
(Offset formula published only as image BDA0003946372670000051; not reproducible from the text.)
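Since the offset formula is available only as an image, the sketch below assumes a natural reconstruction: G as the relative deviation of the interviewer's per-sentence phoneme rate N_U/T_U from the standard rate N/T. This reconstruction is an assumption, not the published formula.

```python
# The offset formula appears only as an image in the publication; a plausible
# reconstruction, assumed here, takes G as the relative deviation of the
# interviewer's phoneme rate from the standard rate for the same sentence.
def offset(n_u: int, t_u: float, n: int, t: float) -> float:
    """Assumed form: G = |N_U/T_U - N/T| / (N/T)."""
    standard_rate = n / t  # standard phonemes per second
    return abs(n_u / t_u - standard_rate) / standard_rate
```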
step 004: obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample;
specifically, the average deviation degree (S) is obtained from the speech rates of the different sentences (P) in the sample according to a formula:
(Average-deviation formula published only as image BDA0003946372670000052; not reproducible from the text.)
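Under the same caveat, the average deviation degree S is assumed here to be the mean of the per-sentence offsets over the P sentences of the sample.

```python
# Again, the published formula is only an image; the mean of the per-sentence
# offsets over the P sentences in the sample is assumed here.
def average_deviation(offsets: list[float]) -> float:
    """Assumed form: S = (1/P) * sum(G_i), with G_i the offset of sentence i."""
    return sum(offsets) / len(offsets)

# Example: two sentences, each scored with the offset() sketch above.
# S = average_deviation([offset(12, 3.1, 12, 3.0), offset(9, 2.0, 10, 2.4)])
```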
after the average deviation degree is obtained, a score can be given according to the requirements of different industries on speech rate and intonation.
Step 005: comparing the average deviation degree with a set speech rate; and determining the interviewer's speech rate score.
Specifically, the speech rate is set according to dubbing of speech by a speech professional, and the intonation of the speech professional is analyzed to obtain a oscillogram; and extracting intonation features in the oscillogram to obtain the set speech rate.
Intonation mainly reflects the rhythm of the whole speech and its cadence, the rise and fall of the voice. Here, a professional is asked to dub the text; the intonation is analyzed, intonation feature points are extracted from the waveform diagram, the degree of match between these feature points and the student's feature points is calculated, and a reasonable score is given.
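As an illustration of this intonation comparison, the sketch below extracts a fundamental-frequency (F0) contour as the intonation feature and correlates it against the professional's contour; the publication names neither the pitch tracker nor the matching metric, so librosa's pYIN and Pearson correlation are stand-ins.

```python
# Pitch-contour extraction and matching; librosa's pYIN tracker and Pearson
# correlation are stand-ins for the unspecified feature extractor and metric.
import numpy as np
import librosa

def pitch_contour(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Extract an F0 contour; unvoiced frames (NaN) are zero-filled."""
    f0, _, _ = librosa.pyin(audio, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    return np.nan_to_num(f0)

def intonation_match(user: np.ndarray, reference: np.ndarray) -> float:
    """Correlate the two contours over their common length."""
    n = min(len(user), len(reference))
    u, r = user[:n], reference[:n]
    if n < 2 or u.std() == 0.0 or r.std() == 0.0:
        return 0.0
    return float(np.corrcoef(u, r)[0, 1])
```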
In the method, different disciplines of different industries are classified; the voices are secondarily classified by male and female; after ensuring that each subject and each sex have at least 30 excellent samples and general samples, the waveforms of the excellent and general samples are analyzed, and a model map of the tone is formed after hash matching and training of a convolutional neural network; the user's voice map is matched against the model maps to determine whether it is closer to the excellent or the general map; and a corresponding score is given according to the degree of closeness to the model.
Specifically, after classifying different subjects in different industries, performing secondary classification on the voices of the male and the female. After ensuring that each subject and each gender have 30 or more excellent samples and general samples, analyzing waveforms of the excellent samples and the general samples, and forming a tone map after hash matching and training of a convolutional neural network. The map reflects the common characteristics of excellent samples and common characteristics of general samples. Therefore, after the pronunciation of the user is collected, the voice is added into the neural network model for operation, the voice map of the user is obtained, the map is matched with the model map, and the result is closer to excellence or generality. And corresponding scoring is performed according to the different degrees of closeness to the model.
In the method, a college student's job-hunting self-introduction is treated as neither recitation nor professional broadcasting but as effective expression close to spoken language. By collecting students' actual voice pronunciation data and actual self-introductions, the relevant voice analysis data are extracted through deep-learning techniques and a scoring model is established.
In tone recognition, the method uses a large number of excellent and general expression samples from actual interviews, which are classified and cleaned. Once each gender within each category has sufficient samples, map training with a convolutional neural network is performed; after multiple rounds of training, characteristic maps of excellent speech and of general speech are obtained. After the user selects the reading content, the recorded audio is converted into a waveform according to the user's gender and matched against the excellent and general speech. Scoring is based on the similarity to the respective excellent and general speech.
In speech recognition, the method adopts the form of a virtual listener: the phonemes of the standard reading are extracted and the standard audio is word-segmented. After the reader's audio is obtained, its word-segmented audio is matched for similarity against the standard phonemes; once all the words have been matched, the corresponding score is obtained. For speech rate, a balanced judgment is made from the number of phonemes and the time of the content read: whether the speech rate of each specific sentence is close to the standard rate is judged sentence by sentence, rather than over the whole duration.
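The sketch below illustrates this virtual-listener matching step: each word of the reader's audio is assumed to have already been decoded into a phoneme sequence, which is scored against the standard phoneme sequence; `difflib`'s normalized edit similarity stands in for whatever matcher the authors actually used.

```python
# Word-by-word phoneme matching against the standard reading; the similarity
# measure (difflib's ratio) is an assumption, not the published matcher.
from difflib import SequenceMatcher

def phoneme_similarity(read: list[str], standard: list[str]) -> float:
    """Similarity in [0, 1] between reader and standard phoneme sequences."""
    return SequenceMatcher(None, read, standard).ratio()

def utterance_score(words_read: list[list[str]],
                    words_standard: list[list[str]]) -> float:
    """Average per-word similarity once every word has been matched."""
    sims = [phoneme_similarity(r, s)
            for r, s in zip(words_read, words_standard)]
    return sum(sims) / max(len(sims), 1)
```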
It can be seen from the above description that, in the embodiment of the present application, by analyzing the interviewer's voice and intonation separately, after the interviewer's audio is obtained, the reader's word-segmented audio can be matched for similarity against the standard phonemes; once all the words have been matched, the corresponding score is obtained. For speech rate, a balanced judgment is made from the number of phonemes and the time of the content read, judged sentence by sentence against the standard speech rate rather than over the whole duration. The interviewer can thereby obtain an accurate assessment of the speaking effect.
As shown in fig. 2, an embodiment of the present application further provides an interview speech evaluation system 20, which includes a speech recognition system 10 and an evaluation system 20. The speech recognition system 10 is used to collect the interviewer's speech. The evaluation system 20 is used to carry out hash comparison between the pronunciation in the interviewer's speech and the original text through the speech recognition system 10; score the interviewer's pronunciation according to the comparison result; compare the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judge the accuracy of the speech rate and give an offset (G); obtain an average deviation degree (S) from the speech rates of the different sentences (P) in the sample; compare the average deviation degree with the set speech rate; and determine the interviewer's speech rate score. For details, refer to the description of the method.
In this technical scheme, the interviewer's voice and intonation are analyzed separately. After the interviewer's audio is obtained, the reader's word-segmented audio can be matched for similarity against the standard phonemes; once all the words have been matched, the corresponding score is obtained. For speech rate, a balanced judgment is made from the number of phonemes and the time of the content read, judged sentence by sentence against the standard speech rate rather than over the whole duration. The interviewer can thereby obtain an accurate assessment of the speaking effect.
In a particular embodiment, the evaluation system 20 is particularly adapted to: determining the offset according to a formula:
(Offset formula published only as image BDA0003946372670000071; not reproducible from the text.)
in particular, reference is made to the description relating to the method.
In a particular embodiment, the evaluation system 20 is particularly adapted to: obtaining the average deviation degree according to a formula:
(Average-deviation formula published only as image BDA0003946372670000072; not reproducible from the text.)
in particular, reference is made to the description relating to the method.
An embodiment of the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements a method for implementing any one of the above possible designs when executing the program.
The present embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the above possible design methods.
Embodiments of the present application further provide a computer program product, which includes instructions that, when executed on a computer, cause the computer to perform any one of the possible design methods described above.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
For convenience of description, the above devices are described as being divided into various modules by functions, which are described separately. Of course, the functionality of the various modules may be implemented in the same one or more pieces of software and/or hardware in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 3 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static Memory device, a dynamic Memory device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called by the processor 1010 for execution.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, speaker, vibrator, indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, for storing information may be implemented in any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features of the above embodiments or of different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the description. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An interview utterance evaluation method, comprising the steps of:
collecting the speech of an interviewer through a voice recognition system;
carrying out hash comparison between the pronunciation in the interviewer's speech and the original text through a speech recognition system; scoring the interviewer's pronunciation according to the comparison result;
comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G);
obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample;
comparing the average deviation degree with a set speech rate; and determining the interviewer's speech rate score.
2. The interview utterance evaluation method according to claim 1, wherein comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G) specifically comprises:
determining the offset according to a formula:
(Offset formula published only as image FDA0003946372660000011; not reproducible from the text.)
3. The interview utterance evaluation method according to claim 2, wherein the average deviation degree (S) is obtained from the speech rates of the different sentences (P) in the sample; specifically:
obtaining the average deviation degree according to a formula:
(Average-deviation formula published only as image FDA0003946372660000012; not reproducible from the text.)
4. the interview utterance evaluation method according to claim 1, wherein the set speech rate is:
the text is dubbed by a speech professional, and the professional's intonation is analyzed to obtain a waveform diagram; intonation features are extracted from the waveform diagram to obtain the set speech rate.
5. The interview utterance evaluation method of claim 4, wherein carrying out hash comparison between the pronunciation in the collected interviewer's speech and the original text through the speech recognition system and scoring the interviewer's pronunciation according to the comparison result specifically comprises:
converting the interviewer's pronunciation into a waveform, matching the waveform against a professional's waveform diagram, and scoring the interviewer's pronunciation according to the matching result.
6. The interview utterance evaluation method according to any one of claims 1 to 5, wherein collecting the interviewer's speech through a speech recognition system specifically comprises:
after the interviewer's pronunciation is collected, feeding the voice into the neural network model for processing to obtain the user's voice map.
7. The interview utterance evaluation method of claim 6, further comprising:
classifying different disciplines in different industries;
carrying out a secondary classification by male and female voices, and ensuring that each subject and each sex have at least 30 excellent samples and general samples;
analyzing the waveforms of the excellent samples and the general samples, and forming a model map of the tone after hash matching and training of a convolutional neural network;
matching the user's voice map against the model maps to determine whether it is closer to the excellent or the general map;
and corresponding scoring is performed according to the different degrees of closeness to the model.
8. An interview utterance evaluation system, comprising:
a speech recognition system: collecting the speech of an interviewer;
evaluation system: carrying out hash comparison between the pronunciation in the interviewer's speech and the original text through the speech recognition system; scoring the interviewer's pronunciation according to the comparison result; comparing the number of phonemes per sentence read by the interviewer (N_U) and the corresponding time (T_U) with the standard number of phonemes (N) and time (T), judging the accuracy of the speech rate and giving an offset (G); obtaining an average deviation degree (S) from the speech rates of the different sentences (P) in the sample; comparing the average deviation degree with the set speech rate; and determining the interviewer's speech rate score.
9. The interview utterance evaluation system of claim 8, wherein the evaluation system is specifically configured to: determining the offset according to a formula:
(Offset formula published only as image FDA0003946372660000021; not reproducible from the text.)
10. the interview utterance evaluation system of claim 9, wherein the evaluation system is specifically configured to: obtaining the average deviation degree according to a formula:
(Average-deviation formula published only as image FDA0003946372660000022; not reproducible from the text.)
CN202211438104.1A 2022-11-16 2022-11-16 Interview speech evaluation method and system Pending CN115796653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211438104.1A CN115796653A (en) 2022-11-16 2022-11-16 Interview speech evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211438104.1A CN115796653A (en) 2022-11-16 2022-11-16 Interview speech evaluation method and system

Publications (1)

Publication Number Publication Date
CN115796653A true CN115796653A (en) 2023-03-14

Family

ID=85438351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211438104.1A Pending CN115796653A (en) 2022-11-16 2022-11-16 Interview speech evaluation method and system

Country Status (1)

Country Link
CN (1) CN115796653A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117414135A (en) * 2023-10-20 2024-01-19 郑州师范学院 Behavioral and psychological abnormality detection method, system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109686383A (en) * 2017-10-18 2019-04-26 腾讯科技(深圳)有限公司 A kind of speech analysis method, device and storage medium
CN110265051A * 2019-06-04 2019-09-20 福建小知大数信息科技有限公司 Sight-singing audio intelligent scoring modeling method applied to music education
US20190385480A1 (en) * 2018-06-18 2019-12-19 Pearson Education, Inc. System to evaluate dimensions of pronunciation quality
DE102020134752A1 (en) * 2020-12-22 2022-06-23 Digi Sapiens - Digital Learning GmbH METHOD OF EVALUATING THE QUALITY OF READING A TEXT, COMPUTER PROGRAM PRODUCT, COMPUTER READABLE MEDIA AND EVALUATION DEVICE

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109686383A (en) * 2017-10-18 2019-04-26 腾讯科技(深圳)有限公司 A kind of speech analysis method, device and storage medium
US20190385480A1 (en) * 2018-06-18 2019-12-19 Pearson Education, Inc. System to evaluate dimensions of pronunciation quality
CN110265051A * 2019-06-04 2019-09-20 福建小知大数信息科技有限公司 Sight-singing audio intelligent scoring modeling method applied to music education
DE102020134752A1 (en) * 2020-12-22 2022-06-23 Digi Sapiens - Digital Learning GmbH METHOD OF EVALUATING THE QUALITY OF READING A TEXT, COMPUTER PROGRAM PRODUCT, COMPUTER READABLE MEDIA AND EVALUATION DEVICE

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LI WANG: "English Speech Recognition and Pronunciation Quality Evaluation Model Based on Neural Network", 《SCIENTIFIC PROGRAMMING》, vol. 2022, no. 2249722, pages 1 - 10 *
梁维谦, 王国梁, 刘加, 刘润生: "Phoneme-based pronunciation quality evaluation algorithm" (基于音素的发音质量评价算法), Journal of Tsinghua University (Science and Technology), vol. 45, no. 01, pages 5 - 8 *
黄羿博, 张秋余, 袁占亭, 杨仲平: "Speech perceptual hashing algorithm fusing MFCC and LPCC" (融合MFCC和LPCC的语音感知哈希算法), Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 43, no. 2, pages 124 - 128 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117414135A (en) * 2023-10-20 2024-01-19 郑州师范学院 Behavioral and psychological abnormality detection method, system and storage medium

Similar Documents

Publication Publication Date Title
JP6902010B2 (en) Audio evaluation methods, devices, equipment and readable storage media
US8935167B2 (en) Exemplar-based latent perceptual modeling for automatic speech recognition
US11282503B2 (en) Voice conversion training method and server and computer readable storage medium
Weinberger et al. The Speech Accent Archive: towards a typology of English accents
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
US9087519B2 (en) Computer-implemented systems and methods for evaluating prosodic features of speech
US9489864B2 (en) Systems and methods for an automated pronunciation assessment system for similar vowel pairs
CN108305618B (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
JP5007401B2 (en) Pronunciation rating device and program
JP2008158055A (en) Language pronunciation practice support system
CN110970036A (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
Pravena et al. Development of simulated emotion speech database for excitation source analysis
CN111326177B (en) Voice evaluation method, electronic equipment and computer readable storage medium
KR20210071713A (en) Speech Skill Feedback System
Tirronen et al. The effect of the MFCC frame length in automatic voice pathology detection
CN104700831B (en) The method and apparatus for analyzing the phonetic feature of audio file
CN115796653A (en) Interview speech evaluation method and system
CN111785299B (en) Voice evaluation method, device, equipment and computer storage medium
CN112116181B (en) Classroom quality model training method, classroom quality evaluation method and classroom quality evaluation device
Kanwal et al. Identifying the evidence of speech emotional dialects using artificial intelligence: A cross-cultural study
CN115312030A (en) Display control method and device of virtual role and electronic equipment
Płonkowski Using bands of frequencies for vowel recognition for Polish language
JP2006201491A (en) Pronunciation grading device, and program
CN109344221B (en) Recording text generation method, device and equipment
CN110223206B (en) Lesson specialty direction determining method and system and lesson matching method and system for analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination