CN103559892A - Method and system for evaluating spoken language - Google Patents


Publication number
CN103559892A
Authority
CN
China
Prior art keywords
score
evaluation
test feature
feature
different
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310554703.4A
Other languages
Chinese (zh)
Other versions
CN103559892B (en)
Inventor
王士进
刘丹
陈进
魏思
胡郁
刘庆峰
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201310554703.4A priority Critical patent/CN103559892B/en
Publication of CN103559892A publication Critical patent/CN103559892A/en
Application granted granted Critical
Publication of CN103559892B publication Critical patent/CN103559892B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to the technical field of speech signal processing, and discloses a method and system for spoken language evaluation. The method comprises: receiving a speech signal to be evaluated; obtaining, through at least two different speech recognition systems, the speech segments corresponding to each basic speech unit in the speech signal; extracting from the speech segments the evaluation features corresponding to different feature types; calculating the original scores of the evaluation features; optimizing and fusing, by feature type, the original scores obtained from the different speech recognition systems to obtain a composite score for each evaluation feature; and calculating the score of the speech signal from the composite scores of the different evaluation features. The method and system improve the accuracy of spoken language evaluation and reduce abnormal scoring.

Description

Method and system for spoken language evaluation
Technical field
The present invention relates to the field of speech processing technology, and in particular to a method and system for spoken language evaluation.
Background art
As the primary medium of interpersonal communication, spoken language occupies an extremely important position in daily life. With socioeconomic development and the trend of globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness and scalability of language assessment. Traditional manual evaluation of spoken language proficiency confines teachers and students to fixed times and places, and suffers from shortages and imbalances in qualified teachers, teaching venues and funding. Manual evaluation also cannot avoid the personal bias of individual raters, so a uniform grading standard cannot be guaranteed, and a test-taker's true level sometimes cannot be accurately reflected. Moreover, large-scale oral tests require substantial human, material and financial resources, which limits regular, large-scale assessment. For these reasons, the industry has successively developed language teaching and evaluation systems.
In the prior art, a spoken language evaluation system usually applies a single recognizer to the received speech signal, performing speech recognition (for question-and-answer items) or speech-text alignment (for read-aloud items) to obtain the speech segment corresponding to each basic speech unit. The system then extracts from each speech segment the features that measure spoken-language evaluation criteria, such as the pronunciation accuracy or fluency of each basic speech unit, and finally obtains the final evaluation score from those features by predictive analysis.
When a high-fidelity recording device is used in a quiet environment, the speech recognition system provides high recognition accuracy, so the subsequent spoken language evaluation can deliver fairly objective and accurate results. In practical applications, however, and particularly in large-scale oral examinations, the recording environment is inevitably affected by factors such as examination-hall noise and ambient noise; the resulting drop in recognition accuracy produces a certain proportion of abnormally scored speech during evaluation. This clearly prevents computer-based automatic scoring from being truly practical for large-scale oral examinations, limits the application and popularization of spoken language evaluation systems, and makes them unusable for many critical examinations, where a single abnormal score would constitute a grading accident.
Summary of the invention
The embodiments of the present invention provide a method and system for spoken language evaluation, so as to improve the accuracy of spoken language evaluation and reduce abnormal scoring.
To this end, the invention provides the following technical solutions:
A spoken language evaluation method, comprising:
receiving a speech signal to be evaluated;
obtaining, through at least two different speech recognition systems, the speech segments corresponding to each basic speech unit in the speech signal;
extracting from the speech segments the evaluation features corresponding to different feature types;
calculating the original scores of the evaluation features;
optimizing and fusing, by feature type, the original scores obtained from the different speech recognition systems to obtain a composite score for each evaluation feature;
calculating the score of the speech signal from the composite scores of the different evaluation features.
Preferably, the feature types comprise one or more of: a completeness feature, a pronunciation accuracy feature, a fluency feature and a prosodic feature.
Preferably, calculating the original score of an evaluation feature comprises:
loading the score prediction model corresponding to the feature type of the evaluation feature;
calculating the similarity of the evaluation feature with respect to the score prediction model, and taking the similarity as the original score of the evaluation feature.
Preferably, the score prediction models of the same feature type differ across different question types.
Preferably, optimizing and fusing, by feature type, the original scores obtained from the different speech recognition systems to obtain the composite score of each evaluation feature comprises:
for the original scores of an evaluation feature of the same feature type obtained from the different speech recognition systems, taking the maximum, the median or the mean thereof as the composite score of the evaluation feature.
A spoken language evaluation system, comprising:
a receiving module, configured to receive a speech signal to be evaluated;
a speech segment acquisition module, configured to obtain, through at least two different speech recognition systems, the speech segments corresponding to each basic speech unit in the speech signal;
a feature extraction module, configured to extract from the speech segments the evaluation features corresponding to different feature types;
a calculation module, configured to calculate the original scores of the evaluation features;
an optimization fusion module, configured to optimize and fuse, by feature type, the original scores obtained from the different speech recognition systems to obtain a composite score for each evaluation feature;
a scoring module, configured to calculate the score of the speech signal from the composite scores of the different evaluation features.
Preferably, the feature types comprise one or more of: a completeness feature, a pronunciation accuracy feature, a fluency feature and a prosodic feature.
Preferably, the calculation module comprises:
a loading unit, configured to load the score prediction model corresponding to the feature type of the evaluation feature;
a similarity calculation unit, configured to calculate the similarity of the evaluation feature with respect to the score prediction model, and take the similarity as the original score of the evaluation feature.
Preferably, the score prediction models of the same feature type differ across different question types.
Preferably, the scoring module is specifically configured to take, for the original scores of an evaluation feature of the same feature type obtained from the different speech recognition systems, the maximum, the median or the mean thereof as the composite score of the evaluation feature.
With the spoken language evaluation method and system provided by the embodiments of the present invention, scoring each of multiple speech recognition systems separately and then combining the results reduces the recognition and feature-extraction anomalies that single-system scoring suffers from, thereby reducing the erroneous scores caused by recognition errors and achieving a comprehensive and accurate evaluation of the user's spoken language proficiency.
Brief description of the drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some of the embodiments recorded in the present invention; those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the spoken language evaluation method of an embodiment of the present invention;
Fig. 2 is a flowchart of building a score prediction model in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the spoken language evaluation system of an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
To address the prior-art problem that environmental factors degrade speech recognition accuracy and thereby cause a certain proportion of abnormally scored speech during spoken language evaluation, the embodiments of the present invention provide a method and system for spoken language evaluation. First, the speech signal to be evaluated is recognized by multiple speech recognition methods to obtain multiple recognition results. Evaluation features of different feature types are then extracted from each recognition result, and a score is calculated from each evaluation feature. The scores of the recognition results are then optimized and fused by feature type to obtain a composite score for each feature type. Finally, the composite scores of the different feature types are converted into the final score of the speech signal.
Fig. 1 is a flowchart of the spoken language evaluation method of an embodiment of the present invention, comprising the following steps:
Step 101: receive a speech signal to be evaluated.
Step 102: obtain, through at least two different speech recognition systems, the speech segments corresponding to each basic speech unit in the speech signal.
A basic speech unit may be a syllable, a phoneme, and so on. Different speech recognition systems decode the speech signal using different acoustic features, such as an acoustic model based on MFCC (Mel-Frequency Cepstral Coefficient) features or one based on PLP (Perceptual Linear Prediction) features; using different acoustic models, such as an HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) or a neural network acoustic model based on a DBN (Dynamic Bayesian Network); or even using different decoding procedures, such as Viterbi search or A* search. In this way, the basic speech units of the speech signal and the corresponding speech segment sequence are obtained.
Specifically, for a speech signal without a text transcript, such as a question-and-answer item, continuous speech recognition yields the text corresponding to the speech signal, i.e. the basic speech unit sequence, together with the speech segment corresponding to each basic speech unit. For a speech signal with a model answer, such as a read-aloud item, speech alignment yields the time boundaries of the speech segment corresponding to each basic speech unit.
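The decoding output of step 102 can be sketched as follows. This is a minimal illustration only: the recognizer outputs below are invented, and the system names are placeholders standing in for, e.g., the MFCC/HMM-GMM and PLP/DBN systems mentioned above.

```python
# Each recognition system yields (unit, start_s, end_s) triples for the
# basic speech units it decoded from the same speech signal.
def unit_sequence(segments):
    """The basic-speech-unit sequence decoded by one recognition system."""
    return [unit for unit, _, _ in segments]

def boundaries(segments):
    """Per-unit time boundaries (start, end) in seconds."""
    return {unit: (start, end) for unit, start, end in segments}

outputs = {
    "mfcc_hmm_gmm": [("h", 0.00, 0.08), ("eh", 0.08, 0.21), ("l", 0.21, 0.30)],
    "plp_dbn":      [("h", 0.00, 0.07), ("eh", 0.07, 0.22), ("l", 0.22, 0.30)],
}
# Here both systems agree on the unit sequence but place the boundaries
# slightly differently -- the complementarity the later fusion step exploits.
```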
Because different speech recognition systems have different decoding strengths, their recognition results are often complementary to some extent.
Step 103: extract from the speech segments the evaluation features corresponding to different feature types.
The feature types may comprise one or more of: a completeness feature, a pronunciation accuracy feature, a fluency feature, a prosodic feature, and so on. Specifically:
The completeness feature describes the degree to which the basic speech unit sequence corresponding to the speech segment sequence covers the text of the model answer.
In an embodiment of the present invention, the basic speech unit sequence may be matched against a pre-built model answer network to obtain an optimal path, and the matching degree between the optimal path and the speech unit sequence is taken as the completeness feature.
It should be noted that the form of the model answer network may differ across question types. For a read-aloud question type, the model answer is the word sequence of the prompt; for semi-open question types such as question-and-answer items, the model answer usually consists of fixed core words and other auxiliary connective words. Moreover, because such an answer is not fully determined, it often has many forms of expression, so the corresponding model answer network is usually composed of multiple model answers, taking the form of multiple answer sentences or of a lattice.
Of course, when the model answer is not unique, a weighted model answer network may also be built according to the occurrence probability of each model answer, and the corresponding weighted matching rate used to calculate the matching degree between the optimal path and the speech unit sequence, taking the matching degree of each speech unit as the completeness feature.
Further, in the answer network of a semi-open question type, the fixed core words matter far more to answer correctness than the connective words. To emphasize the importance of the core words to answer completeness, different numerical weights may be assigned to core words and connective words, the optimal path of the basic speech unit sequence searched in the weighted model answer network, and the cumulative score of the optimal path taken as the matching degree.
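The weighted core-word idea can be sketched in a few lines. This is a deliberately simplified stand-in for the answer-network search: it scores completeness as the fraction of total answer weight recovered in the decoded unit sequence, with core words carrying larger weights than connective words; the words and weight values are invented for illustration.

```python
def completeness(decoded_units, weighted_answer):
    """Fraction of answer weight covered by the decoded units.
    `weighted_answer` is a list of (word, weight) pairs; core words
    are assigned larger weights than connective words."""
    hits = set(decoded_units)
    matched = sum(w for word, w in weighted_answer if word in hits)
    total = sum(w for _, w in weighted_answer)
    return matched / total

# Core words ("paris", "capital") weighted 2.0; connectives weighted 0.5.
answer = [("paris", 2.0), ("is", 0.5), ("the", 0.5), ("capital", 2.0)]
score = completeness(["paris", "capital", "city"], answer)  # 4.0 / 5.0 = 0.8
```

A real implementation would search an answer lattice rather than a flat word list, but the weighting principle is the same.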
The pronunciation accuracy feature describes the pronunciation standardness of each speech segment. Specifically, the similarity of each speech segment with respect to the preset pronunciation acoustic model of its basic speech unit may be calculated, and the similarity taken as the pronunciation accuracy feature.
The fluency feature describes the smoothness of the user's utterances, including but not limited to the average speaking rate of an utterance (e.g. the ratio of speech duration to number of speech units), the average run length of an utterance, and the effective pause ratio of an utterance. In addition, to compensate for speaking-rate differences between speakers, phone-duration features may also be used, normalizing all pronounced parts before combining them into the fluency feature. Specifically, the discrete probability distribution of context-independent phone durations may be collected, and the log probability of the normalized duration calculated to obtain the duration score of each phone.
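Two of the fluency quantities above can be sketched directly. This is a toy illustration under stated assumptions: `segments` are (unit, start, end) triples, and `toy_logprob` is an invented stand-in for the trained discrete duration distribution of context-independent phones.

```python
def fluency_features(segments, dur_logprob):
    """Speaking rate (units per second) plus the mean log-probability of
    each phone's duration under a duration model."""
    total_time = segments[-1][2] - segments[0][1]
    rate = len(segments) / total_time
    dur_score = sum(dur_logprob(u, e - s) for u, s, e in segments) / len(segments)
    return rate, dur_score

# Invented duration "model": score peaks when a phone lasts about 150 ms.
toy_logprob = lambda unit, d: -abs(d - 0.15)
segs = [("h", 0.0, 0.1), ("eh", 0.1, 0.3)]
rate, dur_score = fluency_features(segs, toy_logprob)
```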
The prosodic feature describes the rhythmic characteristics of the user's pronunciation, including features such as pitch rise and fall. Specifically, the fundamental frequency (F0) sequence of each speech segment may be extracted, and its dynamic characteristics may further be obtained, for example by extracting first-order and second-order differences as supplementary prosodic features.
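The first- and second-order differences of an F0 track are straightforward to compute; the F0 values (in Hz) below are invented for illustration.

```python
def delta(seq):
    """First-order difference of a sequence (e.g. an F0 track)."""
    return [b - a for a, b in zip(seq, seq[1:])]

f0 = [110.0, 115.0, 123.0, 120.0]   # invented F0 samples in Hz
d1 = delta(f0)      # first-order difference: pitch movement per frame
d2 = delta(d1)      # second-order difference: change of that movement
```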
The evaluation features of the different feature types above each describe the current user's pronunciation from a different perspective and are complementary to some extent.
Step 104: calculate the original score of each evaluation feature.
For the evaluation features of the different feature types, the corresponding score prediction models may be loaded, the similarity of each evaluation feature with respect to its score prediction model calculated, and the similarity taken as the original score of the evaluation feature.
It should be noted that in practical applications the score prediction model may also be loaded according to the question type; the score prediction models of the same feature type for different question types may be identical or may differ, further improving the granularity and accuracy of scoring. The construction of the score prediction models is described in detail later.
Step 105: optimize and fuse, by feature type, the original scores obtained from the different speech recognition systems to obtain the composite score of each evaluation feature.
Because different speech recognition systems adopt different recognition algorithms or acoustic models, their recognition results often differ; accordingly, the evaluation features of the same feature type extracted from the different speech segments also differ, and the scores of the evaluation features (completeness, accuracy, fluency, prosody, etc.) are complementary to some extent.
In an embodiment of the present invention, the original scores of an evaluation feature of the same feature type obtained from the different speech recognition systems are first fused, to give a comprehensive measure of the user pronunciation level characterized by that evaluation feature. Specifically, depending on the requirements of the examination and the number of speech recognition systems, the scores may be fused by taking the maximum, the median or the mean. For example, if the original scores of the evaluation feature obtained from the different speech recognition systems differ within a set threshold, the mean of the original scores is taken as the composite score of the evaluation feature; if the original score from one or more speech recognition systems is higher than those from the other systems, the maximum, or the mean of the scores near the maximum, is taken as the composite score of the evaluation feature.
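The fusion rule just described can be sketched as follows; the threshold value and example scores are assumed for illustration, not taken from the patent.

```python
def fuse(scores, threshold=0.5):
    """Average when the systems' original scores agree within `threshold`;
    otherwise average only the scores near the maximum, so that one
    system's failure does not drag the composite score down."""
    if max(scores) - min(scores) <= threshold:
        return sum(scores) / len(scores)
    top = max(scores)
    near_top = [s for s in scores if top - s <= threshold]
    return sum(near_top) / len(near_top)
```

For instance, three systems scoring 4.0, 4.2 and 4.1 agree within the threshold and yield their mean 4.1, while scores of 1.0, 4.0 and 4.2 (one system failed) yield the mean of the two high scores, again 4.1.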
This composite scoring reduces, to some extent, the abnormal scores caused by a failure of an individual speech recognition system or of the evaluation feature extraction.
Step 106: calculate the score of the speech signal from the composite scores of the different evaluation features.
After the fusion of step 105, the composite scores of the different evaluation features are available. In an embodiment of the present invention, considering that in practice the composite scores of the different evaluation feature types are correlated to some extent, a linear-regression-based conversion may be used to calculate the total score, i.e. the score of the speech signal, as follows:
S = (1/N) * Σ_{i=1}^{N} w_i * s_i
where w_i is the correlation parameter of each evaluation feature; each w_i is a positive number preset by the system, the weights satisfying the normalization Σ_{i=1}^{N} w_i = N; s_i is the composite score of each evaluation feature; and N is the number of composite scores.
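The linear combination of step 106 computes directly; the example weights and composite scores below are invented for illustration.

```python
def total_score(weights, composite_scores):
    """S = (1/N) * sum(w_i * s_i) with preset positive weights."""
    assert all(w > 0 for w in weights)
    n = len(composite_scores)
    return sum(w * s for w, s in zip(weights, composite_scores)) / n

# Three feature types (e.g. completeness, accuracy, fluency) with
# invented weights summing to N = 3:
S = total_score([1.2, 0.8, 1.0], [4.0, 3.0, 5.0])   # (4.8 + 2.4 + 5.0) / 3
```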
As can be seen, the spoken language evaluation method of the embodiment of the present invention, by scoring each of multiple speech recognition systems separately and then combining the results, reduces the recognition and feature-extraction anomalies that single-system scoring suffers from, thereby reducing the erroneous scores caused by recognition errors and achieving a comprehensive and accurate evaluation of the user's spoken language proficiency.
As mentioned above, calculating the score of an evaluation feature requires loading the score prediction model corresponding to the feature type of the evaluation feature. It should be noted that the score prediction models are built offline in advance.
In an embodiment of the present invention, a score prediction model is set up for each feature type: its input is the evaluation feature of a particular type extracted from the speech segments (e.g. the completeness feature or the pronunciation accuracy feature), its output is a score, and it thus establishes a mapping from evaluation feature to score. Note that a separate score prediction model is built for each evaluation feature. Further, separate score prediction models may also be built for the same scoring feature type under different question types.
Fig. 2 is a flowchart of building a score prediction model in an embodiment of the present invention, comprising the following steps:
Step 201: collect scoring training data.
Specifically, the answer speech data of multiple users may be collected for each test item as the scoring training data.
Step 202: manually annotate the training data, including text transcription, segmentation, and manual scoring for spoken language evaluation.
Text transcription refers to converting speech to text. Segmentation refers to dividing the continuous speech signal under manual supervision to determine the speech segment corresponding to each basic speech unit. Manual scoring for spoken language evaluation refers to grading spoken language proficiency by human listening.
In practical applications, each of the different evaluation features may be scored separately, the evaluation features comprising the completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature, and so on.
Step 203: extract the evaluation features of the different feature types according to the annotation results.
That is, according to the basic speech units and corresponding speech segments in the annotation results, the evaluation features of the different feature types are extracted from the speech segments in the manner described above.
Step 204: use the evaluation features to build the score prediction model corresponding to each feature type.
Specifically, prediction techniques may be trained under the guidance of the manual scores to obtain the parameters of the score prediction model, and thus the model itself. Further, score prediction models specific to each question type may also be built.
In an embodiment of the present invention, a separate score prediction model is built for each specific evaluation feature. The building process is roughly as follows:
First, assume the score prediction model is a mapping function of the evaluation feature. For the completeness feature, whose dimensionality is 1, the prediction model is the linear function y = a*x + b, where x is the extracted completeness feature, y is the predicted evaluation score, and a and b are the model parameters.
Then, the completeness feature X of each sample and the corresponding manual completeness score Y are extracted from the training data obtained in advance, and the model parameters a and b are trained under the LSE (Least Squares Error) or MSE (Mean Squared Error) criterion.
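For the one-dimensional linear model above, the least-squares training step has a closed form; the toy feature/score pairs below are invented, and this is a sketch of the criterion rather than the patent's exact trainer.

```python
def fit_linear(xs, ys):
    """Closed-form least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

# Toy data: completeness feature X against manual score Y.
a, b = fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])  # a = 2, b = 1
```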
Of course, the score prediction model is not limited to the linear mapping function above; statistical models such as an NN (Neural Network) may also be used, which is not detailed here.
Correspondingly, an embodiment of the present invention also provides a spoken language evaluation system; Fig. 3 is a schematic structural diagram of the system.
In this embodiment, the system comprises:
a receiving module 301, configured to receive a speech signal to be evaluated;
a speech segment acquisition module 302, configured to obtain, through at least two different speech recognition systems, the speech segments corresponding to each basic speech unit in the speech signal.
As above, a basic speech unit may be a syllable, a phoneme, and so on. Different speech recognition systems decode the speech signal using different acoustic features (e.g. an acoustic model based on MFCC features or on PLP features), different acoustic models (e.g. an HMM-GMM or a neural network acoustic model based on a DBN), or even different decoding procedures (e.g. Viterbi search or A* search). In this way, the basic speech units of the speech signal and the corresponding speech segment sequence are obtained.
Specifically, for a speech signal without a text transcript, such as a question-and-answer item, continuous speech recognition yields the text corresponding to the speech signal, i.e. the basic speech unit sequence, together with the speech segment corresponding to each basic speech unit. For a speech signal with a model answer, such as a read-aloud item, speech alignment yields the time boundaries of the speech segment corresponding to each basic speech unit.
Because different speech recognition systems have different decoding strengths, their recognition results are often complementary to some extent.
a feature extraction module 303, configured to extract from the speech segments the evaluation features corresponding to different feature types.
The feature types may comprise one or more of: a completeness feature, a pronunciation accuracy feature, a fluency feature, a prosodic feature, and so on; the definitions of the feature types have been described in detail above and are not repeated here.
a calculation module 304, configured to calculate the original scores of the evaluation features.
an optimization fusion module 305, configured to optimize and fuse, by feature type, the original scores obtained from the different speech recognition systems to obtain the composite score of each evaluation feature.
Because different speech recognition systems adopt different recognition algorithms or acoustic models, their recognition results often differ; accordingly, the evaluation features of the same feature type extracted from the different speech segments also differ, and the scores of the evaluation features are complementary to some extent.
To this end, in an embodiment of the present invention, the optimization fusion module 305 fuses the original scores of an evaluation feature of the same feature type obtained from the different speech recognition systems, to give a comprehensive measure of the user pronunciation level characterized by that evaluation feature. Specifically, depending on the requirements of the examination and the number of speech recognition systems, the optimization fusion module 305 may fuse the scores by taking the maximum, the median or the mean. For example, if the original scores of the evaluation feature obtained from the different speech recognition systems differ within a set threshold, the optimization fusion module 305 takes the mean of the original scores as the composite score of the evaluation feature; if the original score from one or more speech recognition systems is higher than those from the other systems, the optimization fusion module 305 takes the maximum, or the mean of the scores near the maximum, as the composite score of the evaluation feature.
This composite scoring reduces, to some extent, the abnormal scores caused by a failure of an individual speech recognition system or of the evaluation feature extraction.
a scoring module 306, configured to calculate the score of the speech signal from the composite scores of the different evaluation features.
The scoring module 306 may calculate the total score by the linear-regression-based conversion elaborated in the spoken language evaluation method above, which is not repeated here.
Visible, the spoken evaluating system of the embodiment of the present invention, by adopting the more voice recognition system comprehensive mode of marking respectively, identification and the abnormal situation of evaluation and test feature extraction that single system scoring brings have been reduced, and then reduced the error score that identification error brings, realized the accurately evaluation and test comprehensively to user's spoken language proficiency.
It should be noted that, in embodiments of the present invention, above-mentioned computing module 304 specifically can utilize the score in predicting model of corresponding different evaluation and test features to calculate described evaluation and test feature corresponding to the similarity of this score in predicting model, the original score using described similarity as described evaluation and test feature.
To this end, one implementation of the computing module 304 comprises a loading unit and a similarity calculation unit (not shown), wherein:
the loading unit is configured to load the score prediction model corresponding to the feature type of the evaluation feature; and
the similarity calculation unit is configured to calculate the similarity of the evaluation feature to the score prediction model, and to take the similarity as the original score of the evaluation feature.
It should be noted that, in practical applications, the score prediction model may also be loaded according to the topic type; the score prediction models for the same feature type under different topic types may be identical or different, which further improves the granularity and accuracy of scoring. The construction of each score prediction model will be described in detail later.
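As one illustration of the loading unit and the similarity calculation unit, the sketch below keys a hypothetical model store by (topic type, feature type) and uses cosine similarity of the evaluation feature vector to a stored mean vector as the similarity measure. Both the model representation and the similarity measure are assumptions for illustration; the patent does not prescribe either choice.

```python
import math

# Hypothetical offline-trained score prediction models, one per
# (topic type, feature type) pair; here each model is just a mean vector.
MODELS = {
    ("read_aloud", "fluency"): [0.8, 0.1, 0.5],
    ("free_talk", "fluency"):  [0.6, 0.3, 0.4],
}

def original_score(topic_type, feature_type, feature_vec):
    """Load the matching model (loading unit) and score the evaluation
    feature by its cosine similarity to it (similarity calculation unit)."""
    model = MODELS[(topic_type, feature_type)]
    dot = sum(a * b for a, b in zip(model, feature_vec))
    norm = (math.sqrt(sum(a * a for a in model)) *
            math.sqrt(sum(b * b for b in feature_vec)))
    return dot / norm if norm else 0.0
```

A feature vector identical to the stored model vector yields a similarity of 1.0, and vectors that deviate from the model score lower, which is the behavior the original score is meant to capture.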
The above score prediction models are built offline in advance; the specific building process has been described in detail above and is not repeated here.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment. The system embodiment described above is merely illustrative: the modules or units described as separate components may or may not be physically separate, and the components shown as modules or units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the object of the solution of the present embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The components of the embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to realize some or all of the functions of some or all of the components of the spoken language evaluation system according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for carrying out part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The embodiments of the present invention have been described above in detail, and specific examples have been used herein to set forth the invention; the above description of the embodiments is only intended to help understand the method and apparatus of the present invention. Meanwhile, for those of ordinary skill in the art, the specific implementation and the scope of application may vary in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A spoken language evaluation method, characterized by comprising:
receiving a voice signal to be evaluated;
obtaining, by each of at least two different speech recognition systems, the voice segment corresponding to each basic voice unit in the voice signal;
extracting, from the voice segments, evaluation features corresponding to different feature types;
calculating original scores of the evaluation features;
optimally fusing, according to the feature type, the original scores obtained from the different speech recognition systems to obtain an integrated score of each evaluation feature; and
calculating a score of the voice signal according to the integrated scores of the different evaluation features.
2. The method according to claim 1, characterized in that the feature types comprise one or more of the following: a completeness feature, a pronunciation accuracy feature, a fluency feature, and a prosodic feature.
3. The method according to claim 1, characterized in that calculating the original score of the evaluation feature comprises:
loading the score prediction model corresponding to the feature type of the evaluation feature; and
calculating the similarity of the evaluation feature to the score prediction model, and taking the similarity as the original score of the evaluation feature.
4. The method according to claim 3, characterized in that the score prediction models for the same feature type under different topic types are different.
5. The method according to any one of claims 1 to 4, characterized in that optimally fusing, according to the feature type, the original scores obtained from the different speech recognition systems to obtain the integrated score of the evaluation feature comprises:
for the original scores of the evaluation feature of the same feature type obtained from the different speech recognition systems, taking the maximum score, the median score, or the mean thereof as the integrated score of the evaluation feature.
6. A spoken language evaluation system, characterized by comprising:
a receiving module, configured to receive a voice signal to be evaluated;
a voice segment acquisition module, configured to obtain, by each of at least two different speech recognition systems, the voice segment corresponding to each basic voice unit in the voice signal;
a feature extraction module, configured to extract, from the voice segments, evaluation features corresponding to different feature types;
a computing module, configured to calculate original scores of the evaluation features;
an optimization fusion module, configured to optimally fuse, according to the feature type, the original scores obtained from the different speech recognition systems to obtain an integrated score of each evaluation feature; and
a grading module, configured to calculate a score of the voice signal according to the integrated scores of the different evaluation features.
7. The system according to claim 6, characterized in that the feature types comprise one or more of the following: a completeness feature, a pronunciation accuracy feature, a fluency feature, and a prosodic feature.
8. The system according to claim 6, characterized in that the computing module comprises:
a loading unit, configured to load the score prediction model corresponding to the feature type of the evaluation feature; and
a similarity calculation unit, configured to calculate the similarity of the evaluation feature to the score prediction model, and to take the similarity as the original score of the evaluation feature.
9. The system according to claim 8, characterized in that the score prediction models for the same feature type under different topic types are different.
10. The system according to any one of claims 6 to 9, characterized in that
the grading module is specifically configured to, for the original scores of the evaluation feature of the same feature type obtained from the different speech recognition systems, take the maximum score, the median score, or the mean thereof as the integrated score of the evaluation feature.
CN201310554703.4A 2013-11-08 2013-11-08 Oral evaluation method and system Active CN103559892B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310554703.4A CN103559892B (en) 2013-11-08 2013-11-08 Oral evaluation method and system


Publications (2)

Publication Number Publication Date
CN103559892A true CN103559892A (en) 2014-02-05
CN103559892B CN103559892B (en) 2016-02-17

Family

ID=50014119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310554703.4A Active CN103559892B (en) 2013-11-08 2013-11-08 Oral evaluation method and system

Country Status (1)

Country Link
CN (1) CN103559892B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575490A (en) * 2014-12-30 2015-04-29 苏州驰声信息科技有限公司 Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm
CN104978971A (en) * 2014-04-08 2015-10-14 安徽科大讯飞信息科技股份有限公司 Oral evaluation method and system
CN106157974A * 2015-04-07 2016-11-23 富士通株式会社 Text recitation quality assessment apparatus and method
CN106297828A * 2016-08-12 2017-01-04 苏州驰声信息科技有限公司 Detection method and device for false sounding detection based on deep learning
CN107301862A * 2016-04-01 2017-10-27 北京搜狗科技发展有限公司 Speech recognition method, recognition model establishing method, device, and electronic equipment
CN107316255A * 2017-04-07 2017-11-03 苏州清睿教育科技股份有限公司 Efficient online shuttling competition method
CN108831212A * 2018-06-28 2018-11-16 深圳语易教育科技有限公司 English spoken language teaching auxiliary device and method
CN108831503A * 2018-06-07 2018-11-16 深圳习习网络科技有限公司 Method and device for spoken language evaluation
CN109102824A * 2018-07-06 2018-12-28 北京比特智学科技有限公司 Voice error correction method and device based on human-computer interaction
CN109326162A * 2018-11-16 2019-02-12 深圳信息职业技术学院 Automatic evaluation method and device for spoken language exercises
CN110209561A * 2019-05-09 2019-09-06 北京百度网讯科技有限公司 Evaluation method and evaluation device for dialogue platform
CN110349453A * 2019-06-26 2019-10-18 广东粤图之星科技有限公司 English learning system and method based on an electronic resource library
CN111105813A (en) * 2019-12-31 2020-05-05 科大讯飞股份有限公司 Reading scoring method, device, equipment and readable storage medium
CN111128238A (en) * 2019-12-31 2020-05-08 云知声智能科技股份有限公司 Mandarin assessment method and device
CN113096690A (en) * 2021-03-25 2021-07-09 北京儒博科技有限公司 Pronunciation evaluation method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6618702B1 (en) * 2002-06-14 2003-09-09 Mary Antoinette Kohler Method of and device for phone-based speaker recognition
JP2006337667A (en) * 2005-06-01 2006-12-14 Ntt Communications Kk Pronunciation evaluating method, phoneme series model learning method, device using their methods, program and recording medium
CN101727903A (en) * 2008-10-29 2010-06-09 中国科学院自动化研究所 Pronunciation quality assessment and error detection method based on fusion of multiple characteristics and multiple systems
US20100145698A1 (en) * 2008-12-01 2010-06-10 Educational Testing Service Systems and Methods for Assessment of Non-Native Spontaneous Speech
CN101740024A * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation of spoken language fluency based on generalized fluency
CN102354495A (en) * 2011-08-31 2012-02-15 中国科学院自动化研究所 Testing method and system of semi-opened spoken language examination questions


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MOHAMED ELMAHDY ET AL.: "Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition", International Journal of Computational Linguistics (IJCL), vol. 3, no. 1, 31 December 2012 (2012-12-31) *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978971A (en) * 2014-04-08 2015-10-14 安徽科大讯飞信息科技股份有限公司 Oral evaluation method and system
CN104978971B (en) * 2014-04-08 2019-04-05 科大讯飞股份有限公司 A kind of method and system for evaluating spoken language
CN104575490B (en) * 2014-12-30 2017-11-07 苏州驰声信息科技有限公司 Spoken language pronunciation evaluating method based on deep neural network posterior probability algorithm
CN104575490A (en) * 2014-12-30 2015-04-29 苏州驰声信息科技有限公司 Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm
CN106157974A * 2015-04-07 2016-11-23 富士通株式会社 Text recitation quality assessment apparatus and method
CN107301862A * 2016-04-01 2017-10-27 北京搜狗科技发展有限公司 Speech recognition method, recognition model establishing method, device, and electronic equipment
CN106297828B * 2016-08-12 2020-03-24 苏州驰声信息科技有限公司 Detection method and device for false sounding detection based on deep learning
CN106297828A * 2016-08-12 2017-01-04 苏州驰声信息科技有限公司 Detection method and device for false sounding detection based on deep learning
CN107316255A * 2017-04-07 2017-11-03 苏州清睿教育科技股份有限公司 Efficient online shuttling competition method
CN108831503A * 2018-06-07 2018-11-16 深圳习习网络科技有限公司 Method and device for spoken language evaluation
CN108831212A * 2018-06-28 2018-11-16 深圳语易教育科技有限公司 English spoken language teaching auxiliary device and method
CN109102824A * 2018-07-06 2018-12-28 北京比特智学科技有限公司 Voice error correction method and device based on human-computer interaction
CN109326162A * 2018-11-16 2019-02-12 深圳信息职业技术学院 Automatic evaluation method and device for spoken language exercises
CN110209561A * 2019-05-09 2019-09-06 北京百度网讯科技有限公司 Evaluation method and evaluation device for dialogue platform
CN110209561B * 2019-05-09 2024-02-09 北京百度网讯科技有限公司 Evaluation method and evaluation device for dialogue platform
CN110349453A * 2019-06-26 2019-10-18 广东粤图之星科技有限公司 English learning system and method based on an electronic resource library
CN111105813A (en) * 2019-12-31 2020-05-05 科大讯飞股份有限公司 Reading scoring method, device, equipment and readable storage medium
CN111128238A (en) * 2019-12-31 2020-05-08 云知声智能科技股份有限公司 Mandarin assessment method and device
CN111105813B (en) * 2019-12-31 2022-09-02 科大讯飞股份有限公司 Reading scoring method, device, equipment and readable storage medium
CN113096690A (en) * 2021-03-25 2021-07-09 北京儒博科技有限公司 Pronunciation evaluation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN103559892B (en) 2016-02-17

Similar Documents

Publication Publication Date Title
CN103559892B (en) Oral evaluation method and system
CN103559894B (en) Oral evaluation method and system
CN101826263B (en) Objective standard based automatic oral evaluation system
CN101740024B (en) Method for automatic evaluation of spoken language fluency based on generalized fluency
CN103594087B (en) Improve the method and system of oral evaluation performance
US8392190B2 (en) Systems and methods for assessment of non-native spontaneous speech
CN109785698B (en) Method, device, electronic equipment and medium for oral language level evaluation
CN101246685B (en) Pronunciation quality evaluation method of computer auxiliary language learning system
CN105845134A (en) Spoken language evaluation method through freely read topics and spoken language evaluation system thereof
US9489864B2 (en) Systems and methods for an automated pronunciation assessment system for similar vowel pairs
US9262941B2 (en) Systems and methods for assessment of non-native speech using vowel space characteristics
CN108766415B (en) Voice evaluation method
CN102214462A (en) Method and system for estimating pronunciation
CN104464423A (en) Calibration optimization method and system for speaking test evaluation
CN109697988B (en) Voice evaluation method and device
CN103985392A (en) Phoneme-level low-power consumption spoken language assessment and defect diagnosis method
CN109979486B (en) Voice quality assessment method and device
Yin et al. Automatic cognitive load detection from speech features
CN112802456A (en) Voice evaluation scoring method and device, electronic equipment and storage medium
Ghanem et al. Pronunciation features in rating criteria
Gao et al. Spoken english intelligibility remediation with pocketsphinx alignment and feature extraction improves substantially over the state of the art
Liu et al. AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning
CN115116474A (en) Spoken language scoring model training method, scoring method, device and electronic equipment
Barczewska et al. Detection of disfluencies in speech signal
CN113763992A (en) Voice evaluation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant