CN103559894B - Oral evaluation method and system


Info

Publication number
CN103559894B
CN103559894B CN201310554431.8A
Authority
CN
China
Prior art keywords
evaluation
voice
test feature
unit
score
Prior art date
Legal status
Active
Application number
CN201310554431.8A
Other languages
Chinese (zh)
Other versions
CN103559894A (en)
Inventor
Wei Si
Wang Shijin
Liu Dan
Hu Yu
Liu Qingfeng
Current Assignee
Guangzhou Xunfei Yi Heard Network Technology Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority: CN201310554431.8A
Publication of CN103559894A
Application granted
Publication of CN103559894B
Legal status: Active


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of speech processing technology and discloses an oral evaluation method and system. The method comprises: receiving a speech signal to be evaluated; using at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal; fusing the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal; extracting evaluation features from the effective speech segment sequence; and scoring according to the evaluation features. The present invention improves the accuracy of oral evaluation and reduces abnormal scores.

Description

Oral evaluation method and system
Technical field
The present invention relates to the field of speech processing technology, and in particular to an oral evaluation method and system.
Background technology
As an important medium of interpersonal communication, spoken language occupies an extremely important position in daily life. With socio-economic development and the trend of globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness, and scalability of language assessment. Traditional manual evaluation of spoken language proficiency leaves teachers and students severely constrained in teaching time and space, and also suffers from gaps and imbalances in teaching staff, teaching venues, and funding. Manual evaluation cannot avoid the personal bias of individual raters, so uniform scoring standards cannot be guaranteed, and sometimes the true level of the examinee is not accurately reflected. Large-scale oral tests, moreover, require substantial human, material, and financial resources, which limits regular, large-scale assessment. For these reasons, the industry has successively developed language teaching and evaluation systems.
In the prior art, an oral evaluation system usually applies a single recognizer to the received speech signal, performing either speech recognition (for question-and-answer items) or speech-text alignment (for read-aloud items), thereby obtaining the speech segment corresponding to each basic speech unit. The system then extracts from each speech segment features that measure oral evaluation criteria such as pronunciation standardness or fluency of each basic speech unit, and finally obtains the evaluation score by predictive analysis based on these features.
When high-fidelity recording equipment is used in a quiet environment, the speech recognition system provides high recognition accuracy, so the subsequent oral evaluation can also produce relatively objective and accurate results. In practical applications, however, especially in large-scale spoken language tests, the recording environment is inevitably affected by factors such as examination-hall noise and ambient noise; the resulting drop in speech recognition accuracy causes a certain proportion of abnormally scored utterances in the oral evaluation process. This phenomenon obviously makes automatic computer scoring difficult to put into practice in large-scale spoken language tests and limits the application scope and popularization of oral evaluation systems, which cannot be applied to many high-stakes examinations, since a single abnormal score would constitute a grading incident.
Summary of the invention
The embodiments of the present invention provide an oral evaluation method and system, so as to improve the accuracy of oral evaluation and reduce abnormal scores.
To this end, the present invention provides the following technical solutions:
An oral evaluation method, comprising:
receiving a speech signal to be evaluated;
using at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal;
fusing the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal;
extracting evaluation features from the effective speech segment sequence;
scoring according to the evaluation features.
Preferably, fusing the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal comprises:
dynamically matching the text corresponding to the speech segments obtained by the different speech recognition systems against a pre-built standard answer network, to obtain optimal matching results;
generating sets of distinct corresponding units in sequence according to the optimal matching results, a corresponding unit being a recognition-result unit whose speech segments, obtained by the different speech recognition systems, overlap in time and which correctly matches the standard answer network;
determining the optimal unit in each set;
splicing the optimal units of the sets in sequence to obtain the effective speech segment sequence corresponding to the speech signal.
Preferably, determining the optimal unit in each set comprises:
calculating the acoustic model probability or pronunciation posterior probability of the speech segment of each corresponding unit in the set;
selecting the corresponding unit with the maximum acoustic model probability or pronunciation posterior probability as the optimal unit of the set.
Preferably, the evaluation features correspond to one feature type, the feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and scoring according to the evaluation features comprises:
loading the score prediction model corresponding to the feature type of the evaluation features;
calculating the similarity of the evaluation features with respect to the score prediction model, and taking the similarity as the score of the speech signal.
Preferably, the evaluation features comprise at least two groups of evaluation features corresponding to different feature types, each feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and scoring according to the evaluation features comprises:
for each group of evaluation features, loading the score prediction model corresponding to the feature type of that group;
calculating the similarity of each group of evaluation features with respect to its score prediction model, and taking the similarity as the score of that group of evaluation features;
calculating the score of the speech signal according to the scores of the groups of evaluation features.
An oral evaluation system, comprising:
a receiving module, configured to receive a speech signal to be evaluated;
a speech segment acquisition module, configured to use at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal;
a fusion module, configured to fuse the speech segments obtained by the speech segment acquisition module, to obtain an effective speech segment sequence corresponding to the speech signal;
a feature extraction module, configured to extract evaluation features from the effective speech segment sequence;
a scoring module, configured to score according to the evaluation features.
Preferably, the fusion module comprises:
a matching unit, configured to dynamically match the text corresponding to the speech segments obtained by the different speech recognition systems against a pre-built standard answer network, to obtain optimal matching results;
a set generation unit, configured to generate sets of distinct corresponding units in sequence according to the optimal matching results, a corresponding unit being a recognition-result unit whose speech segments, obtained by the different speech recognition systems, overlap in time and which correctly matches the standard answer network;
a determination unit, configured to determine the optimal unit in each set;
a splicing unit, configured to splice the optimal units of the sets in sequence, to obtain the effective speech segment sequence corresponding to the speech signal.
Preferably, the determination unit comprises:
a calculation unit, configured to calculate the acoustic model probability or pronunciation posterior probability of the speech segment of each corresponding unit in the set;
a selection unit, configured to select the corresponding unit with the maximum acoustic model probability or pronunciation posterior probability as the optimal unit of the set.
Preferably, the evaluation features correspond to one feature type, the feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and the scoring module comprises:
a loading unit, configured to load the score prediction model corresponding to the feature type of the evaluation features;
a calculation unit, configured to calculate the similarity of the evaluation features with respect to the score prediction model, and take the similarity as the score of the speech signal.
Preferably, the evaluation features comprise at least two groups of evaluation features corresponding to different feature types, each feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and the scoring module comprises:
a loading unit, configured to, for each group of evaluation features, load the score prediction model corresponding to the feature type of that group;
a first calculation unit, configured to calculate the similarity of each group of evaluation features with respect to its score prediction model, and take the similarity as the score of that group of evaluation features;
a second calculation unit, configured to calculate the score of the speech signal according to the scores of the groups of evaluation features.
The oral evaluation method and system provided by the embodiments of the present invention use multiple speech recognition methods to recognize the speech signal to be evaluated, obtaining multiple speech segment sequences; these speech segment sequences are then fused into an effective speech segment sequence, and oral evaluation is finally performed on the effective speech segment sequence to obtain the evaluation result. By improving the accuracy of the speech recognition results and the validity and soundness of what the oral evaluation examines, the method and system greatly reduce the proportion of abnormal scores and thus better meet the application demands of large-scale spoken language tests.
Brief description of the drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention, and those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the oral evaluation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the matching information of the recognition results of different speech recognition systems in an embodiment of the present invention;
Fig. 3 is a flowchart of building a score prediction model in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the oral evaluation system according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of one implementation of the fusion module in the oral evaluation system of an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of one implementation of the scoring module in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of another implementation of the scoring module in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
To address the prior-art problem that environmental factors degrade speech recognition accuracy and thereby produce a certain proportion of abnormally scored utterances during oral evaluation, the embodiments of the present invention provide an oral evaluation method and system: first, multiple speech recognition methods are used to recognize the speech signal to be evaluated, obtaining multiple speech segment sequences; these speech segment sequences are then fused into an effective speech segment sequence; finally, spoken language scoring is performed on the effective speech segment sequence to obtain the evaluation result.
As shown in Fig. 1, the oral evaluation method of an embodiment of the present invention comprises the following steps:
Step 101: receive a speech signal to be evaluated.
Step 102: use at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal.
A basic speech unit may be a syllable, a phoneme, and so on. The different speech recognition systems decode the speech signal based on different acoustic features (e.g., an acoustic model based on MFCC features versus one based on PLP features) or with different acoustic models (e.g., a discriminatively trained HMM-GMM acoustic model versus a DBN-based neural network acoustic model). In this way, speech segment sequences corresponding to the speech signal are obtained.
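As an illustration only, the following Python sketch shows the kind of data this step produces; the `Segment` type and the recognizers' `decode` interface are hypothetical stand-ins, since the patent does not prescribe an implementation (later sketches in this description reuse this type):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    unit: str      # recognized basic speech unit, e.g. a syllable or phoneme
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds

def recognize_all(signal, recognizers):
    """Run each speech recognition system on the same signal and collect
    one segment sequence per system (hypothetical `decode` interface)."""
    return [r.decode(signal) for r in recognizers]  # -> list of list[Segment]
```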
Specifically, for speech signals without text annotation, such as question-and-answer items, continuous speech recognition can be used to obtain the text corresponding to the speech signal and the speech segment of each basic speech unit. For speech signals with a standard answer, such as read-aloud items, speech-text alignment is used to obtain the time boundary of each basic speech unit.
Because different speech recognition systems have different decoding strengths, their recognition results are often complementary to a certain extent.
Step 103: fuse the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal.
A single speech recognition system may produce partially erroneous recognition results, whereas speech recognition systems with complementary characteristics can avoid this problem to a large extent; a reasonable selection among the candidate speech segments therefore improves the accuracy and soundness of scoring each speech segment.
In an embodiment of the present invention, the text corresponding to the speech segments obtained by each speech recognition system is first dynamically matched against the pre-built standard answer network to obtain an optimal matching result. Specifically, the DTW (Dynamic Time Warping) algorithm can be used to compute the cumulative probability of each history path of the text through the standard answer network, and the history path with the maximum probability is selected as the optimal path when the search ends. For example, if the recognition result obtained by speech recognition system 1 is 'A B C D E', matching it against the standard answer network yields the optimal matching result 'A(+) B C(+) D(+) E(+)', meaning that units A, C, D, and E match the answer while B does not.
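The patent names DTW over a standard answer network but does not spell out the computation. As a rough, hedged illustration, the sketch below approximates the answer network by a single reference unit sequence and uses classic edit-distance dynamic programming to flag which recognized units match:

```python
def match_against_answer(hyp, ref):
    """Align a recognized unit sequence `hyp` to a reference answer `ref`
    with edit-distance dynamic programming and flag which hypothesis units
    match the answer (a stand-in for the DTW search over the network)."""
    n, m = len(hyp), len(ref)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    matched = [False] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrack, marking exact matches
        if hyp[i - 1] == ref[j - 1] and d[i][j] == d[i - 1][j - 1]:
            matched[i - 1] = True
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return matched

# match_against_answer(list("ABCDE"), list("ACDE"))
# -> [True, False, True, True, True], i.e. "A(+) B C(+) D(+) E(+)"
```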
Next, the optimal matching results are combined to generate an effective unit sequence, where an effective unit is a recognition-result unit that matches the standard answer network and whose speech segments, obtained by the different speech recognition systems, overlap in time. The effective speech segment corresponding to each effective unit is determined, and the speech segments corresponding to the optimal units are spliced in sequence to obtain the effective speech segment sequence corresponding to the speech signal.
When determining the effective speech segment corresponding to each effective unit, note that an effective unit and its corresponding speech segment may appear in several recognition results. Therefore, sets of distinct corresponding units are first generated in sequence according to the optimal matching results; the acoustic model probability or pronunciation posterior probability of the speech segment corresponding to each unit in a set is calculated, and the corresponding unit with the maximum probability score is selected as the optimal unit of the set. The speech segments corresponding to the optimal units of all sets are then spliced in chronological order to obtain the effective speech segment sequence corresponding to the speech signal.
For example, suppose two speech recognition systems output the recognition results shown in Fig. 2: system 1 obtains 'A B C D E' and system 2 obtains 'A F C G E'. Matching the two recognition results against the standard answer network yields the optimal matching results 'A(+) B C(+) D(+) E(+)' and 'A(+) F(+) C(+) G E(+)', where (+) marks a unit that matches the standard answer, i.e., a correct recognition result. The vertical lines in Fig. 2 indicate the time boundaries of the speech segments.
The effective speech segment sequence obtained by fusion is 'A F C D E'. The recognition accuracy after fusion is clearly improved over that of either single speech recognition system.
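A minimal fusion sketch under simplifying assumptions: corresponding units are grouped purely by temporal overlap, and `acoustic_score` is a hypothetical callable standing in for the acoustic model probability or pronunciation posterior probability used to pick the optimal unit (the `Segment` type is reused from the earlier sketch):

```python
def fuse(results, acoustic_score):
    """Fuse per-system lists of (Segment, matched) pairs into one effective
    segment sequence: keep only answer-matched units, group those that
    overlap in time, pick the unit the scorer likes best from each group,
    and splice the winners in chronological order."""
    cands = [seg for segs in results for seg, ok in segs if ok]
    cands.sort(key=lambda s: s.start)
    fused, group = [], []
    for seg in cands:
        if group and seg.start >= max(g.end for g in group):
            fused.append(max(group, key=acoustic_score))  # optimal unit of the set
            group = []
        group.append(seg)
    if group:
        fused.append(max(group, key=acoustic_score))
    return fused
```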
Step 104: extract evaluation features from the effective speech segment sequence.
It should be noted that, in practical applications, evaluation features of a single feature type can be extracted as the application requires, for example a completeness feature, pronunciation accuracy feature, fluency feature, or prosodic feature, and scoring is then performed on those features.
Of course, evaluation features of multiple feature types can also be extracted; that is, two or more groups of evaluation features can be extracted simultaneously, each group corresponding to one feature type, for example: completeness feature, pronunciation accuracy feature, fluency feature, or prosodic feature.
The completeness feature describes the textual completeness of the speech unit sequence corresponding to the speech segment sequence with respect to the standard answer.
In an embodiment of the present invention, the basic speech unit sequence can be matched against the pre-built standard answer network to obtain an optimal path, and the matching degree between the optimal path and the speech unit sequence is taken as the completeness feature.
It should be noted that the form of the standard answer network may differ across item types: for a read-aloud item it is simply the prompt text, for a question-and-answer item it is a set of key words, and for picture-description or presentation items it is a set of key sentences.
Because their answers carry a certain degree of uncertainty, question-and-answer and presentation items belong to semi-open item types; their standard answers therefore usually provide multiple alternative answers built around key words, and in form the standard answer network can be a set of answer entries.
For open item types, the standard answer usually consists of sentences containing key words. Since key words are obviously more important than other auxiliary words, larger weights can be assigned to key words and smaller weights to auxiliary words, improving the soundness of semantic matching. Therefore, for open item types, a weighted standard answer network can also be built from the occurrence probabilities of the key words in each standard answer; the path with the highest similarity to the speech unit sequence is found by searching this network, and the matching degree of each speech unit that agrees with a unit on the optimal path is taken as the completeness feature, the matching degree being the weight assigned to each matched speech unit.
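As a rough illustration of this weighted matching, the sketch below reduces the weighted standard answer network to a flat keyword-to-weight table, which is a deliberate simplification of the optimal-path search described above:

```python
def completeness_feature(units, keyword_weight):
    """Simplified completeness feature: the total weight of answer key
    words hit by the recognized unit sequence, normalized by the total
    available weight. `keyword_weight` is a hypothetical {word: weight} map."""
    present = set(units)
    total = sum(keyword_weight.values())
    hit = sum(w for u, w in keyword_weight.items() if u in present)
    return hit / total if total > 0 else 0.0
```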
The pronunciation accuracy feature describes the pronunciation standardness of each speech segment. Specifically, the similarity of each speech segment with respect to a preset pronunciation acoustic model can be calculated and taken as the pronunciation accuracy feature.
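One common way to realize such a similarity is a duration-normalized, posterior-style score; the sketch below assumes a hypothetical acoustic model object exposing per-unit log-likelihoods, which the patent does not specify:

```python
def pronunciation_accuracy(segment, model):
    """Posterior-style pronunciation score for one segment: log-likelihood
    of the intended unit relative to the best competing unit, normalized
    by duration. `model.loglik` and `model.units` are hypothetical."""
    target = model.loglik(segment, segment.unit)
    best = max(model.loglik(segment, u) for u in model.units)
    dur = max(segment.end - segment.start, 1e-3)
    return (target - best) / dur  # <= 0; closer to 0 means more standard
```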
The fluency feature describes the smoothness of the user's utterance, including but not limited to the average speaking rate of the utterance (e.g., the ratio of speech duration to the number of speech units), the average length of continuous speech runs, the effective pause ratio of the utterance, and so on. In addition, to compensate for speaking-rate differences between speakers, phone duration features can also be used, normalized over all pronounced parts and combined into the fluency feature. Specifically, the discrete duration probability distribution of context-independent phones can be estimated, and the log-probability of the normalized duration is computed to obtain the duration score of each phone.
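For illustration, a sketch of two of the named statistics plus the duration log-probability score; the pause threshold and the duration binning are assumptions, not values from the patent:

```python
import math

def fluency_features(segments, pause_thresh=0.3):
    """Average speaking rate (units per second of speech) and pause ratio
    based on inter-segment gaps longer than `pause_thresh` seconds."""
    speech_dur = sum(s.end - s.start for s in segments)
    rate = len(segments) / speech_dur if speech_dur > 0 else 0.0
    gaps = [b.start - a.end for a, b in zip(segments, segments[1:])]
    pause = sum(g for g in gaps if g > pause_thresh)
    total = segments[-1].end - segments[0].start if segments else 0.0
    return {"rate": rate, "pause_ratio": pause / total if total else 0.0}

def duration_score(dur, dist):
    """Log-probability of a phone duration under an empirical distribution
    `dist`, a {rounded_duration: probability} map estimated from data."""
    return math.log(dist.get(round(dur, 2), 1e-6))
```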
The prosodic feature describes the rhythmic character of the user's pronunciation, including features such as pitch rise and fall. Specifically, the fundamental frequency (F0) sequence of each speech segment can be extracted, and its dynamics, such as first-order and second-order differences, are then taken as the prosodic feature.
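A minimal sketch of the difference features, assuming a per-frame F0 sequence has already been extracted by some pitch tracker:

```python
def prosodic_features(f0):
    """First- and second-order differences of a per-frame F0 sequence,
    a crude representation of pitch rise and fall dynamics."""
    d1 = [b - a for a, b in zip(f0, f0[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    return d1, d2
```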
The evaluation features of the different feature types above describe the characteristics of the user's pronunciation from different angles and are complementary to one another to a certain extent.
Step 105: score according to the evaluation features.
For the evaluation features of each feature type, the corresponding score prediction model can be loaded and the similarity of the evaluation features with respect to that score prediction model calculated.
It should be noted that, in practical applications, the score prediction model can also be loaded according to the item type; the score prediction models of the same feature type for different item types may be identical or different, which further improves the granularity and accuracy of scoring. The construction of the score prediction models is described in detail below.
If evaluation features of only one feature type are extracted, the similarity of the evaluation features with respect to the score prediction model, calculated as above, can be taken as the score of the speech signal.
If evaluation features of multiple feature types are extracted, the similarities calculated above are taken as the scores of the corresponding groups of evaluation features, and the score of the speech signal is then calculated from the scores of the groups. Specifically, considering that in practice the scores of different types of evaluation features are correlated to some extent, a linear-regression combination can be used to calculate the total score, i.e., the score of the speech signal is calculated as follows:
$S = \frac{1}{N}\sum_{i=1}^{N} w_i s_i$
where $w_i$ is the weighting parameter of the $i$-th group of evaluation features, a positive number preset by the system; $s_i$ is the aggregate score of the $i$-th group of evaluation features; and $N$ is the number of aggregate scores.
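For illustration, the combination can be coded directly; the example scores and weights below are hypothetical, not values from the patent:

```python
def total_score(scores, weights):
    """Linear combination of per-feature-type scores:
    S = (1/N) * sum(w_i * s_i)."""
    assert len(scores) == len(weights) and scores
    return sum(w * s for w, s in zip(weights, scores)) / len(scores)

# Hypothetical example: completeness, accuracy, fluency, prosody scores
# on a 0-100 scale with system-preset weights:
# total_score([90, 80, 85, 75], [1.2, 1.0, 0.9, 0.9]) -> 83.0
```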
It can be seen that the oral evaluation method of the embodiment of the present invention uses multiple speech recognition methods to recognize the speech signal to be evaluated, obtaining multiple speech segment sequences; these sequences are fused into an effective speech segment sequence, and oral evaluation is finally performed on the effective speech segment sequence to obtain the evaluation result. By improving the accuracy of the speech recognition results and the validity and soundness of what the oral evaluation examines, the method greatly reduces the proportion of abnormal scores and thus better meets the application demands of large-scale spoken language tests.
As mentioned above, calculating the score of a group of evaluation features requires loading the score prediction model corresponding to the feature type of those features. It should be noted that the score prediction models can be built offline in advance.
As shown in Fig. 3, building a score prediction model in an embodiment of the present invention comprises the following steps:
Step 301: collect scoring training data.
Specifically, answer speech data from multiple users can be collected for each test item as scoring training data.
Step 302: manually annotate the training data, including text annotation, segmentation, and manual oral evaluation scoring.
Text annotation refers to the conversion from speech to text. Segmentation refers to dividing the continuous speech signal under manual supervision to determine the speech segment corresponding to each basic speech unit. Manual oral evaluation scoring refers to rating spoken language proficiency by human listening.
In practical applications, each of the different evaluation features described above, including the completeness feature, pronunciation accuracy feature, fluency feature, and prosodic feature, can be scored separately.
Step 303: extract the evaluation features of the different feature types according to the annotation results.
That is, according to the basic speech units in the annotation results and their corresponding speech segments, the evaluation features of the different feature types are extracted from the speech segments in the manner introduced above.
Step 304: use the evaluation features to build the score prediction model for each feature type.
Specifically, the parameters of a score prediction model can be obtained by training a prediction technique under the guidance of the manual scores, yielding the score prediction model. Furthermore, score prediction models specific to item types can be established for different test item types.
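As an illustration, the sketch below uses ordinary least squares as the prediction technique trained under the guidance of the manual scores; the patent leaves the concrete predictor unspecified, so this is only one plausible choice:

```python
import numpy as np

def train_score_model(feats, human_scores):
    """Fit a linear score prediction model for one feature type by least
    squares. `feats`: (n_samples, n_features) array; returns (weights, bias)."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(X, np.asarray(human_scores, float), rcond=None)
    return coef[:-1], coef[-1]

def predict_score(feat, weights, bias):
    """Score a new feature vector with the trained model."""
    return float(np.dot(feat, weights) + bias)
```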
Correspondingly, an embodiment of the present invention also provides an oral evaluation system; Fig. 4 shows its structure.
In this embodiment, the system comprises a receiving module 401, a speech segment acquisition module 402, a fusion module 403, a feature extraction module 404, and a scoring module 405. Wherein:
the receiving module 401 is configured to receive a speech signal to be evaluated;
the speech segment acquisition module 402 is configured to use at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal.
For speech signals without text annotation, such as question-and-answer items, the text corresponding to the speech signal and the speech segment of each basic speech unit can be obtained by continuous speech recognition. For speech signals with a standard answer, such as read-aloud items, speech-text alignment can be used to obtain the time boundary of each basic speech unit.
Because different speech recognition systems have different decoding strengths, their recognition results are often complementary to a certain extent.
The fusion module 403 is configured to fuse the speech segments obtained by the speech segment acquisition module 402, to obtain an effective speech segment sequence corresponding to the speech signal.
The feature extraction module 404 is configured to extract evaluation features from the effective speech segment sequence.
The scoring module 405 is configured to score according to the evaluation features.
A single speech recognition system may produce partially erroneous recognition results, whereas speech recognition systems with complementary characteristics can avoid this problem to a large extent; a reasonable selection among the candidate speech segments therefore improves the accuracy and soundness of scoring each speech segment.
To this end, in an embodiment of the present invention, one implementation structure of the fusion module 403 is shown in Fig. 5.
In this embodiment, the fusion module comprises:
a matching unit 501, configured to dynamically match the text corresponding to the speech segments obtained by the different speech recognition systems against a pre-built standard answer network, to obtain optimal matching results;
a set generation unit 502, configured to generate sets of distinct corresponding units in sequence according to the optimal matching results, a corresponding unit being a recognition-result unit whose speech segments, obtained by the different speech recognition systems, overlap in time and which correctly matches the standard answer network;
a determination unit 503, configured to determine the optimal unit in each set;
a splicing unit 504, configured to splice the optimal units of the sets in sequence, to obtain the effective speech segment sequence corresponding to the speech signal.
The determination unit 503 may comprise a calculation unit and a selection unit (not shown), wherein the calculation unit is configured to calculate the acoustic model probability or pronunciation posterior probability of the speech segment of each corresponding unit in the set, and the selection unit is configured to select the corresponding unit with the maximum acoustic model probability or pronunciation posterior probability as the optimal unit of the set.
Through the fusion of the speech segments by the fusion module, the recognition accuracy after fusion is substantially improved over that of a single speech recognition system.
It can be seen that the oral evaluation system of the embodiment of the present invention uses multiple speech recognition methods to recognize the speech signal to be evaluated, obtaining multiple speech segment sequences; these sequences are fused into an effective speech segment sequence, and oral evaluation is finally performed on the effective speech segment sequence to obtain the evaluation result. By improving the accuracy of the speech recognition results and the validity and soundness of what the oral evaluation examines, the system greatly reduces the proportion of abnormal scores and thus better meets the application demands of large-scale spoken language tests.
It should be noted that, in practical applications, the feature extraction module 404 can extract evaluation features of a single feature type as the application requires, for example a completeness feature, pronunciation accuracy feature, fluency feature, or prosodic feature, and score according to those features. Of course, evaluation features of multiple feature types can also be extracted; that is, two or more groups of evaluation features can be extracted simultaneously, each group corresponding to one feature type, for example: completeness feature, pronunciation accuracy feature, fluency feature, or prosodic feature.
The specific meanings and extraction methods of the above types of evaluation features have been explained above and are not repeated here. These evaluation features of different feature types describe the characteristics of the user's pronunciation from different angles and are complementary to one another to a certain extent.
The implementations of the scoring module for the different cases of extracted evaluation features are described below.
Fig. 6 shows one implementation structure of the scoring module in an embodiment of the present invention.
In this embodiment, the scoring module comprises:
a loading unit 601, configured to load the score prediction model corresponding to the feature type of the evaluation features;
a calculation unit 602, configured to calculate the similarity of the evaluation features with respect to the score prediction model, and take the similarity as the score of the speech signal.
For the evaluation features of a single feature type extracted by the feature extraction module, the scoring module of this embodiment calculates the similarity of those features with respect to the score prediction model and takes the similarity as the score of the speech signal.
Fig. 7 shows another implementation structure of the scoring module in an embodiment of the present invention.
In this embodiment, the scoring module comprises:
a loading unit 701, configured to, for each group of evaluation features, load the score prediction model corresponding to the feature type of that group;
a first calculation unit 702, configured to calculate the similarity of each group of evaluation features with respect to its score prediction model, and take the similarity as the score of that group of evaluation features;
a second calculation unit 703, configured to calculate the score of the speech signal according to the scores of the groups of evaluation features.
Considering that the scores of different types of evaluation features are correlated to some extent, the second calculation unit 703 can use a linear-regression combination to calculate the total score, i.e., calculate the score of the speech signal as follows:
$S = \frac{1}{N}\sum_{i=1}^{N} w_i s_i$
where $w_i$ is the weighting parameter of the $i$-th group of evaluation features, a positive number preset by the system; $s_i$ is the aggregate score of the $i$-th group of evaluation features; and $N$ is the number of aggregate scores.
For the evaluation features of multiple different feature types extracted by the feature extraction module, the scoring module of this embodiment calculates the similarity of each group of evaluation features with respect to its score prediction model to obtain the score of that group, and then calculates the score of the speech signal from the scores of the groups, further improving the validity and soundness of oral evaluation and significantly reducing the proportion of abnormal scores.
It should be noted that the score prediction models corresponding to the feature types of the different evaluation features can be built offline in advance, as described in detail above and not repeated here.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment, being substantially similar to the method embodiment, is described more simply; for relevant parts, refer to the description of the method embodiment. The system embodiments described above are merely illustrative: the modules or units described as separate components may or may not be physically separate, and the components shown as modules or units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The components of embodiments of the present invention can be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that, in practice, a microprocessor or digital signal processor (DSP) can be used to realize some or all of the functions of some or all of the components of the oral evaluation system according to embodiments of the present invention. The present invention can also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein. Such a program implementing the present invention can be stored on a computer-readable medium or take the form of one or more signals; such signals can be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
The embodiments of the present invention have been described in detail above, and specific embodiments have been applied herein to elaborate the invention; the above description of the embodiments is only intended to help understand the method and apparatus of the present invention. Meanwhile, for those of ordinary skill in the art, there will be changes in specific implementation and application scope according to the idea of the present invention. In summary, this description should not be construed as limiting the present invention.

Claims (8)

1. An oral evaluation method, characterized by comprising:
receiving a speech signal to be evaluated;
using at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal;
fusing the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal;
extracting evaluation features from the effective speech segment sequence;
scoring according to the evaluation features;
wherein fusing the obtained speech segments to obtain an effective speech segment sequence corresponding to the speech signal comprises:
dynamically matching the text corresponding to the speech segments obtained by the different speech recognition systems against a pre-built standard answer network, to obtain optimal matching results;
generating sets of distinct corresponding units in sequence according to the optimal matching results, a corresponding unit being a recognition-result unit whose speech segments, obtained by the different speech recognition systems, overlap in time and which correctly matches the standard answer network;
determining the optimal unit in each set;
splicing the optimal units of the sets in sequence to obtain the effective speech segment sequence corresponding to the speech signal.
2. The method according to claim 1, characterized in that determining the optimal unit in each set comprises:
calculating the acoustic model probability or pronunciation posterior probability of the speech segment of each corresponding unit in the set;
selecting the corresponding unit with the maximum acoustic model probability or pronunciation posterior probability as the optimal unit of the set.
3. The method according to claim 1, characterized in that the evaluation features correspond to one feature type, the feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and scoring according to the evaluation features comprises:
loading the score prediction model corresponding to the feature type of the evaluation features;
calculating the similarity of the evaluation features with respect to the score prediction model, and taking the similarity as the score of the speech signal.
4. The method according to claim 1, characterized in that the evaluation features comprise at least two groups of evaluation features corresponding to different feature types, each feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and scoring according to the evaluation features comprises:
for each group of evaluation features, loading the score prediction model corresponding to the feature type of that group;
calculating the similarity of each group of evaluation features with respect to its score prediction model, and taking the similarity as the score of that group of evaluation features;
calculating the score of the speech signal according to the scores of the groups of evaluation features.
5. An oral evaluation system, characterized by comprising:
a receiving module, configured to receive a speech signal to be evaluated;
a speech segment acquisition module, configured to use at least two different speech recognition systems to respectively obtain the speech segment corresponding to each basic speech unit in the speech signal;
a fusion module, configured to fuse the speech segments obtained by the speech segment acquisition module, to obtain an effective speech segment sequence corresponding to the speech signal;
a feature extraction module, configured to extract evaluation features from the effective speech segment sequence;
a scoring module, configured to score according to the evaluation features;
wherein the fusion module comprises:
a matching unit, configured to dynamically match the text corresponding to the speech segments obtained by the different speech recognition systems against a pre-built standard answer network, to obtain optimal matching results;
a set generation unit, configured to generate sets of distinct corresponding units in sequence according to the optimal matching results, a corresponding unit being a recognition-result unit whose speech segments, obtained by the different speech recognition systems, overlap in time and which correctly matches the standard answer network;
a determination unit, configured to determine the optimal unit in each set;
a splicing unit, configured to splice the optimal units of the sets in sequence, to obtain the effective speech segment sequence corresponding to the speech signal.
6. The system according to claim 5, characterized in that the determination unit comprises:
a calculation unit, configured to calculate the acoustic model probability or pronunciation posterior probability of the speech segment of each corresponding unit in the set;
a selection unit, configured to select the corresponding unit with the maximum acoustic model probability or pronunciation posterior probability as the optimal unit of the set.
7. The system according to claim 5, characterized in that the evaluation features correspond to one feature type, the feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and the scoring module comprises:
a loading unit, configured to load the score prediction model corresponding to the feature type of the evaluation features;
a calculation unit, configured to calculate the similarity of the evaluation features with respect to the score prediction model, and take the similarity as the score of the speech signal.
8. The system according to claim 5, characterized in that the evaluation features comprise at least two groups of evaluation features corresponding to different feature types, each feature type being any one of the following: completeness feature, pronunciation accuracy feature, fluency feature, prosodic feature;
and the scoring module comprises:
a loading unit, configured to, for each group of evaluation features, load the score prediction model corresponding to the feature type of that group;
a first calculation unit, configured to calculate the similarity of each group of evaluation features with respect to its score prediction model, and take the similarity as the score of that group of evaluation features;
a second calculation unit, configured to calculate the score of the speech signal according to the scores of the groups of evaluation features.
CN201310554431.8A 2013-11-08 2013-11-08 Oral evaluation method and system Active CN103559894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310554431.8A CN103559894B (en) 2013-11-08 2013-11-08 Oral evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310554431.8A CN103559894B (en) 2013-11-08 2013-11-08 Oral evaluation method and system

Publications (2)

Publication Number Publication Date
CN103559894A CN103559894A (en) 2014-02-05
CN103559894B true CN103559894B (en) 2016-04-20

Family

ID=50014121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310554431.8A Active CN103559894B (en) 2013-11-08 2013-11-08 Oral evaluation method and system

Country Status (1)

Country Link
CN (1) CN103559894B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978971B (en) * 2014-04-08 2019-04-05 科大讯飞股份有限公司 A kind of method and system for evaluating spoken language
CN103928023B (en) * 2014-04-29 2017-04-05 广东外语外贸大学 A kind of speech assessment method and system
CN104464757B (en) * 2014-10-28 2019-01-18 科大讯飞股份有限公司 Speech evaluating method and speech evaluating device
CN104318921B (en) * 2014-11-06 2017-08-25 科大讯飞股份有限公司 Segment cutting detection method and system, method and system for evaluating spoken language
CN105845134B (en) * 2016-06-14 2020-02-07 科大讯飞股份有限公司 Spoken language evaluation method and system for freely reading question types
CN109697988B (en) * 2017-10-20 2021-05-14 深圳市鹰硕教育服务有限公司 Voice evaluation method and device
CN107894882B (en) * 2017-11-21 2021-02-09 南京硅基智能科技有限公司 Voice input method of mobile terminal
CN107945788B (en) * 2017-11-27 2021-11-02 桂林电子科技大学 Method for detecting pronunciation error and scoring quality of spoken English related to text
CN108597538B (en) * 2018-03-05 2020-02-11 标贝(北京)科技有限公司 Evaluation method and system of speech synthesis system
CN108829894B (en) * 2018-06-29 2021-11-12 北京百度网讯科技有限公司 Spoken word recognition and semantic recognition method and device
CN109308118B (en) * 2018-09-04 2021-12-14 安徽大学 Chinese eye writing signal recognition system based on EOG and recognition method thereof
CN109300474B (en) * 2018-09-14 2022-04-26 北京网众共创科技有限公司 Voice signal processing method and device
CN109273023B (en) * 2018-09-20 2022-05-17 科大讯飞股份有限公司 Data evaluation method, device and equipment and readable storage medium
CN110069772B (en) * 2019-03-12 2023-10-20 平安科技(深圳)有限公司 Device, method and storage medium for predicting scoring of question-answer content
CN111833853B (en) * 2020-07-01 2023-10-27 腾讯科技(深圳)有限公司 Voice processing method and device, electronic equipment and computer readable storage medium
CN111916108B (en) * 2020-07-24 2021-04-02 北京声智科技有限公司 Voice evaluation method and device
CN112331180A (en) * 2020-11-03 2021-02-05 北京猿力未来科技有限公司 Spoken language evaluation method and device
CN112951274A (en) * 2021-02-07 2021-06-11 脸萌有限公司 Voice similarity determination method and device, and program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645271A (en) * 2008-12-23 2010-02-10 中国科学院声学研究所 Rapid confidence-calculation method in pronunciation quality evaluation system
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN101645271A (en) * 2008-12-23 2010-02-10 中国科学院声学研究所 Rapid confidence-calculation method in pronunciation quality evaluation system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A POST-PROCESSING SYSTEM TO YIELD REDUCED WORD ERROR RATES:RECOGNIZER OUTPUT VOTING ERROR REDUCTION (ROVER);Jonathan G. Fiscus;《Automatic Speech Recognition and Understanding, 1997. Proceedings., 1997 IEEE Workshop on》;19971217;全文 *
Spoken Term Detection Using Phoneme Transition Network from Multiple Speech Recognizers’ Outputs;Satoshi Natori et al;《Journal of Information Processing》;20130430;第21卷(第2期);第176-179页 *

Also Published As

Publication number Publication date
CN103559894A (en) 2014-02-05

Similar Documents

Publication Publication Date Title
CN103559894B (en) Oral evaluation method and system
CN103559892B (en) Oral evaluation method and system
CN101740024B (en) Method for automatic evaluation of spoken language fluency based on generalized fluency
CN103594087B (en) Improve the method and system of oral evaluation performance
CN102568475B (en) System and method for assessing proficiency in Putonghua
CN101826263B (en) Objective standard based automatic oral evaluation system
CN101751919B (en) Spoken Chinese stress automatic detection method
CN102034475B (en) Method for interactively scoring open short conversation by using computer
CN101739869B (en) Priori knowledge-based pronunciation evaluation and diagnosis system
CN105845134A (en) Spoken language evaluation method through freely read topics and spoken language evaluation system thereof
US9262941B2 (en) Systems and methods for assessment of non-native speech using vowel space characteristics
US8447603B2 (en) Rating speech naturalness of speech utterances based on a plurality of human testers
CN102214462A (en) Method and system for estimating pronunciation
CN104464755A (en) Voice evaluation method and device
CN109346056A (en) Phoneme synthesizing method and device based on depth measure network
CN110415725B (en) Method and system for evaluating pronunciation quality of second language using first language data
Evanini et al. Overview of automated speech scoring
CN106297765A (en) Phoneme synthesizing method and system
CN104700831B (en) The method and apparatus for analyzing the phonetic feature of audio file
CN110349567A (en) The recognition methods and device of voice signal, storage medium and electronic device
CN109065024A (en) abnormal voice data detection method and device
Shashidhar et al. Automatic spontaneous speech grading: A novel feature derivation technique using the crowd
Li et al. Techware: Speaker and spoken language recognition resources [best of the web]
CN106297766A (en) Phoneme synthesizing method and system
CN112116181B (en) Classroom quality model training method, classroom quality evaluation method and classroom quality evaluation device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
CB03 Change of inventor or designer information

Inventor after: Wei Si

Inventor after: Wang Shijin

Inventor after: Liu Dan

Inventor after: Hu Yu

Inventor after: Liu Qingfeng

Inventor before: Wang Shijin

Inventor before: Liu Dan

Inventor before: Wei Si

Inventor before: Hu Yu

Inventor before: Liu Qingfeng

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171207

Address after: Room 177, self-numbered Building 15, No. 788 Guangzhou Avenue South, Haizhuqu District, Guangzhou, Guangdong, 510000

Patentee after: Guangzhou Xunfei Yi heard Network Technology Co. Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Patentee before: Iflytek Co., Ltd.