CN103594087A - Method and system for improving oral evaluation performance - Google Patents

Method and system for improving oral evaluation performance

Info

Publication number
CN103594087A
Authority
CN
China
Prior art keywords
self
adaptation
topic
data
acoustic model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310553383.0A
Other languages
Chinese (zh)
Other versions
CN103594087B (en)
Inventor
高前勇
魏思
胡国平
刘丹
陈进
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201310553383.0A
Publication of CN103594087A
Application granted
Publication of CN103594087B
Legal status: Active
Anticipated expiration

Landscapes

  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and system for improving the oral evaluation performance. The method comprises the steps that speech data, to be evaluated, of a user are received, wherein the speech data comprise the speech data of reading questions and the speech data of semi-open-ended questions; grading is conducted on the reading questions according to the speech data of the reading questions; self-adaption effective data are obtained from a grading result; according to the self-adaption effective data, a preset acoustic model is optimized; the semi-open-ended questions are evaluated through the optimized acoustic model. By the adoption of the method and system for improving the oral evaluation performance, the accuracy of oral evaluation is effectively improved.

Description

Method and system for improving oral evaluation performance
Technical field
The present invention relates to the field of speech processing technology, and in particular to a method and system for improving oral evaluation performance.
Background
As an important medium of interpersonal communication, spoken language plays an extremely important role in daily life. With socioeconomic development and increasing globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness, and scalability of language assessment. Traditional manual assessment of spoken language proficiency severely restricts teachers and students in teaching time and space, and it also suffers from gaps and imbalances in resources such as teaching staff, teaching venues, and funding. Manual evaluation cannot avoid the individual bias of the evaluator, so a uniform scoring standard cannot be guaranteed, and sometimes the true level of the person tested is not accurately reflected. Large-scale oral tests additionally require substantial manpower, material resources, and financial support, which limits regular, large-scale assessment. For these reasons, the industry has successively developed a number of language teaching and evaluation systems.
Oral evaluation mainly involves two question types: reading-aloud questions and semi-open questions. A reading-aloud question requires the user to read a preset text aloud, so as to examine how standard the user's pronunciation of the basic speech units is and how fluently the sentences are read. A semi-open question is a test item in which the system presents prompt content such as an image, a video, or a short passage, and the user is required to answer related questions, or to orally retell the presented content, based on that prompt.
For oral evaluation of semi-open questions, the prior art mainly uses automatic speech recognition to transcribe the content of the user's speech into text, and then scores the response based on features computed from the recognition result, such as the hit rate of key words and phrases. Because the evaluation criteria for semi-open questions mainly concern whether the key words and phrases appear and whether grammatical errors occur, obtaining a correct recognition result for the speech to be evaluated is particularly important. How to improve the accuracy of speech recognition in the oral evaluation of semi-open questions is therefore an important problem that urgently needs to be solved.
Summary of the invention
The embodiments of the present invention provide a method and system for improving oral evaluation performance, so as to improve the accuracy of oral evaluation.
To this end, the invention provides the following technical solutions:
A method for improving oral evaluation performance, comprising:
receiving speech data of a user to be evaluated, the speech data comprising reading-aloud question speech data and semi-open question speech data;
scoring each reading-aloud question according to the reading-aloud question speech data;
obtaining valid adaptation data from the scoring results;
optimizing a preset acoustic model according to the valid adaptation data;
scoring each semi-open question using the optimized acoustic model.
Preferably, scoring each reading-aloud question according to the reading-aloud question speech data comprises:
aligning the reading-aloud question speech data with the question text of the reading-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
calculating the likelihood of each basic speech unit with respect to its corresponding speech signal segment;
computing, from the likelihood, the posterior probability of each basic speech unit given its corresponding speech signal segment;
calculating the score of each reading-aloud question from the posterior probabilities.
Preferably, obtaining valid adaptation data from the scoring results comprises:
selecting, as the valid adaptation data, the speech data of reading-aloud questions whose score is higher than a set first threshold.
Preferably, the method further comprises:
before the preset acoustic model is optimized according to the valid adaptation data, performing speech unit balancing on the valid adaptation data, comprising:
counting, for each item of valid adaptation data, the number of occurrences of each class cluster, a class cluster being a set of basic speech units with similar pronunciation;
determining target adaptation sentences by minimizing an objective function based on the occurrence counts of the class clusters;
wherein optimizing the preset acoustic model according to the valid adaptation data comprises: optimizing the preset acoustic model according to the target adaptation sentences.
Preferably, obtaining valid adaptation data from the scoring results comprises:
selecting, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Preferably, the method further comprises:
before the preset acoustic model is optimized according to the valid adaptation data, performing speech unit balancing on the valid adaptation data, comprising:
counting, for each item of valid adaptation data, the number of occurrences of each class cluster, a class cluster being a set of basic speech units with similar pronunciation;
determining target adaptation basic speech units by minimizing an objective function based on the occurrence counts of the class clusters;
wherein optimizing the preset acoustic model according to the valid adaptation data comprises: optimizing the preset acoustic model according to the target adaptation basic speech units.
Preferably, optimizing the preset acoustic model according to the valid adaptation data comprises:
optimizing the preset acoustic model using adaptation based on maximum likelihood linear regression; or
optimizing the preset acoustic model using adaptation based on maximum a posteriori probability.
A system for improving oral evaluation performance, comprising:
a receiving module, configured to receive speech data of a user to be evaluated, the speech data comprising reading-aloud question speech data and semi-open question speech data;
a reading-aloud question scoring module, configured to score each reading-aloud question according to the reading-aloud question speech data;
an adaptation data extraction module, configured to obtain valid adaptation data from the scoring results output by the reading-aloud question scoring module;
a model optimization module, configured to optimize a preset acoustic model according to the valid adaptation data;
a semi-open question scoring module, configured to score each semi-open question using the optimized acoustic model.
Preferably, the reading-aloud question scoring module comprises:
an alignment unit, configured to align the reading-aloud question speech data with the question text of the reading-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
a likelihood calculation unit, configured to calculate the likelihood of each basic speech unit with respect to its corresponding speech signal segment;
a posterior probability calculation unit, configured to compute, from the likelihood, the posterior probability of each basic speech unit given its corresponding speech signal segment;
a score calculation unit, configured to calculate the score of each reading-aloud question from the posterior probabilities.
Preferably, the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data of reading-aloud questions whose score is higher than a set first threshold.
Preferably, the system further comprises:
a first balancing module, configured to perform speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the first balancing module comprises:
a statistics unit, configured to count, for each item of valid adaptation data, the number of occurrences of each class cluster, a class cluster being a set of basic speech units with similar pronunciation;
a first determining unit, configured to determine target adaptation sentences by minimizing an objective function based on the occurrence counts of the class clusters;
the model optimization module being specifically configured to optimize the preset acoustic model according to the target adaptation sentences.
Preferably, the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Preferably, the system further comprises:
a second balancing module, configured to perform speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the second balancing module comprises:
a statistics unit, configured to count, for each item of valid adaptation data, the number of occurrences of each class cluster, a class cluster being a set of basic speech units with similar pronunciation;
a second determining unit, configured to determine target adaptation basic speech units by minimizing an objective function based on the occurrence counts of the class clusters;
the model optimization module being specifically configured to optimize the preset acoustic model according to the target adaptation basic speech units.
Preferably, the model optimization module is specifically configured to optimize the preset acoustic model using adaptation based on maximum likelihood linear regression, or using adaptation based on maximum a posteriori probability.
The method and system for improving oral evaluation performance provided by the embodiments of the present invention extract valid adaptation data from the examinee's reading-aloud speech and use that data to optimize the acoustic model automatically. The general acoustic model is thereby customized into an examinee model consistent with the examinee's timbre, that is, a speaker-independent model is turned into a speaker-dependent model. This greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the whole oral evaluation system.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present invention; those of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of the prior-art evaluation method for semi-open questions;
Fig. 2 is a flowchart of the method for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the system for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 4 is another structural schematic diagram of the system for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 5 is a further structural schematic diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
First, the prior-art oral evaluation method for semi-open questions is briefly described. Fig. 1 is a schematic diagram of that method.
The evaluation method comprises the following steps:
Step 1: receive the user's speech signal input, that is, the examinee's speech input.
Step 2: extract speech signal features; this can further include preprocessing such as noise reduction of the speech signal.
The speech signal features are vectors that characterize the user's pronunciation. Typically, 39-dimensional MFCC (Mel Frequency Cepstrum Coefficient) features matching the training set are extracted.
Step 3: the decoder determines the text content corresponding to the speech signal from the extracted speech signal features.
Specifically, the system searches the search network for the optimal path to determine the optimal recognition result. The search network is a large search space built, statically or dynamically, from the system's preset acoustic model and language model, and the N-best decoding results are obtained with the Viterbi algorithm.
Step 4: determine the current user's oral score from the recognized text content.
Typically, the system computes features such as the keyword or phrase hit rate from the N-best decoding results to obtain the score.
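The hit-rate feature mentioned in Step 4 can be illustrated with a minimal sketch. The function name `keyword_hit_rate`, the whitespace tokenization, and the rule that a keyword counts as hit when it appears in at least one N-best hypothesis are all assumptions made for illustration; the source does not specify how the hit rate is computed.

```python
from typing import Iterable, Sequence


def keyword_hit_rate(nbest: Sequence[str], keywords: Iterable[str]) -> float:
    """Fraction of keywords/phrases found in at least one N-best hypothesis.

    Assumption: hypotheses and keywords are whitespace-tokenized strings and
    matching is case-insensitive on whole tokens.
    """
    keyword_list = [k.lower() for k in keywords]
    if not keyword_list:
        return 0.0
    hypotheses = [h.lower().split() for h in nbest]
    hits = 0
    for keyword in keyword_list:
        tokens = keyword.split()
        n = len(tokens)
        # A phrase is hit when its tokens occur contiguously in some hypothesis.
        if any(
            hyp[i:i + n] == tokens
            for hyp in hypotheses
            for i in range(len(hyp) - n + 1)
        ):
            hits += 1
    return hits / len(keyword_list)


nbest_results = ["the boy is playing football", "the boy plays football"]
rate = keyword_hit_rate(nbest_results, ["boy", "football", "in the park"])
```

Here two of the three reference keywords are found, so `rate` is 2/3. A real system would likely combine several such features rather than rely on one.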
The acoustic model is a mathematical model describing the pronunciation characteristics of each basic speech unit. In statistical pattern recognition applications, its parameters are usually estimated from a massive amount of training data. The training process is as follows:
(1) collect training data;
(2) extract the acoustic features of the training data;
(3) set the acoustic model topology;
(4) train the acoustic model parameters.
The language model is trained mainly as follows: the training text required for the language model is collected, the N-gram statistical language model, currently the mainstream approach, is adopted as the topology of the language model, and maximum likelihood estimation is then used to obtain the conditional probability distribution of each word in the training text given its history words.
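As an illustration of maximum likelihood estimation for an N-gram model, the following sketch trains a bigram (N = 2) model by relative frequency. The function name `train_bigram_mle`, the sentence boundary markers, and the absence of smoothing are assumptions; the source does not state the order of the model or any smoothing scheme.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Return P(word | previous word) estimated by relative frequency.

    Assumption: sentences are whitespace-tokenized and padded with <s> and
    </s>; no smoothing is applied, so unseen bigrams have no entry.
    """
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    # Maximum likelihood estimate: count(prev, cur) / count(prev).
    return {
        bigram: count / history_counts[bigram[0]]
        for bigram, count in bigram_counts.items()
    }


model = train_bigram_mle(["i like tea", "i like milk"])
```

In this toy corpus `model[("like", "tea")]` is 0.5 because "like" is followed by "tea" in one of its two occurrences.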
Clearly, the text recognition result of Step 3 above directly affects the quality of the oral evaluation: the more accurate the recognition result, the more reliable the evaluation. In an automatic speech recognition system, the decoder builds a large search space, statically or dynamically, from the preset acoustic model and language model, and obtains the N-best decoding results with the Viterbi algorithm. Recognition accuracy depends on the accuracy and discriminability of the search network, and of the acoustic model in particular. The finer the acoustic model and the better it matches the test environment, the higher the recognition accuracy.
However, the acoustic model used by conventional systems is trained in advance on a large amount of data, so it tends to be general-purpose and is correspondingly weaker at recognizing each specific speaker. This is particularly true for oral scoring systems for semi-open questions: the timbre of different examinees differs considerably, the examination room environment is easily affected by many factors, and the test environment often differs considerably from the training environment. The pre-trained acoustic model therefore does not match the examinee's timbre, speech recognition accuracy is very low, and the N-best recognition results produced by a conventional speech recognition system often correlate poorly with the reference answers.
For this reason, the embodiments of the present invention provide a method and system for improving oral evaluation performance. When the speech of a specific user is evaluated, the reading-aloud part to be evaluated is first scored in the normal way. The scoring results are then analyzed to obtain valid adaptation data, which carries personalized information about the user's pronunciation. The preset acoustic model is then optimized with the adaptation data so that it matches the user's timbre, and the semi-open questions, or even all oral question types, are evaluated with the optimized acoustic model.
Fig. 2 is a flowchart of the method for improving oral evaluation performance according to an embodiment of the present invention. The method comprises the following steps:
Step 201: receive speech data of a user to be evaluated, the speech data comprising reading-aloud question speech data and semi-open question speech data.
Step 202: score each reading-aloud question according to the reading-aloud question speech data.
Specifically, the reading-aloud question speech data can be aligned with the question text of the reading-aloud question to obtain the speech signal segment corresponding to each basic speech unit in the text string. The likelihood of each basic speech unit with respect to its corresponding speech signal segment is then calculated, the posterior probability of each basic speech unit given its segment is computed from the likelihoods, and the score of each reading-aloud question is calculated from the posterior probabilities.
The posterior probability is the probability revised after information about the "result" has been obtained.
Suppose the basic speech unit is $M_i$, its corresponding speech signal segment is $O_i$, and the likelihood of $M_i$ with respect to $O_i$ is $P(O_i \mid M_i)$. The posterior probability $P(M_i \mid O_i)$ is then calculated as follows:
First, calculate the likelihood of the speech signal segment $O_i$ with respect to each basic speech unit in the confusion speech unit set to which $M_i$ belongs:
$$P(O_i \mid M_j), \quad j = 1, 2, \ldots, i-1, i+1, \ldots, K$$
where $K$ is the preset number of speech units.
The confusion speech unit set of each basic speech unit can be set in advance. For example, all basic speech units can be used as the confusion set. Alternatively, a confusion set of the same category can be determined from the category of the basic speech unit under examination; for instance, when an initial is evaluated in Mandarin Chinese, only initial phonemes are allowed as substitute phonemes. As a further option, basic speech units whose pronunciation is similar to that of the unit under examination can be selected as the confusion set.
Then, by the probability formula, the posterior probability of the basic speech unit $M_i$ given the speech segment $O$ is:
$$P(M_i \mid O) = \frac{P(O \mid M_i)}{\sum_{j=1}^{K} P(O \mid M_j)}$$
In the embodiments of the present invention, the posterior probabilities of the basic speech units are combined to score a reading-aloud question. Specifically, the mean of the posterior probabilities of all basic speech units in the reading-aloud question can be used as the score of that question, that is:
$$P_{\mathrm{sent}} = \frac{1}{N} \sum_{i=1}^{N} P(M_i \mid O)$$
where $N$ is the number of basic speech units in one reading-aloud question.
Clearly, the higher the overall score, the more accurate the examinee's pronunciation of that reading-aloud sentence.
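The two formulas above, the normalized posterior of a unit and the mean posterior of a sentence, can be sketched as follows. The function names and the dictionary representation of the confusion set are assumptions, and the likelihoods are taken in the linear domain with equal priors for all units, which is what the normalization formula implies; a practical system would more likely work with log-likelihoods.

```python
def unit_posterior(likelihoods, target):
    """P(M_i | O) = P(O | M_i) / sum_j P(O | M_j).

    `likelihoods` maps each unit in the confusion set (including the target
    unit itself) to its likelihood P(O | M_j) for the aligned segment.
    """
    total = sum(likelihoods.values())
    return likelihoods[target] / total


def sentence_score(posteriors):
    """Mean posterior over the N basic speech units of one question."""
    return sum(posteriors) / len(posteriors)


# Hypothetical likelihoods of one aligned segment under three candidate units.
segment_likelihoods = {"a": 0.6, "o": 0.3, "e": 0.1}
p_a = unit_posterior(segment_likelihoods, "a")
score = sentence_score([p_a, 0.9, 0.3])
```

The confusion set used for the denominator can be any of the three options described above; only the keys of the dictionary change.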
Step 203: obtain valid adaptation data from the scoring results.
To ensure that acoustic model adaptation is reliable, the valid adaptation data should be as correct as possible and should reflect the user's pronunciation characteristics.
In the embodiments of the present invention, different adaptation data can be obtained for different users. Specifically, the adaptation data can be obtained in the following ways:
Mode one: selection by sentence confidence. The speech data of reading-aloud questions whose average sentence posterior probability exceeds a specified threshold T1 (for example, T1 = -0.85) is selected as valid adaptation data. Such sentences are generally pronounced fairly correctly and are of high quality, so the whole sentence is selected as valid adaptation data.
Mode two: selection by basic speech unit confidence. The speech data corresponding to basic speech units whose posterior probability exceeds a specified threshold T2 is selected as valid adaptation data (the threshold can differ between speech units; T2 can range from -0.7 to -1.87).
It should be noted that the thresholds T2 and T1 can be the same or different.
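The two selection modes can be sketched as simple threshold filters. The function names, the data layout, and the `default_threshold` fallback are assumptions for illustration. The example thresholds in the text are negative, which suggests that the scores are expressed in a log domain; the sketch simply compares whatever score values it is given against the thresholds.

```python
def select_sentences(sentence_scores, t1=-0.85):
    """Mode one: indices of sentences whose score exceeds T1."""
    return [idx for idx, score in enumerate(sentence_scores) if score > t1]


def select_units(unit_scores, thresholds, default_threshold=-0.85):
    """Mode two: indices of units whose score exceeds a per-unit T2.

    `unit_scores` is a list of (unit_name, score) pairs; `thresholds` maps a
    unit name to its own threshold, falling back to `default_threshold`
    (a hypothetical parameter) for units without an entry.
    """
    selected = []
    for idx, (unit, score) in enumerate(unit_scores):
        if score > thresholds.get(unit, default_threshold):
            selected.append(idx)
    return selected


kept_sentences = select_sentences([-0.5, -1.2, -0.8])
kept_units = select_units(
    [("a", -0.6), ("zh", -1.5), ("a", -0.9)],
    {"a": -0.7, "zh": -1.87},
)
```

With these inputs, sentences 0 and 2 pass mode one, and units 0 and 1 pass mode two because "zh" has a looser threshold than "a".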
Furthermore, whether adaptation data is selected by sentence confidence or by basic speech unit confidence, the adaptation data can be unevenly distributed across speech units, which may affect the adaptation result. For this reason, in the embodiments of the present invention, speech unit balancing can additionally be performed on the valid adaptation data obtained, to improve the robustness of adaptation.
For valid adaptation data obtained by mode one above, speech unit balancing is performed as follows:
(1) For each item of valid adaptation data, count the number of occurrences of each class cluster; a class cluster is a set of basic speech units with similar pronunciation.
For example, if the number of adaptation sentences obtained is $S$, the occurrence counts of the class clusters in each sentence are $F_k(V_i)$ and $F_k(C_i)$, where $k = 1, 2, \ldots, S$.
(2) Determine the target adaptation sentences by minimizing an objective function. The objective function is:
$$\mathrm{obj} = \min \left\{ \sum_{k=1}^{S} p_k \left\{ \sum_{j=1}^{N} \sum_{i=1}^{M} \left[ F_k(C_j) - F_k(V_j) \right]^2 + \sum_{j=1}^{M} \sum_{i=1, i \neq j}^{M} \left[ F_k(C_j) - F_k(C_i) \right]^2 + \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} \left[ F_k(V_j) - F_k(V_i) \right]^2 \right\} \right\}$$
where $p_k \in \{0, 1\}$ indicates whether each of the $S$ sentences is selected or not. Specifically, the objective function can be optimized by traversing all $2^S$ selection combinations and preferring the setting in which as many sentences as possible have $p_k = 1$.
This method eliminates, to some extent, the severe imbalance in the number of basic speech units that results from selecting adaptation data purely sentence by sentence, as in mode one above.
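The exhaustive search over sentence selections can be sketched as below. Several points are assumptions rather than statements from the source: V and C are read as vowel and consonant class clusters, the cross term is read as a sum over all consonant and vowel cluster pairs, and a hypothetical `min_selected` parameter is introduced because the objective alone is trivially minimized by selecting nothing. Ties are broken in favor of more selected sentences, in the spirit of the preference described above.

```python
from itertools import product


def imbalance(consonant_counts, vowel_counts):
    """Per-sentence imbalance term of the objective (one reading of it)."""
    cross = sum((c - v) ** 2 for c in consonant_counts for v in vowel_counts)
    within_c = sum(
        (a - b) ** 2
        for i, a in enumerate(consonant_counts)
        for j, b in enumerate(consonant_counts)
        if i != j
    )
    within_v = sum(
        (a - b) ** 2
        for i, a in enumerate(vowel_counts)
        for j, b in enumerate(vowel_counts)
        if i != j
    )
    return cross + within_c + within_v


def select_balanced_sentences(counts, min_selected=1):
    """Traverse all 2^S masks p and return the best one as a tuple of 0/1.

    `counts` holds one (consonant_counts, vowel_counts) pair per sentence.
    Returns None when no mask selects at least `min_selected` sentences.
    """
    costs = [imbalance(c, v) for c, v in counts]
    best_mask, best_key = None, None
    for mask in product((0, 1), repeat=len(costs)):
        if sum(mask) < min_selected:
            continue
        objective = sum(p * cost for p, cost in zip(mask, costs))
        key = (objective, -sum(mask))  # lower objective, then more sentences
        if best_key is None or key < best_key:
            best_key, best_mask = key, mask
    return best_mask


sentence_counts = [([2, 2], [2, 2]), ([5, 0], [1, 0]), ([1, 1], [1, 2])]
mask = select_balanced_sentences(sentence_counts, min_selected=2)
```

For this toy input the second sentence is heavily skewed toward one consonant cluster, so the search keeps the first and third sentences. The traversal is exponential in S and is only practical for a small number of candidate sentences.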
For adaptation data obtained by mode two above, speech unit balancing is performed as follows:
Suppose the occurrence counts of the class clusters in the adaptation sentences are $F(V_i)$ and $F(C_i)$, and the numbers of units from each class cluster that are finally selected to take part in adaptation are $f(V_i)$, with $f(V_i) \le F(V_i)$, and $f(C_i)$, with $f(C_i) \le F(C_i)$. The following objective function is minimized:
$$\mathrm{obj} = \min \left\{ \sum_{j=1}^{N} \sum_{i=1}^{M} \left[ f(C_j) - f(V_j) \right]^2 + \sum_{j=1}^{M} \sum_{i=1, i \neq j}^{M} \left[ f(C_j) - f(C_i) \right]^2 + \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} \left[ f(V_j) - f(V_i) \right]^2 \right\}$$
Taking vowels as an example, each class cluster has $F(V_i)$ possible choices, which determines the number of combinations over the $M$ vowel class clusters; the total number of possible combinations is:
[formula image: Figure BDA0000410735360000094]
By traversing all possible combinations and taking the one for which the objective function reaches its minimum, the adaptation speech units selected are in a balanced state.
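The traversal for unit-level balancing can be sketched in the same style. The following assumptions are made for illustration: each cluster may contribute between 1 and its available count, the cross term is read as a sum over all consonant and vowel cluster pairs, and ties on the objective are broken in favor of keeping more data. None of these details is fixed by the source.

```python
from itertools import product


def objective(consonant_f, vowel_f):
    """Imbalance of the selected per-cluster counts (one reading of obj)."""
    cross = sum((c - v) ** 2 for c in consonant_f for v in vowel_f)
    within_c = sum(
        (a - b) ** 2
        for i, a in enumerate(consonant_f)
        for j, b in enumerate(consonant_f)
        if i != j
    )
    within_v = sum(
        (a - b) ** 2
        for i, a in enumerate(vowel_f)
        for j, b in enumerate(vowel_f)
        if i != j
    )
    return cross + within_c + within_v


def balance_unit_counts(consonant_avail, vowel_avail):
    """Choose f <= F per cluster by traversing every combination."""
    avail = list(consonant_avail) + list(vowel_avail)
    if any(a < 1 for a in avail):
        raise ValueError("every cluster needs at least one available unit")
    n_c = len(consonant_avail)
    best, best_key = None, None
    for combo in product(*(range(1, a + 1) for a in avail)):
        key = (objective(combo[:n_c], combo[n_c:]), -sum(combo))
        if best_key is None or key < best_key:
            best_key, best = key, combo
    return list(best[:n_c]), list(best[n_c:])


consonant_f, vowel_f = balance_unit_counts([3, 2], [4, 2])
```

With available counts of 3 and 2 for the consonant clusters and 4 and 2 for the vowel clusters, the balanced choice keeps 2 units from every cluster. The number of combinations is the product of the available counts, so the search grows quickly with the number of clusters.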
Step 204: optimize the preset acoustic model according to the adaptation data.
Correspondingly, once the target adaptation sentences have been obtained, the preset acoustic model can be optimized according to them.
Specifically, conventional adaptation methods such as Maximum Likelihood Linear Regression (MLLR) or maximum a posteriori probability (Maximum A Posteriori Linear Regression, MAPLR) can be used to adapt the acoustic model.
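To give a concrete feel for maximum a posteriori adaptation, the sketch below applies the textbook MAP update to the mean vector of a single Gaussian component. This is an illustrative assumption: the source names the adaptation methods but does not say which model parameters are adapted or give the update equations. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """Textbook MAP update of one Gaussian mean.

    mu_new = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    `frames` are the adaptation feature vectors x_t, `occupancies` are the
    posterior weights gamma_t of this Gaussian for each frame, and `tau` > 0
    controls how strongly the prior (speaker-independent) mean is trusted.
    """
    dim = len(prior_mean)
    total_occupancy = sum(occupancies)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, occupancies):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occupancy)
        for d in range(dim)
    ]


adapted = map_adapt_mean(
    prior_mean=[0.0, 0.0],
    frames=[[1.0, 1.0], [3.0, 3.0]],
    occupancies=[1.0, 1.0],
    tau=2.0,
)
```

With little adaptation data the result stays close to the prior mean, and with more data it moves toward the speaker's own statistics, which is why the quality and balance of the selected adaptation data matter.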
Step 205: score each semi-open question using the optimized acoustic model.
It should be noted that, in practice, the optimized acoustic model can be used not only to score the semi-open questions but also to re-score the reading-aloud questions, further improving the scoring accuracy for reading-aloud questions.
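The overall flow of steps 201 to 205 can be summarized in a short sketch. All names here are hypothetical: the scoring, selection, and adaptation routines are passed in as callables because the source does not define programming interfaces for them, and falling back to the base model when no data is selected is an added assumption.

```python
def evaluate_speaker(
    reading_items,
    semi_open_items,
    base_model,
    score_reading,
    select_adaptation_data,
    adapt_model,
    score_semi_open,
):
    """Score reading items, adapt the model, then score semi-open items."""
    # Step 202: score every reading-aloud item with the preset model.
    reading_scores = [score_reading(base_model, item) for item in reading_items]
    # Step 203: keep only the data judged reliable enough for adaptation.
    adaptation_data = select_adaptation_data(reading_items, reading_scores)
    # Step 204: adapt the model; keep the base model if nothing was selected.
    if adaptation_data:
        model = adapt_model(base_model, adaptation_data)
    else:
        model = base_model
    # Step 205: score the semi-open items with the adapted model.
    semi_open_scores = [score_semi_open(model, item) for item in semi_open_items]
    return reading_scores, semi_open_scores


# Toy stand-ins: a "model" is a string and a reading item is its own score.
reading_scores, semi_open_scores = evaluate_speaker(
    [0.9, 0.4],
    ["q1"],
    "base",
    score_reading=lambda model, item: item,
    select_adaptation_data=lambda items, scores: [
        i for i, s in zip(items, scores) if s > 0.5
    ],
    adapt_model=lambda model, data: model + "+adapted",
    score_semi_open=lambda model, item: (model, item),
)
```

Re-scoring the reading-aloud items with the adapted model, as the note above mentions, would be one more list comprehension over `reading_items` using `model`.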
The method for improving oral evaluation performance provided by the embodiments of the present invention extracts valid adaptation data from the examinee's reading-aloud speech and uses that data to optimize the acoustic model automatically. The general acoustic model is thereby customized into an examinee model consistent with the examinee's timbre, that is, a speaker-independent model is turned into a speaker-dependent model. By automatically learning the user's pronunciation characteristics, the method improves the match between the preset acoustic model and the user's pronunciation, which greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the whole oral evaluation system.
Correspondingly, the embodiments of the present invention also provide a system for improving oral evaluation performance. Fig. 3 is a structural schematic diagram of this system.
In this embodiment, the system comprises:
a receiving module 301, configured to receive speech data of a user to be evaluated, the speech data comprising reading-aloud question speech data and semi-open question speech data;
a reading-aloud question scoring module 302, configured to score each reading-aloud question according to the reading-aloud question speech data;
an adaptation data extraction module 303, configured to obtain valid adaptation data from the scoring results output by the reading-aloud question scoring module 302;
a model optimization module 304, configured to optimize a preset acoustic model according to the valid adaptation data;
a semi-open question scoring module 305, configured to score each semi-open question using the optimized acoustic model.
In practice, the reading-aloud question scoring module 302 can comprise:
an alignment unit, configured to align the reading-aloud question speech data with the question text of the reading-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
a likelihood calculation unit, configured to calculate the likelihood of each basic speech unit with respect to its corresponding speech signal segment;
a posterior probability calculation unit, configured to compute, from the likelihood, the posterior probability of each basic speech unit given its corresponding speech signal segment;
a score calculation unit, configured to calculate the score of each reading-aloud question from the posterior probabilities.
In addition, in the embodiments of the present invention, the adaptation data extraction module 303 can select, as valid adaptation data, the speech data of reading-aloud questions whose score is higher than a set first threshold, or the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Furthermore, whether adaptation data is selected by sentence confidence or by basic speech unit confidence, the adaptation data can be unevenly distributed across speech units, which may affect the adaptation result. For this reason, the system of the embodiments of the present invention can additionally perform speech unit balancing on the valid adaptation data obtained, to improve the robustness of adaptation.
Fig. 4 is another schematic structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Unlike the embodiment shown in Fig. 3, in this embodiment the system further comprises:
A first balancing module 401, configured to perform speech unit balancing on the effective adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.
The first balancing module 401 may specifically comprise a statistics unit and a first determining unit, wherein:
The statistics unit is configured to count, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
The first determining unit is configured to determine target adaptation sentences by minimizing an objective function according to the numbers of occurrences of the clusters; for the detailed process, refer to the description in the method embodiment above, which is not repeated here.
Correspondingly, in this embodiment the model optimization module 304 optimizes the preset acoustic model according to the target adaptation sentences.
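Sentence-level balancing of this kind can be sketched as a greedy search. The objective function itself is defined in the method embodiment and is not reproduced in this part of the text, so the sketch substitutes an assumed one: the sum over clusters of the squared gap between a desired count `target` and the accumulated count. The names `imbalance` and `select_balanced_sentences`, and the greedy strategy, are likewise assumptions.

```python
from collections import Counter
from typing import Dict, List


def imbalance(totals: Counter, clusters: List[str], target: int) -> int:
    """Assumed objective: squared distance of each cluster count from `target`."""
    return sum((target - totals[c]) ** 2 for c in clusters)


def select_balanced_sentences(
    sentence_counts: Dict[str, Counter], clusters: List[str], target: int
) -> List[str]:
    """Greedily add the sentence that lowers the objective most; stop when none does."""
    chosen: List[str] = []
    totals: Counter = Counter()
    remaining = dict(sentence_counts)
    best = imbalance(totals, clusters, target)
    while remaining:
        candidate, candidate_obj = None, best
        for sentence_id, counts in remaining.items():
            obj = imbalance(totals + counts, clusters, target)
            if obj < candidate_obj:
                candidate, candidate_obj = sentence_id, obj
        if candidate is None:
            break  # no remaining sentence improves the balance
        chosen.append(candidate)
        totals = totals + remaining.pop(candidate)
        best = candidate_obj
    return chosen
```

A greedy search does not guarantee the global minimum, but it is cheap and skips sentences that would only add more examples of clusters that are already well covered.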
Fig. 5 is another schematic structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Unlike the embodiment shown in Fig. 3, in this embodiment the system further comprises:
A second balancing module 501, configured to perform speech unit balancing on the effective adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.
The second balancing module 501 comprises a statistics unit and a second determining unit, wherein:
The statistics unit is configured to count, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
The second determining unit is configured to determine target adaptation basic speech units by minimizing an objective function according to the numbers of occurrences of the clusters; for the detailed process, refer to the description in the method embodiment above, which is not repeated here.
Correspondingly, in this embodiment the model optimization module 304 optimizes the preset acoustic model according to the target adaptation basic speech units.
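Unit-level balancing can be illustrated under the same assumed objective (squared gap between each cluster's count and a desired count `target`; the actual objective is given in the method embodiment, not here). Because each unit belongs to exactly one cluster, that objective is minimized by keeping at most `target` units per cluster. Preferring the units with the highest posterior within a cluster is an additional assumption, as are all names in the sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_balanced_units(units: List[Tuple[str, str, float]], target: int) -> List[str]:
    """Keep at most `target` units per cluster, preferring higher posteriors.

    Each entry of `units` is (unit_id, cluster, posterior).
    """
    by_cluster: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for unit_id, cluster, posterior in units:
        by_cluster[cluster].append((posterior, unit_id))
    chosen: List[str] = []
    for cluster in sorted(by_cluster):  # deterministic cluster order
        ranked = sorted(by_cluster[cluster], reverse=True)
        chosen.extend(unit_id for _, unit_id in ranked[:target])
    return chosen
```

Capping per cluster prevents frequent sounds from dominating the adaptation statistics while still using every available example of rare ones.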
The system for improving oral evaluation performance provided by the embodiments of the present invention extracts effective adaptation data from the examinee's read-aloud speech and uses these data to automatically optimize the acoustic model. The general acoustic model is thereby customized into an examinee-specific model matched to the examinee's voice characteristics; that is, a speaker-independent model is converted into a speaker-dependent model. This greatly improves speech recognition performance and thus effectively improves the scoring accuracy for semi-open question types, and even for the spoken evaluation system as a whole.
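The claims name maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation as ways to perform this optimization. As a rough illustration of the MAP route only, the sketch below applies the standard MAP update to a single Gaussian mean: the adapted mean is a weighted average of the speaker-independent mean and the occupancy-weighted examinee frames. The function name, the plain-list data layout, and the default prior weight `tau` are assumptions; a real system would apply the update to every Gaussian of the acoustic model.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    occupancies: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """MAP update of one Gaussian mean.

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t),
    where gamma_t is the occupancy of frame x_t and tau > 0 is the prior weight.
    """
    total_occupancy = sum(occupancies)
    dim = len(prior_mean)
    weighted_sum = [
        sum(gamma * frame[d] for gamma, frame in zip(occupancies, frames))
        for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occupancy)
        for d in range(dim)
    ]
```

With little adaptation data the result stays close to the speaker-independent mean, and it moves toward the examinee's data as more frames are observed, which is why selecting reliable and balanced adaptation data matters.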
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment. The system embodiment described above is merely illustrative: the modules or units described as separate components may or may not be physically separate, and the components shown as modules or units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components in the system according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the invention; the description of the above embodiments is intended only to help in understanding the method and apparatus of the present invention. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method for improving oral evaluation performance, characterized by comprising:
Receiving a user's speech data to be evaluated, the speech data comprising read-aloud question speech data and semi-open question speech data;
Scoring each read-aloud question according to the read-aloud question speech data;
Obtaining effective adaptation data from the scoring results;
Optimizing a preset acoustic model according to the effective adaptation data;
Scoring each semi-open question using the optimized acoustic model.
2. The method according to claim 1, characterized in that scoring each read-aloud question according to the read-aloud question speech data comprises:
Aligning the read-aloud question speech data with the prompt text of the read-aloud question, so as to obtain the speech signal segment corresponding to each basic speech unit in the text string;
Calculating the likelihood of each basic speech unit with respect to its corresponding speech signal segment;
Computing, from the likelihood, the posterior probability of each basic speech unit given its corresponding speech signal segment;
Calculating the score of each read-aloud question according to the posterior probabilities.
3. The method according to claim 2, characterized in that obtaining effective adaptation data from the scoring results comprises:
Selecting, as the effective adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
4. The method according to claim 3, characterized in that the method further comprises:
Before optimizing the preset acoustic model according to the effective adaptation data, performing speech unit balancing on the effective adaptation data, comprising:
Counting, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
Determining target adaptation sentences by minimizing an objective function according to the numbers of occurrences of the clusters;
Wherein optimizing the preset acoustic model according to the effective adaptation data comprises: optimizing the preset acoustic model according to the target adaptation sentences.
5. The method according to claim 2, characterized in that obtaining effective adaptation data from the scoring results comprises:
Selecting, as the effective adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
6. The method according to claim 5, characterized in that the method further comprises:
Before optimizing the preset acoustic model according to the effective adaptation data, performing speech unit balancing on the effective adaptation data, comprising:
Counting, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
Determining target adaptation basic speech units by minimizing an objective function according to the numbers of occurrences of the clusters;
Wherein optimizing the preset acoustic model according to the effective adaptation data comprises: optimizing the preset acoustic model according to the target adaptation basic speech units.
7. The method according to any one of claims 1 to 6, characterized in that optimizing the preset acoustic model according to the effective adaptation data comprises:
Optimizing the preset acoustic model using an adaptation method based on maximum likelihood linear regression (MLLR); or
Optimizing the preset acoustic model using an adaptation method based on maximum a posteriori (MAP) probability.
8. A system for improving oral evaluation performance, characterized by comprising:
A receiving module, configured to receive a user's speech data to be evaluated, the speech data comprising read-aloud question speech data and semi-open question speech data;
A read-aloud question scoring module, configured to score each read-aloud question according to the read-aloud question speech data;
An adaptation data extraction module, configured to obtain effective adaptation data from the scoring results output by the read-aloud question scoring module;
A model optimization module, configured to optimize a preset acoustic model according to the effective adaptation data;
A semi-open question scoring module, configured to score each semi-open question using the optimized acoustic model.
9. The system according to claim 8, characterized in that the read-aloud question scoring module comprises:
An alignment unit, configured to align the read-aloud question speech data with the prompt text of the read-aloud question, so as to obtain the speech signal segment corresponding to each basic speech unit in the text string;
A likelihood computing unit, configured to calculate the likelihood of each basic speech unit with respect to its corresponding speech signal segment;
A posterior probability computing unit, configured to compute, from the likelihood, the posterior probability of each basic speech unit given its corresponding speech signal segment;
A score computing unit, configured to calculate the score of each read-aloud question according to the posterior probabilities.
10. The system according to claim 9, characterized in that
The adaptation data extraction module is specifically configured to select, as the effective adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
11. The system according to claim 10, characterized in that the system further comprises:
A first balancing module, configured to perform speech unit balancing on the effective adaptation data before the preset acoustic model is optimized according to the adaptation data; the first balancing module comprises:
A statistics unit, configured to count, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
A first determining unit, configured to determine target adaptation sentences by minimizing an objective function according to the numbers of occurrences of the clusters;
The model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation sentences.
12. The system according to claim 9, characterized in that
The adaptation data extraction module is specifically configured to select, as the effective adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
13. The system according to claim 12, characterized in that the system further comprises:
A second balancing module, configured to perform speech unit balancing on the effective adaptation data before the preset acoustic model is optimized according to the adaptation data; the second balancing module comprises:
A statistics unit, configured to count, for each piece of effective adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
A second determining unit, configured to determine target adaptation basic speech units by minimizing an objective function according to the numbers of occurrences of the clusters;
The model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation basic speech units.
14. The system according to any one of claims 8 to 13, characterized in that
The model optimization module is specifically configured to optimize the preset acoustic model using an adaptation method based on maximum likelihood linear regression (MLLR), or to optimize the preset acoustic model using an adaptation method based on maximum a posteriori (MAP) probability.
CN201310553383.0A 2013-11-08 2013-11-08 Method and system for improving oral evaluation performance Active CN103594087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310553383.0A CN103594087B (en) 2013-11-08 2013-11-08 Method and system for improving oral evaluation performance

Publications (2)

Publication Number Publication Date
CN103594087A true CN103594087A (en) 2014-02-19
CN103594087B CN103594087B (en) 2016-10-12

Family

ID=50084194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310553383.0A Active CN103594087B (en) 2013-11-08 2013-11-08 Method and system for improving oral evaluation performance

Country Status (1)

Country Link
CN (1) CN103594087B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1139318A1 (en) * 1999-09-27 2001-10-04 Kojima Co., Ltd. Pronunciation evaluation system
CN102034475A (en) * 2010-12-08 2011-04-27 中国科学院自动化研究所 Method for interactively scoring open short conversation by using computer
CN102354495A (en) * 2011-08-31 2012-02-15 中国科学院自动化研究所 Testing method and system of semi-opened spoken language examination questions
CN103065626A (en) * 2012-12-20 2013-04-24 中国科学院声学研究所 Automatic grading method and automatic grading equipment for read questions in test of spoken English

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
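Yan Ke et al.: "Automatic scoring of retelling questions for large-scale computer-based spoken English tests", Journal of Tsinghua University (Science and Technology) *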

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464757B (en) * 2014-10-28 2019-01-18 科大讯飞股份有限公司 Speech evaluating method and speech evaluating device
CN104464757A (en) * 2014-10-28 2015-03-25 科大讯飞股份有限公司 Voice evaluation method and device
CN104464423A (en) * 2014-12-19 2015-03-25 科大讯飞股份有限公司 Calibration optimization method and system for speaking test evaluation
CN106409291B (en) * 2016-11-04 2019-12-17 南京侃侃信息科技有限公司 Method for implementing voice search list
CN106409291A (en) * 2016-11-04 2017-02-15 南京侃侃信息科技有限公司 Implementation method of voice search list
CN106652622A (en) * 2017-02-07 2017-05-10 广东小天才科技有限公司 Text training method and device
CN107240394A (en) * 2017-06-14 2017-10-10 北京策腾教育科技有限公司 A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system
CN107808674A (en) * 2017-09-28 2018-03-16 上海流利说信息技术有限公司 A kind of method, medium, device and the electronic equipment of voice of testing and assessing
CN108417207A (en) * 2018-01-19 2018-08-17 苏州思必驰信息科技有限公司 A kind of depth mixing generation network self-adapting method and system
TWI683290B (en) * 2018-06-28 2020-01-21 吳雲中 Spoken language teaching auxiliary method and device
CN109522301A (en) * 2018-11-07 2019-03-26 平安医疗健康管理股份有限公司 A kind of data processing method, electronic equipment and storage medium
CN110164414A (en) * 2018-11-30 2019-08-23 腾讯科技(深圳)有限公司 Method of speech processing, device and smart machine
CN110164414B (en) * 2018-11-30 2023-02-14 腾讯科技(深圳)有限公司 Voice processing method and device and intelligent equipment
CN109658918A (en) * 2018-12-03 2019-04-19 广东外语外贸大学 A kind of intelligence Oral English Practice repetition topic methods of marking and system
CN110111775A (en) * 2019-05-17 2019-08-09 腾讯科技(深圳)有限公司 A kind of Streaming voice recognition methods, device, equipment and storage medium
CN110111775B (en) * 2019-05-17 2021-06-22 腾讯科技(深圳)有限公司 Streaming voice recognition method, device, equipment and storage medium
CN110489756A (en) * 2019-08-23 2019-11-22 上海乂学教育科技有限公司 Conversational human-computer interaction spoken language evaluation system

Also Published As

Publication number Publication date
CN103594087B (en) 2016-10-12


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant