CN103594087B

CN103594087B - Method and system for improving oral evaluation performance

Info

Publication number: CN103594087B
Application number: CN201310553383.0A
Authority: CN
Inventors: 高前勇; 魏思; 胡国平; 刘丹; 陈进; 胡郁
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2013-11-08
Filing date: 2013-11-08
Publication date: 2016-10-12
Anticipated expiration: 2033-11-08
Also published as: CN103594087A

Abstract

The invention discloses a method and system for improving oral evaluation performance. The method includes: receiving user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data; scoring each read-aloud question according to the read-aloud question speech data; obtaining valid adaptation data from the scoring results; optimizing a preset acoustic model according to the valid adaptation data; and scoring each semi-open question using the optimized acoustic model. The invention can effectively improve the accuracy of oral evaluation.

Description

Method and system for improving oral evaluation performance

Technical field

The present invention relates to the field of speech processing technology, and in particular to a method and system for improving oral evaluation performance.

Background

As an important medium of interpersonal communication, spoken language plays an extremely important role in everyday life. With socio-economic development and accelerating globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness, and scalability of language assessment. Traditional manual assessment of spoken proficiency restricts teachers and students in teaching time and space, and there are hardware gaps and imbalances in teaching staff, teaching venues, and funding. Manual assessment cannot avoid the individual bias of the evaluator, so uniform scoring standards cannot be guaranteed, and sometimes the score does not even accurately reflect the true level of the person tested. Large-scale oral tests, moreover, require substantial manpower, material resources, and financial support, which limits regular, large-scale assessment. For these reasons, the industry has successively developed a number of language teaching and evaluation systems.

Oral evaluation mainly involves two question types: read-aloud questions and semi-open questions. A read-aloud question requires the user to read a preset text aloud, so as to examine how accurately the user pronounces the basic speech units and how fluently the sentences are read. A semi-open question is a spoken test item in which the system presents prompt content such as images, video, or a short passage, and the user is required, based on that content, to answer related questions or to retell what was played.

For oral evaluation of semi-open questions, the prior art mainly uses automatic speech recognition to transcribe the user's speech into text, and then scores the response based on features computed from the recognition result, such as the hit rate of key words and phrases. Because the scoring criteria for semi-open questions mainly concern whether key words and phrases are present and whether grammatical errors occur, obtaining a correct recognition result for the speech under evaluation is particularly important. How to improve the accuracy of speech recognition in oral evaluation of semi-open questions is therefore a problem that urgently needs to be solved.

Summary of the invention

Embodiments of the present invention provide a method and system for improving oral evaluation performance, so as to improve the accuracy of oral evaluation.

To this end, the present invention provides the following technical solutions:

A method for improving oral evaluation performance, including:

receiving user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;

scoring each read-aloud question according to the read-aloud question speech data;

obtaining valid adaptation data from the scoring results;

optimizing a preset acoustic model according to the valid adaptation data;

scoring each semi-open question using the optimized acoustic model.

Preferably, scoring each read-aloud question according to the read-aloud question speech data includes:

aligning the read-aloud question speech data with the question text of the read-aloud question to obtain the speech signal segment corresponding to each basic speech unit in the text string;

calculating the likelihood of the speech signal segment corresponding to each basic speech unit;

calculating, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;

calculating the score of each read-aloud question from the posterior probabilities.

Preferably, obtaining valid adaptation data from the scoring results includes:

selecting, as the valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.

Preferably, the method further includes:

before optimizing the preset acoustic model according to the valid adaptation data, performing speech unit balancing on the valid adaptation data, including:

counting the number of occurrences of each cluster in each item of valid adaptation data, where a cluster is a set of basic speech units with similar pronunciation;

determining target adaptation sentences by minimizing an objective function based on the numbers of occurrences of the clusters;

optimizing the preset acoustic model according to the valid adaptation data then includes: optimizing the preset acoustic model according to the target adaptation sentences.

Preferably, obtaining valid adaptation data from the scoring results includes:

selecting, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.

Preferably, the method further includes:

determining target adaptation basic speech units by minimizing an objective function based on the numbers of occurrences of the clusters;

optimizing the preset acoustic model according to the valid adaptation data then includes: optimizing the preset acoustic model according to the target adaptation basic speech units.

Preferably, optimizing the preset acoustic model according to the valid adaptation data includes:

optimizing the preset acoustic model using adaptation based on maximum likelihood linear regression; or

optimizing the preset acoustic model using adaptation based on maximum a posteriori probability.

A system for improving oral evaluation performance, including:

a receiving module, configured to receive user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;

a read-aloud question scoring module, configured to score each read-aloud question according to the read-aloud question speech data;

an adaptation data extraction module, configured to obtain valid adaptation data from the scoring results output by the read-aloud question scoring module;

a model optimization module, configured to optimize a preset acoustic model according to the valid adaptation data;

a semi-open question scoring module, configured to score each semi-open question using the optimized acoustic model.

Preferably, the read-aloud question scoring module includes:

an alignment unit, configured to align the read-aloud question speech data with the question text of the read-aloud question to obtain the speech signal segment corresponding to each basic speech unit in the text string;

a likelihood calculation unit, configured to calculate the likelihood of the speech signal segment corresponding to each basic speech unit;

a posterior probability calculation unit, configured to calculate, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;

a score calculation unit, configured to calculate the score of each read-aloud question from the posterior probabilities.

Preferably, the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.

Preferably, the system further includes:

a first balancing module, configured to perform speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the first balancing module includes:

a statistics unit, configured to count the number of occurrences of each cluster in each item of valid adaptation data, where a cluster is a set of basic speech units with similar pronunciation;

a first determining unit, configured to determine target adaptation sentences by minimizing an objective function based on the numbers of occurrences of the clusters;

the model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation sentences.

Preferably, the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.

Preferably, the system further includes:

a second balancing module, configured to perform speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the second balancing module includes:

a second determining unit, configured to determine target adaptation basic speech units by minimizing an objective function based on the numbers of occurrences of the clusters;

the model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation basic speech units.

Preferably, the model optimization module is specifically configured to optimize the preset acoustic model using adaptation based on maximum likelihood linear regression, or using adaptation based on maximum a posteriori probability.

The method and system for improving oral evaluation performance provided by embodiments of the present invention extract valid adaptation data from the examinee's read-aloud speech and use these data to optimize the acoustic model automatically. The generic acoustic model is thereby customized into an examinee model that matches the examinee's voice, turning a speaker-independent model into a speaker-dependent one. This greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the oral evaluation system as a whole.

Brief description of the drawings

In order to illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them.

Fig. 1 is a schematic diagram of a prior-art evaluation method for semi-open questions;

Fig. 2 is a flowchart of a method for improving oral evaluation performance according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a system for improving oral evaluation performance according to an embodiment of the present invention;

Fig. 4 is another schematic structural diagram of a system for improving oral evaluation performance according to an embodiment of the present invention;

Fig. 5 is yet another schematic structural diagram of a system for improving oral evaluation performance according to an embodiment of the present invention.

Detailed description

In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.

The prior-art oral evaluation method for semi-open questions is first briefly described. Fig. 1 is a schematic diagram of this prior-art evaluation method.

The evaluation method includes the following steps:

Step 1: receive the user's speech signal, i.e., the examinee's speech input.

Step 2: extract speech signal features; this may further include preprocessing of the speech signal such as noise reduction.

The speech signal features are vectors that characterize the user's pronunciation; typically, 39-dimensional MFCC (Mel Frequency Cepstrum Coefficient) features matching those of the training set are extracted.

Step 3: the decoder determines the text content corresponding to the speech signal from the extracted speech signal features.

Specifically, the system searches for the optimal path in a search network to determine the best recognition result. The search network is a huge search space expanded, statically or dynamically, from the system's preset acoustic model and language model, and the N-best decoding results are obtained with the Viterbi algorithm.

Step 4: determine the current user's oral score from the recognized text content.

Typically, the system computes features such as the hit rate of key words or phrases from the N-best decoding results and derives the score from them.
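The hit-rate feature mentioned in step 4 can be illustrated with a minimal sketch. The function name `keyword_hit_rate`, the whole-word matching rule, and the example hypotheses are all assumptions made for illustration; the source does not specify how hits are counted.

```python
from typing import Iterable, List


def keyword_hit_rate(nbest: List[str], keywords: Iterable[str]) -> float:
    """Fraction of key words or phrases found in at least one N-best hypothesis."""
    keys = [" ".join(k.lower().split()) for k in keywords]
    if not keys:
        return 0.0
    # Pad with spaces so that matching respects whole-word boundaries.
    hyps = [" " + " ".join(h.lower().split()) + " " for h in nbest]
    hits = sum(1 for k in keys if any(f" {k} " in h for h in hyps))
    return hits / len(keys)


# Hypothetical 2-best list and reference key words.
nbest = ["the boy went to the park", "the boy want to the park"]
print(keyword_hit_rate(nbest, ["boy", "went to", "school"]))
```

Because the feature depends entirely on the recognized text, any recognition error on a key word lowers the score directly, which is the dependency the following paragraphs discuss.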

The acoustic model is a mathematical model that describes the pronunciation characteristics of each basic speech unit. In statistical pattern recognition applications, its parameters are usually estimated from massive training data. The training process is as follows:

(1) collect training data;

(2) extract acoustic features from the training data;

(3) set the acoustic model topology;

(4) train the acoustic model parameters.

The language model is trained mainly as follows: collect the training text needed for language model training, adopt the N-gram statistical language model that is currently mainstream internationally as the topology of the language model, and then use maximum likelihood estimation to obtain the conditional probability distribution of each word given its history words in the training text.
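As a small illustration of maximum likelihood estimation for an N-gram model, the sketch below handles the bigram case (N = 2), where P(w | h) is the count of the pair (h, w) divided by the count of the history h. The function name `bigram_mle` and the sentence boundary markers `<s>` and `</s>` are assumptions; no smoothing is applied, which a practical system would normally need.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_mle(sentences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Return P(w | h) = count(h, w) / count(h) for every observed bigram."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for h, w in zip(padded, padded[1:]):
            history_counts[h] += 1
            bigram_counts[(h, w)] += 1
    return {hw: c / history_counts[hw[0]] for hw, c in bigram_counts.items()}


probs = bigram_mle([["i", "like", "tea"], ["i", "like", "milk"]])
print(probs[("like", "tea")])
```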

Clearly, the text recognition result of step 3 directly affects the quality of the oral evaluation: the more accurate the recognition result, the more reliable the evaluation. In an automatic speech recognition system, the decoder expands the preset acoustic model and language model, statically or dynamically, into a huge search space and obtains the N-best decoding results with the Viterbi algorithm. Recognition accuracy depends on the accuracy and discriminative power of the search network, and of the acoustic model in particular. The finer the acoustic model and the better it matches the test environment, the higher the recognition accuracy.

The acoustic model used by conventional systems is trained in advance on massive data. It therefore tends to be highly general, and its recognition of any specific speaker is correspondingly weaker. This is particularly true of oral scoring systems for semi-open questions: examinees differ considerably in voice characteristics, the examination-room environment is subject to many influences, and the test environment often differs considerably from the training environment. The pre-trained acoustic model thus does not match the examinee's voice, so speech recognition accuracy is very low and the N-best results produced by a conventional speech recognition system correlate poorly with the reference answers.

To this end, embodiments of the present invention provide a method and system for improving oral evaluation performance. When evaluating the spoken language of a specific user, the read-aloud questions are first scored in the normal way; the scoring results are analyzed to obtain valid adaptation data that carry the user's individual pronunciation information; the preset acoustic model is then optimized with the adaptation data so that it matches the user's voice; and the optimized acoustic model is finally used to evaluate the semi-open questions, or even all question types.

Fig. 2 is a flowchart of the method for improving oral evaluation performance according to an embodiment of the present invention. The method includes the following steps:

Step 201: receive user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data.

Step 202: score each read-aloud question according to the read-aloud question speech data.

Specifically, the read-aloud question speech data may be aligned with the question text of the read-aloud question to obtain the speech signal segment corresponding to each basic speech unit in the text string. The likelihood of the segment corresponding to each basic speech unit is then calculated, the posterior probability of each segment is calculated from the likelihoods, and the score of each read-aloud question is calculated from the posterior probabilities.

The posterior probability is the probability as revised after the "result" information has been observed.

Suppose the basic speech unit is M_i, its corresponding speech signal segment is O_i, and the likelihood of segment O_i given M_i is P(O_i|M_i). The posterior probability P(M_i|O_i) of basic speech unit M_i given segment O_i is then calculated as follows:

First, calculate the likelihood of speech signal segment O_i with respect to each basic speech unit in the confusable speech unit set to which M_i belongs:

P(O_i|M_j),j=1,2,...,i-1,i+1,...,K

where K is the preset number of speech units.

The confusable speech unit set of each basic speech unit may be set in advance; for example, all basic speech units may be used as the confusable set. Alternatively, the confusable set may be restricted to units of the same category as the basic speech unit under examination; in Mandarin Chinese evaluation, for example, an initial may only be replaced by another initial. Alternatively, basic speech units with pronunciation similar to the unit under examination may be selected as the confusable set.

Then, by the probability formula, the posterior probability of basic speech unit M_i given speech segment O is:

P(M_{i} \mid O) = \frac{P(O \mid M_{i})}{\sum_{j=1}^{K} P(O \mid M_{j})}

In embodiments of the present invention, the posterior probabilities of the individual basic speech units may be combined to score a read-aloud question. Specifically, the average of the posterior probabilities of all basic speech units in the read-aloud question may be taken as its score; that is, the score of a read-aloud question is:

P_{sent} = \frac{1}{N} \sum_{i=1}^{N} P(M_{i} \mid O)

where N is the number of basic speech units in the read-aloud question.

Clearly, the higher this combined score, the more accurate the examinee's pronunciation of that read-aloud sentence.
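The two formulas above can be sketched in a few lines. The posterior is the likelihood of the target unit normalized over the confusable set, which corresponds to Bayes' rule with equal priors on the K units, and the sentence score is the plain average. The function names, the unit labels, and the likelihood values are hypothetical; a practical system would likely work with log-likelihoods for numerical stability.

```python
from typing import Dict, List


def unit_posterior(likelihoods: Dict[str, float], target: str) -> float:
    """P(M_i | O) = P(O | M_i) / sum_j P(O | M_j) over the confusable set."""
    total = sum(likelihoods.values())
    return likelihoods[target] / total


def sentence_score(posteriors: List[float]) -> float:
    """Average posterior over the N basic speech units of one question."""
    return sum(posteriors) / len(posteriors)


# Hypothetical likelihoods of one segment under three confusable units.
segment = {"zh": 0.6, "z": 0.3, "j": 0.1}
p = unit_posterior(segment, "zh")
print(p)
print(sentence_score([p, 0.9, 0.3]))
```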

Step 203: obtain valid adaptation data from the scoring results.

To make acoustic model adaptation reliable, the valid adaptation data should be as correct as possible and should reflect the user's pronunciation characteristics.

In embodiments of the present invention, different adaptation data may be obtained for different users. Specifically, the adaptation data may be obtained in either of the following ways:

Mode one: select by sentence confidence, i.e., select as valid adaptation data the speech data of read-aloud questions whose sentence-level average posterior probability exceeds a specified threshold T1 (for example, T1 = -0.85). Such sentences are generally pronounced accurately and are of high quality, so the whole sentence is selected as valid adaptation data.

Mode two: select by basic speech unit confidence, i.e., select as valid adaptation data the speech data corresponding to basic speech units whose posterior probability exceeds a specified threshold T2 (the threshold may differ between speech units; T2 may range from -0.7 to -1.87).

Note that threshold T2 may be the same as or different from threshold T1.
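The two selection modes can be sketched as simple filters. The data layout (each sentence as a list of `(unit label, score)` pairs), the function names, and the per-unit threshold table are assumptions. The scores here are negative because the example thresholds in the text are negative, which suggests a log-domain confidence; that reading is itself an assumption.

```python
from typing import Dict, List, Tuple

# Each unit is (label, score); a sentence is a list of units.
Sentence = List[Tuple[str, float]]


def select_sentences(sentences: List[Sentence], t1: float) -> List[int]:
    """Mode one: indices of sentences whose average unit score exceeds t1."""
    return [
        k for k, units in enumerate(sentences)
        if units and sum(s for _, s in units) / len(units) > t1
    ]


def select_units(sentences: List[Sentence], t2: Dict[str, float],
                 default_t2: float) -> List[Tuple[int, int]]:
    """Mode two: (sentence, unit) index pairs whose score exceeds the unit's threshold."""
    return [
        (k, n)
        for k, units in enumerate(sentences)
        for n, (label, s) in enumerate(units)
        if s > t2.get(label, default_t2)
    ]


data = [[("a", -0.2), ("b", -0.9)], [("a", -1.5), ("c", -2.0)]]
print(select_sentences(data, t1=-0.85))
print(select_units(data, t2={"b": -0.7}, default_t2=-1.0))
```

Mode one keeps whole sentences, so a weak unit inside a good sentence still enters the adaptation set; mode two is finer grained but yields fragments rather than complete utterances.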

Furthermore, whether adaptation data are selected by sentence confidence or by basic speech unit confidence, the adaptation data may be unevenly distributed across speech units, which may affect the adaptation result. For this reason, in embodiments of the present invention the valid adaptation data obtained may additionally undergo speech unit balancing to improve the robustness of adaptation.

For valid adaptation data obtained in mode one, speech unit balancing proceeds as follows:

(1) Count the number of occurrences of each cluster in each item of valid adaptation data, where a cluster is a set of basic speech units with similar pronunciation.

For example, if S adaptation sentences have been obtained, count the occurrences of each cluster in each sentence: F_k(V_i) and F_k(C_i), where k = 1, 2, ..., S.

(2) Determine the target adaptation sentences by minimizing an objective function. The objective function is:

obj = \min \left\{ \sum_{k=1}^{S} p_{k} \left\{ \sum_{j=1}^{N} \sum_{i=1}^{M} [F_{k}(C_{j}) - F_{k}(V_{j})]^{2} + \sum_{j=1}^{M} \sum_{i=1, i \neq j}^{M} [F_{k}(C_{j}) - F_{k}(C_{i})]^{2} + \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} [F_{k}(V_{j}) - F_{k}(V_{i})]^{2} \right\} \right\}

where p_k ∈ {0, 1} indicates whether each of the S sentences is selected. Specifically, the objective function may be optimized by exhaustively looping through all 2^S selections, preferring the result in which as many sentences as possible have p_k = 1.

This method mitigates, to some extent, the severe imbalance in the numbers of basic speech units that results when adaptation sentences are selected purely by sentence as in mode one.
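A minimal sketch of the exhaustive search follows, under stated assumptions. Each sentence's imbalance is taken as the sum of squared count differences over all unordered pairs of clusters, which simplifies the three double sums of the objective by treating every cluster pair alike. The `min_sentences` parameter is a hypothetical guard, since the all-zero selection would otherwise trivially minimize the objective. Ties are broken in favor of keeping more sentences, as the text suggests. The names `imbalance` and `pick_sentences` and the cluster labels are illustrative only.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple


def imbalance(counts: Dict[str, int], clusters: List[str]) -> int:
    """Sum of squared count differences over all unordered cluster pairs."""
    return sum((counts.get(a, 0) - counts.get(b, 0)) ** 2
               for a, b in combinations(clusters, 2))


def pick_sentences(per_sentence: List[Dict[str, int]], clusters: List[str],
                   min_sentences: int = 1) -> Tuple[int, ...]:
    """Search every selection vector p in {0,1}^S.

    Minimises sum_k p_k * imbalance_k; ties go to the selection that keeps
    more sentences. min_sentences is a hypothetical guard against the
    trivial all-zero solution.
    """
    if min_sentences > len(per_sentence):
        raise ValueError("min_sentences exceeds the number of sentences")
    costs = [imbalance(c, clusters) for c in per_sentence]
    best_p, best_key = None, None
    for p in product((0, 1), repeat=len(per_sentence)):
        if sum(p) < min_sentences:
            continue
        key = (sum(pk * ck for pk, ck in zip(p, costs)), -sum(p))
        if best_key is None or key < best_key:
            best_p, best_key = p, key
    return best_p


# Hypothetical cluster counts for S = 3 selected sentences.
per_sentence = [{"V1": 2, "C1": 2}, {"V1": 5, "C1": 0}, {"V1": 1, "C1": 1}]
print(pick_sentences(per_sentence, ["V1", "C1"]))
```

The loop visits 2^S selections, so this brute-force form is only practical for the small number of read-aloud sentences a single examinee produces.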

For adaptation data obtained in mode two, speech unit balancing proceeds as follows:

Suppose the numbers of occurrences of each cluster in the adaptation sentences are F(V_i) and F(C_i), and the numbers of each cluster actually selected to take part in adaptation are f(V_i), with f(V_i) ≤ F(V_i), and f(C_i), with f(C_i) ≤ F(C_i). The following objective function is minimized:

obj = \min \left\{ \sum_{j=1}^{N} \sum_{i=1}^{M} [f(C_{j}) - f(V_{j})]^{2} + \sum_{j=1}^{M} \sum_{i=1, i \neq j}^{M} [f(C_{j}) - f(C_{i})]^{2} + \sum_{j=1}^{N} \sum_{i=1, i \neq j}^{N} [f(V_{j}) - f(V_{i})]^{2} \right\}

Taking vowels as an example, each cluster V_i admits F(V_i) possible choices, so the numbers of choices for the M vowel clusters multiply together, and the same holds across all clusters. By traversing all possible combinations and finding the one that minimizes the objective function, the adaptation speech units selected are balanced.
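The traversal for mode two can be sketched in the same brute-force style. Assumptions: each cluster must contribute at least one unit (otherwise choosing zero everywhere trivially minimizes the objective), all cluster pairs are weighted alike, and ties are broken in favor of keeping more data. The function name `balance_units` and the cluster labels are hypothetical.

```python
from itertools import combinations, product
from typing import Dict


def balance_units(available: Dict[str, int]) -> Dict[str, int]:
    """Choose f(c) with 1 <= f(c) <= F(c) minimising pairwise squared differences.

    Ties are broken in favour of keeping more data. Requiring f(c) >= 1 is an
    assumption that avoids the trivial all-zero answer.
    """
    if not available or min(available.values()) < 1:
        raise ValueError("every cluster needs at least one available unit")
    names = list(available)
    best, best_key = None, None
    # One candidate per combination of per-cluster counts.
    for choice in product(*(range(1, available[n] + 1) for n in names)):
        cost = sum((a - b) ** 2 for a, b in combinations(choice, 2))
        key = (cost, -sum(choice))
        if best_key is None or key < best_key:
            best, best_key = choice, key
    return dict(zip(names, best))


# Hypothetical occurrence counts F for three clusters.
print(balance_units({"V1": 3, "V2": 5, "C1": 2}))
```

Under these assumptions the search settles on equal counts capped by the rarest cluster, so a single scarce cluster limits how much data every other cluster contributes.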

Step 204: optimize the preset acoustic model according to the adaptation data.

Correspondingly, once the target adaptation sentences have been obtained, the preset acoustic model can be optimized according to them.

Specifically, the acoustic model may be adapted with a conventional adaptation method such as maximum likelihood linear regression (Maximum Likelihood Linear Regression, MLLR) or maximum a posteriori probability (Maximum A Posteriori Linear Regression, MAPLR).
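To give a feel for what such adaptation does, the sketch below shows the classic MAP update of a single Gaussian mean, which interpolates between the prior (speaker-independent) mean and the adaptation frames. This is a simplified stand-in, not MLLR or MAPLR and not necessarily the update the patent uses: it assumes one Gaussian per unit, hard frame alignment, and a hypothetical prior weight `tau`.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: List[Sequence[float]],
                   tau: float = 10.0) -> List[float]:
    """MAP update of a Gaussian mean: (tau * mu0 + sum_t x_t) / (tau + T)."""
    t = len(frames)
    return [
        (tau * mu0 + sum(x[d] for x in frames)) / (tau + t)
        for d, mu0 in enumerate(prior_mean)
    ]


# With few frames the mean stays near the prior; with many it follows the data.
print(map_adapt_mean([0.0, 0.0], [[1.0, 2.0]] * 10, tau=10.0))
```

The prior weight explains why balanced adaptation data matter: a unit with very few frames barely moves from the generic model, while an over-represented unit is pulled strongly toward the examinee's voice.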

Step 205: score each semi-open question using the optimized acoustic model.

Note that in practical applications the optimized acoustic model may be used not only to score the semi-open questions but also to re-score the read-aloud questions, further improving the scoring accuracy for read-aloud questions.
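Steps 202 to 205 can be tied together in a short orchestration sketch. The scoring and adaptation routines are passed in as callables because the source does not define their interfaces; every name here is hypothetical, selection uses mode one only, and falling back to the unadapted model when no sentence passes the threshold is an added assumption.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Model = TypeVar("Model")
Audio = TypeVar("Audio")


def evaluate_user(
    read_aloud: Sequence[Audio],
    semi_open: Sequence[Audio],
    model: Model,
    score_read_aloud: Callable[[Audio, Model], float],
    adapt: Callable[[Model, List[Audio]], Model],
    score_semi_open: Callable[[Audio, Model], float],
    threshold: float,
) -> Dict[str, List[float]]:
    """Score read-aloud items, select adaptation data, adapt, then score again."""
    # Step 202: first-pass scores with the preset model.
    first_pass = [score_read_aloud(a, model) for a in read_aloud]
    # Step 203: keep sentences whose score exceeds the threshold (mode one).
    selected = [a for a, s in zip(read_aloud, first_pass) if s > threshold]
    # Step 204: adapt; keep the preset model if nothing was selected (assumption).
    adapted = adapt(model, selected) if selected else model
    return {
        # Step 205: score semi-open items with the adapted model.
        "semi_open": [score_semi_open(a, adapted) for a in semi_open],
        # Optional re-scoring of the read-aloud items.
        "read_aloud": [score_read_aloud(a, adapted) for a in read_aloud],
    }


# Toy stand-ins: audio and model are plain numbers.
result = evaluate_user(
    read_aloud=[0.9, 0.2], semi_open=[0.5], model=0.0,
    score_read_aloud=lambda a, m: a + m,
    adapt=lambda m, data: m + 0.1 * len(data),
    score_semi_open=lambda a, m: a + m,
    threshold=0.5,
)
print(result)
```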

The method for improving oral evaluation performance provided by embodiments of the present invention extracts valid adaptation data from the examinee's read-aloud speech and uses these data to optimize the acoustic model automatically. The generic acoustic model is thereby customized into an examinee model that matches the examinee's voice, turning a speaker-independent model into a speaker-dependent one. In other words, by automatically learning the user's pronunciation characteristics, the method improves the match between the preset acoustic model and the user's pronunciation. This greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the oral evaluation system as a whole.

Correspondingly, an embodiment of the present invention also provides a system for improving oral evaluation performance. Fig. 3 is a schematic structural diagram of the system.

In this embodiment, the system includes:

a receiving module 301, configured to receive user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;

a read-aloud question scoring module 302, configured to score each read-aloud question according to the read-aloud question speech data;

an adaptation data extraction module 303, configured to obtain valid adaptation data from the scoring results output by the read-aloud question scoring module 302;

a model optimization module 304, configured to optimize a preset acoustic model according to the valid adaptation data;

a semi-open question scoring module 305, configured to score each semi-open question using the optimized acoustic model.

In practical applications, the read-aloud question scoring module 302 may include an alignment unit, a likelihood calculation unit, a posterior probability calculation unit, and a score calculation unit.

In addition, in embodiments of the present invention, the adaptation data extraction module 303 may select, as the valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold, or the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.

Furthermore, whether adaptation data are selected by sentence confidence or by basic speech unit confidence, the adaptation data may be unevenly distributed across speech units, which may affect the adaptation result. For this reason, the system of embodiments of the present invention may additionally perform speech unit balancing on the valid adaptation data obtained, to improve the robustness of adaptation.

Fig. 4 is another schematic structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.

Unlike the embodiment shown in Fig. 3, in this embodiment the system further includes:

a first balancing module 401, configured to perform speech unit balancing on the valid adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.

The first balancing module 401 may include a statistics unit and a first determining unit, where:

the statistics unit is configured to count the number of occurrences of each cluster in each item of valid adaptation data, where a cluster is a set of basic speech units with similar pronunciation;

the first determining unit is configured to determine the target adaptation sentences by minimizing an objective function based on the numbers of occurrences of the clusters. For the detailed process, see the description in the method embodiment above; it is not repeated here.

Correspondingly, in this embodiment, the model optimization module 304 optimizes the preset acoustic model according to the target adaptation sentences.

Fig. 5 is yet another schematic structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.

A second balancing module 501 is configured to perform speech unit balancing on the valid adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.

The second balancing module 501 includes a statistics unit and a second determining unit, where:

the second determining unit is configured to determine the target adaptation basic speech units by minimizing an objective function based on the numbers of occurrences of the clusters. For the detailed process, see the description in the method embodiment above; it is not repeated here.

Correspondingly, in this embodiment, the model optimization module 304 optimizes the preset acoustic model according to the target adaptation basic speech units.

The system of the raising oral evaluation performance that the embodiment of the present invention provides, reads aloud topic voice from examinee and extracts the most certainly Adapt to data, and utilize these data that acoustic model carries out Automatic Optimal, thus generic acoustic model is customized to and examinee's sound Examinee's model that color is consistent, changes into words person's correlation model by words person's independence model, drastically increases speech recognition effect, from And it is effectively improved the accuracy of the even overall oral evaluation system scoring of semi-open topic type.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The system embodiments in particular are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding description of the method embodiments. The system embodiments described above are merely illustrative. The modules or units described as separate components may or may not be physically separate, and the components shown as modules or units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the system according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

The embodiments of the present invention have been described in detail above. Specific implementations are used herein to illustrate the present invention, and the description of the above embodiments is intended only to help in understanding the method and apparatus of the present invention. Meanwhile, those of ordinary skill in the art may make changes in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for improving oral evaluation performance, characterized by comprising:

receiving user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question type speech data;

obtaining valid adaptation data from the scoring results;

performing speech-unit balancing on the valid adaptation data to determine target adaptation sentences or target adaptation basic speech units;

optimizing a preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units;

2. The method according to claim 1, characterized in that scoring each read-aloud question according to the read-aloud question speech data comprises:

performing text-audio alignment between the read-aloud question speech data and the question text of the read-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;

3. The method according to claim 2, characterized in that obtaining valid adaptation data from the scoring results comprises:

4. The method according to claim 3, characterized in that performing speech-unit balancing on the valid adaptation data to determine the target adaptation sentences comprises:

counting the number of occurrences of each cluster in each piece of valid adaptation data, where a cluster is a set of basic speech units with similar pronunciation;

determining the target adaptation sentences by minimizing an objective function according to the number of occurrences of each cluster.

selecting, as valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.

6. The method according to claim 5, characterized in that performing speech-unit balancing on the valid adaptation data to determine the target adaptation basic speech units comprises:

determining the target adaptation basic speech units by minimizing an objective function according to the number of occurrences of each cluster.

7. The method according to any one of claims 1 to 6, characterized in that optimizing the preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units comprises:

8. A system for improving oral evaluation performance, characterized by comprising:

a receiving module, configured to receive user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question type speech data;

an adaptation data extraction module, configured to obtain valid adaptation data from the scoring results output by the read-aloud question scoring module;

a semi-open question scoring module, configured to score each semi-open question using the optimized acoustic model;

the system further comprises a first balancing module or a second balancing module, wherein:

the first balancing module is configured to perform speech-unit balancing on the valid adaptation data, before the preset acoustic model is optimized according to the adaptation data, to determine target adaptation sentences;

the second balancing module is configured to perform speech-unit balancing on the valid adaptation data, before the preset acoustic model is optimized according to the adaptation data, to determine target adaptation basic speech units;

the model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units.

9. The system according to claim 8, characterized in that the read-aloud question scoring module comprises:

an alignment unit, configured to perform text-audio alignment between the read-aloud question speech data and the question text of the read-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;

a likelihood calculation unit, configured to calculate the likelihood of the speech signal segment corresponding to the basic speech unit;

a posterior probability calculation unit, configured to calculate, according to the likelihood, the posterior probability of the speech signal segment corresponding to the basic speech unit;
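One common way to turn segment likelihoods into a posterior probability is to normalize the likelihood of the expected unit against the likelihoods of all candidate units for the same segment. The following Python sketch shows that calculation. It assumes equal prior probabilities for all candidate units and log-likelihood inputs; the function name `unit_posterior` and the dictionary layout are hypothetical and are not taken from the claims.

```python
import math

def unit_posterior(log_likelihoods, target):
    """Posterior of `target` given log-likelihoods of all candidate units.

    Equal priors are assumed, so the posterior is the target likelihood
    divided by the sum of the likelihoods of all candidates.
    """
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating for numerical stability.
    denom = sum(math.exp(v - peak) for v in log_likelihoods.values())
    return math.exp(log_likelihoods[target] - peak) / denom
```

Working in the log domain keeps the calculation stable even when the raw likelihoods of long segments are too small to represent directly.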

10. The system according to claim 9, characterized in that

the adaptation data extraction module is specifically configured to select, as valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.

11. The system according to claim 10, characterized in that

the first balancing module comprises:

a first determining unit, configured to determine the target adaptation sentences by minimizing an objective function according to the number of occurrences of each cluster.

12. The system according to claim 9, characterized in that

the adaptation data extraction module is specifically configured to select, as valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
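The two selection rules recited above, one at question level against the first threshold and one at unit level against the second threshold, can be sketched in Python as follows. The record layout (dictionaries with `speech`, `score`, and `posterior` keys) and the function names are assumptions made for illustration only.

```python
def select_by_question_score(questions, first_threshold):
    """Keep speech data of read-aloud questions scoring above the first threshold."""
    return [q["speech"] for q in questions if q["score"] > first_threshold]

def select_by_unit_posterior(segments, second_threshold):
    """Keep speech segments of basic speech units whose posterior exceeds the second threshold."""
    return [s["speech"] for s in segments if s["posterior"] > second_threshold]
```

Question-level selection keeps whole utterances, whereas unit-level selection can still recover usable segments from a question that scored poorly overall.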

13. The system according to claim 12, characterized in that the second balancing module comprises:

a second determining unit, configured to determine the target adaptation basic speech units by minimizing an objective function according to the number of occurrences of each cluster.

14. The system according to any one of claims 8 to 13, characterized in that

the model optimization module is specifically configured to optimize the preset acoustic model using adaptation based on maximum likelihood linear regression, or to optimize the preset acoustic model using adaptation based on maximum a posteriori probability.
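To make the maximum a posteriori option concrete, the Python sketch below applies the standard MAP update to a single Gaussian mean: the adapted mean is a weighted average of the speaker-independent mean and the occupancy-weighted adaptation frames. The function name `map_adapt_mean`, the argument layout, and the default prior weight `tau` are assumptions; the source does not specify these details, and a full system would update every Gaussian in the acoustic model.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of one Gaussian mean.

    prior_mean: mean vector from the speaker-independent model.
    frames: feature vectors from the adaptation data.
    occupancies: per-frame posterior weight of this Gaussian.
    tau: prior weight (hypothetical default); larger values stay closer to the prior.
    """
    dim = len(prior_mean)
    weighted_sum = [0.0] * dim
    total = 0.0
    for frame, gamma in zip(frames, occupancies):
        total += gamma
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    # Interpolate between the prior mean and the data according to occupancy.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + total)
            for d in range(dim)]
```

With little adaptation data the result stays near the prior mean, and it moves toward the data mean as more frames are observed. Maximum likelihood linear regression takes a different route: it estimates a shared affine transform of the means, which can help when adaptation data is too sparse to update each Gaussian separately.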