CN103594087B - Method and system for improving oral evaluation performance
Publication number: CN103594087B
Application number: CN201310553383.0A
Authority: CN (China)
Prior art keywords: topic, data, self-adapting, voice unit
Legal status: Active (as listed by Google; an assumption, not a legal conclusion)
Abstract
The invention discloses a method and system for improving oral evaluation performance. The method includes: receiving the user's speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data; scoring each read-aloud question according to the read-aloud question speech data; obtaining valid adaptation data from the scoring results; optimizing a preset acoustic model according to the valid adaptation data; and scoring each semi-open question using the optimized acoustic model. The invention can effectively improve the accuracy of oral evaluation.
Description
Technical field
The present invention relates to the field of speech processing technology, and in particular to a method and system for improving oral evaluation performance.
Background art
As an important medium of interpersonal communication, spoken language occupies an extremely important place in everyday life. With social and economic development and increasing globalization, people place ever higher demands on the efficiency of language learning and on the objectivity, fairness and scalability of language assessment. Traditional manual evaluation of spoken proficiency severely constrains teachers and students in teaching time and space, and there are also many hardware gaps and imbalances in teaching staff, teaching venues and funding. Manual evaluation cannot avoid the individual bias of the evaluator, so uniform scoring standards cannot be guaranteed, and sometimes the score does not even accurately reflect the true level of the person tested. Large-scale oral tests, moreover, require substantial human, material and financial resources, which limits regular, large-scale assessment. For these reasons, the industry has successively developed a number of language teaching and evaluation systems.
Oral evaluation mainly involves two question types: read-aloud questions and semi-open questions. A read-aloud question requires the user to read a preset text aloud, in order to examine how accurately the user pronounces the basic speech units and how fluently the sentences are read. A semi-open question is an oral test item in which the system plays prompt content such as an image, a video or a short passage, and the user is required, based on that content, to answer related questions or to retell what was played.
In the prior art, oral evaluation of semi-open questions mainly uses automatic speech recognition to transcribe the user's speech into text, and then scores the answer from features computed on the recognition result, such as the hit rate of key words and phrases. Because the scoring criteria for semi-open questions essentially consist of judging whether key words and phrases contain grammatical errors, obtaining a correct recognition result for the speech to be evaluated is particularly important. How to improve the accuracy of speech recognition in the oral evaluation of semi-open questions is therefore a major problem in urgent need of a solution.
Summary of the invention
Embodiments of the present invention provide a method and system for improving oral evaluation performance, so as to improve the accuracy of oral evaluation.
To this end, the present invention provides the following technical solutions:
A method for improving oral evaluation performance, comprising:
receiving the user's speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;
scoring each read-aloud question according to the read-aloud question speech data;
obtaining valid adaptation data from the scoring results;
optimizing a preset acoustic model according to the valid adaptation data;
scoring each semi-open question using the optimized acoustic model.
Preferably, scoring each read-aloud question according to the read-aloud question speech data includes:
aligning the read-aloud question speech data with the question text of the read-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
calculating the likelihood of the speech signal segment corresponding to each basic speech unit;
computing, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;
calculating the score of each read-aloud question from the posterior probabilities.
Preferably, obtaining valid adaptation data from the scoring results includes:
selecting, as valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
Preferably, the method further includes:
before the preset acoustic model is optimized according to the valid adaptation data, performing speech unit balancing on the valid adaptation data, including:
counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
determining target adaptation sentences by minimizing an objective function over the cluster occurrence counts;
in which case optimizing the preset acoustic model according to the valid adaptation data includes: optimizing the preset acoustic model according to the target adaptation sentences.
Preferably, obtaining valid adaptation data from the scoring results includes:
selecting, as valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Preferably, the method further includes:
before the preset acoustic model is optimized according to the valid adaptation data, performing speech unit balancing on the valid adaptation data, including:
counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
determining target adaptation basic speech units by minimizing an objective function over the cluster occurrence counts;
in which case optimizing the preset acoustic model according to the valid adaptation data includes: optimizing the preset acoustic model according to the target adaptation basic speech units.
Preferably, optimizing the preset acoustic model according to the valid adaptation data includes:
optimizing the preset acoustic model using adaptation based on maximum likelihood linear regression; or
optimizing the preset acoustic model using adaptation based on maximum a posteriori probability.
A system for improving oral evaluation performance, comprising:
a receiving module, for receiving the user's speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;
a read-aloud question scoring module, for scoring each read-aloud question according to the read-aloud question speech data;
an adaptation data extraction module, for obtaining valid adaptation data from the scoring results output by the read-aloud question scoring module;
a model optimization module, for optimizing a preset acoustic model according to the valid adaptation data;
a semi-open question scoring module, for scoring each semi-open question using the optimized acoustic model.
Preferably, the read-aloud question scoring module includes:
an alignment unit, for aligning the read-aloud question speech data with the question text of the read-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
a likelihood calculation unit, for calculating the likelihood of the speech signal segment corresponding to each basic speech unit;
a posterior probability calculation unit, for computing, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;
a score calculation unit, for calculating the score of each read-aloud question from the posterior probabilities.
Preferably, the adaptation data extraction module is specifically configured to select, as valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
Preferably, the system further includes:
a first balancing module, for performing speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the first balancing module includes:
a counting unit, for counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
a first determining unit, for determining target adaptation sentences by minimizing an objective function over the cluster occurrence counts;
the model optimization module being specifically configured to optimize the preset acoustic model according to the target adaptation sentences.
Preferably, the adaptation data extraction module is specifically configured to select, as valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Preferably, the system further includes:
a second balancing module, for performing speech unit balancing on the valid adaptation data before the preset acoustic model is optimized according to the adaptation data; the second balancing module includes:
a counting unit, for counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
a second determining unit, for determining target adaptation basic speech units by minimizing an objective function over the cluster occurrence counts;
the model optimization module being specifically configured to optimize the preset acoustic model according to the target adaptation basic speech units.
Preferably, the model optimization module is specifically configured to optimize the preset acoustic model using adaptation based on maximum likelihood linear regression, or using adaptation based on maximum a posteriori probability.
The method and system for improving oral evaluation performance provided by embodiments of the present invention extract valid adaptation data from the examinee's read-aloud speech and use these data to optimize the acoustic model automatically. The generic acoustic model is thereby customized into an examinee model that matches the examinee's timbre, turning a speaker-independent model into a speaker-dependent one. This greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the oral evaluation system as a whole.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some of the embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a schematic diagram of the evaluation method for semi-open question types in the prior art;
Fig. 2 is a flowchart of the method for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 3 is a structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 4 is another structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention;
Fig. 5 is a further structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and specific implementations.
The prior-art oral evaluation method for semi-open question types is first briefly described. Fig. 1 is a schematic diagram of this method, which comprises the following steps:
Step 1: receive the user's speech signal, i.e. the examinee's speech input.
Step 2: extract speech signal features; this may further include noise-reduction preprocessing of the speech signal.
The speech signal features are vectors that characterize the user's pronunciation; typically, 39-dimensional MFCC (Mel Frequency Cepstrum Coefficient) features matching the training set are extracted.
Step 3: the decoder determines the text content of the speech signal from the extracted features.
Specifically, the system searches a search network for the optimal path to determine the best recognition result. The search network is a huge search space built by expanding the system's preset acoustic model and language model statically or dynamically, and N-Best decoding results are obtained with the Viterbi algorithm.
Step 4: determine the current user's oral score from the recognized text.
Typically the system computes features such as key word or phrase hit rates from the N-Best decoding results and derives the score from them.
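The hit-rate feature in Step 4 can be sketched as follows. This is a minimal illustration only: the function names, the whitespace tokenization and the rule of taking the best hit rate across the N-Best list are all assumptions, since the text does not say how the hit rate is turned into a score.

```python
def keyword_hit_rate(hypothesis, keywords):
    """Fraction of key words/phrases that occur in one recognized hypothesis."""
    # Pad with spaces so that "cat" does not match inside "category".
    text = " " + " ".join(hypothesis.lower().split()) + " "
    hits = sum(1 for kw in keywords if " " + kw.lower() + " " in text)
    return hits / len(keywords) if keywords else 0.0


def score_from_nbest(nbest, keywords):
    """Best hit rate over the N-Best hypotheses (an assumed combination rule)."""
    return max((keyword_hit_rate(h, keywords) for h in nbest), default=0.0)
```

A recognizer that misrecognizes a key phrase in every hypothesis lowers this feature directly, which is why recognition accuracy dominates semi-open scoring.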
The acoustic model is a mathematical model describing the pronunciation characteristics of each basic speech unit. In statistical pattern recognition applications, its parameters are usually estimated from massive training data. The training process is as follows:
(1) collect training data;
(2) extract acoustic features from the training data;
(3) set the acoustic model topology;
(4) train the acoustic model parameters.
Language model training mainly consists of collecting the training text needed, adopting the N-Gram statistical language model, currently the international mainstream, as the model topology, and then using maximum likelihood estimation to obtain the conditional probability of each word given its history words in the training text.
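As an illustration of maximum likelihood estimation for an N-Gram model, the sketch below trains a bigram (N = 2) model by relative frequency. The choice of N = 2, the `<s>`/`</s>` boundary markers and the absence of smoothing are assumptions made to keep the example short; the text does not specify them.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Return {(prev, word): P(word | prev)} estimated by relative frequency."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    # MLE: count(prev, word) / count(prev)
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }
```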
Clearly, the text recognition result in Step 3 directly affects the quality of the oral evaluation: the more correct the recognition result, the more reliable the evaluation. In an automatic speech recognition system, the decoder expands the preset acoustic model and language model, statically or dynamically, into a huge search space and obtains N-Best decoding results with the Viterbi algorithm. Recognition accuracy depends on the accuracy and discriminative power of the search network, and of the acoustic model in particular. The finer the acoustic model and the better it matches the test environment, the higher the recognition accuracy.
The acoustic model used in conventional systems, however, is trained in advance on massive data and therefore tends to be highly general, so it falls short when recognizing any one specific speaker. This is especially true of oral scoring systems for semi-open questions: timbre differs considerably between examinees, the examination room environment is easily affected by many factors, and the test environment often differs considerably from the training environment. The pre-trained acoustic model then does not match the examinee's timbre, recognition accuracy becomes extremely low, and the N-Best results produced by the conventional speech recognition system correspondingly correlate poorly with the reference answers.
To this end, embodiments of the present invention provide a method and system for improving oral evaluation performance. When evaluating a particular user's spoken language, the read-aloud questions to be evaluated are first scored in the normal way; the scoring results are analyzed to obtain valid adaptation data that capture the user's individual pronunciation; the preset acoustic model is then optimized with the adaptation data so that it matches the user's timbre; and the optimized acoustic model is finally used to evaluate the semi-open questions, or even all the oral question types.
Fig. 2 is a flowchart of the method for improving oral evaluation performance according to an embodiment of the present invention. The method comprises the following steps:
Step 201: receive the user's speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data.
Step 202: score each read-aloud question according to the read-aloud question speech data.
Specifically, the read-aloud question speech data can be aligned with the question text of the read-aloud question to obtain the speech signal segment corresponding to each basic speech unit in the text string. The likelihood of the speech signal segment corresponding to each basic speech unit is then calculated, the posterior probability of each such segment is computed from the likelihood, and the score of each read-aloud question is calculated from the posterior probabilities.
The posterior probability is the probability as revised once the outcome (the "result") has been observed.
Let the basic speech unit be M_i, its corresponding speech signal segment be O_i, and the likelihood of segment O_i given unit M_i be P(O_i|M_i). The posterior probability P(M_i|O_i) of unit M_i given segment O_i is then calculated as follows:
First, calculate the likelihood of segment O_i with respect to each basic speech unit in the confusion set to which M_i belongs:
P(O_i|M_j), j = 1, 2, ..., i-1, i+1, ..., K
where K is the preset number of speech units.
The confusion set of each basic speech unit can be set in advance; for example, all basic speech units can be used as the confusion set. Alternatively, the confusion set can be restricted to units of the same category as the basic speech unit under examination; in Mandarin evaluation, for instance, an initial may only be replaced by another initial phoneme. As a further option, the confusion set can consist of the basic units whose pronunciation is similar to that of the unit under examination.
Then, by the probability formula, the posterior probability of basic speech unit M_i given speech segment O_i is obtained from these likelihoods.
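One form consistent with the definitions above is Bayes' rule over the confusion set. It assumes, which the text does not state, that all K units in the confusion set have equal prior probability, so that the priors cancel:

```latex
P(M_i \mid O_i) \;=\; \frac{P(O_i \mid M_i)}{\sum_{j=1}^{K} P(O_i \mid M_j)}
```

Here the sum runs over all K units, including M_i itself. With unequal priors, each likelihood would be weighted by P(M_j).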
In embodiments of the present invention, the posterior probabilities of the individual basic speech units can be combined to score a read-aloud question. Specifically, the mean posterior probability of all basic speech units in the read-aloud question can be taken as its score, i.e. score = (1/N) Σ_{i=1}^{N} P(M_i|O_i), where N is the number of basic speech units in the read-aloud question.
Clearly, the higher this combined score, the more accurately the examinee pronounced the sentence in the read-aloud question.
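The scoring of Step 202 can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the likelihoods are assumed to be supplied as log-likelihoods by some acoustic model, and the units in the confusion set are assumed to have equal priors.

```python
import math


def unit_posterior(log_likelihoods, target):
    """Posterior of `target` given log P(O | M_j) for every unit in the confusion set.

    `log_likelihoods` maps unit name -> log-likelihood and must include `target`.
    Equal priors are assumed, so the posterior is a normalized likelihood.
    """
    peak = max(log_likelihoods.values())
    # Subtract the peak before exponentiating to avoid numerical underflow.
    total = sum(math.exp(v - peak) for v in log_likelihoods.values())
    return math.exp(log_likelihoods[target] - peak) / total


def question_score(posteriors):
    """Mean posterior over the N basic speech units of one read-aloud question."""
    return sum(posteriors) / len(posteriors)
```

The example thresholds given later in the text are negative, which suggests that a practical system works with log-domain scores; the sketch keeps plain probabilities to follow the description literally.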
Step 203: obtain valid adaptation data from the scoring results.
To ensure that acoustic model adaptation is reliable, the valid adaptation data should be as correct as possible and should reflect the user's pronunciation characteristics.
In embodiments of the present invention, different adaptation data can be obtained for different users. Specifically, adaptation data can be obtained in the following ways:
Mode one: selection by sentence confidence, i.e. the speech data of read-aloud questions whose sentence-average posterior probability exceeds a specified threshold T1 (for example, T1 = -0.85) is selected as valid adaptation data. Such sentences are generally pronounced accurately and are of high quality, so the whole sentence is selected as valid adaptation data.
Mode two: selection by basic speech unit confidence, i.e. the speech data of basic speech units whose posterior probability exceeds a specified threshold T2 (which may differ between speech units; T2 may lie between -0.7 and -1.87) is selected as valid adaptation data.
Note that threshold T2 may be the same as threshold T1 or different from it.
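The two selection modes can be sketched as simple threshold filters. The function names, the data layout and the default per-unit threshold of -1.0 are hypothetical; the scores are assumed to be on the same scale as the example thresholds in the text.

```python
def select_sentences(sentence_scores, t1=-0.85):
    """Mode one: keep whole sentences whose average score exceeds T1.

    `sentence_scores` maps sentence id -> sentence-average confidence.
    """
    return [sid for sid, score in sentence_scores.items() if score > t1]


def select_units(unit_scores, thresholds, default_t2=-1.0):
    """Mode two: keep unit segments whose score exceeds that unit's own T2.

    `unit_scores` is a list of (unit, segment_id, score) triples and
    `thresholds` maps unit -> T2; `default_t2` is a hypothetical fallback.
    """
    return [
        (unit, seg_id)
        for unit, seg_id, score in unit_scores
        if score > thresholds.get(unit, default_t2)
    ]
```

Mode one keeps or discards a sentence as a whole, whereas mode two can keep the well-pronounced units of an otherwise poor sentence.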
Furthermore, whether adaptation data are selected by sentence confidence or by basic speech unit confidence, the adaptation data may be unevenly distributed across speech units, which can affect the adaptation result. For this reason, embodiments of the present invention may additionally perform speech unit balancing on the valid adaptation data obtained, to improve the robustness of adaptation.
For valid adaptation data obtained in the first way, speech unit balancing proceeds as follows:
(1) Count, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of similar basic speech units.
For example, if S adaptation sentences were obtained, count the occurrences of each cluster in each sentence: F_k(V_i) and F_k(C_i), where k = 1, 2, ..., S.
(2) Determine the target adaptation sentences by minimizing an objective function over these counts.
Here p_i ∈ {0, 1} indicates whether each of the S sentences is selected or not. Specifically, the objective function can be optimized by an explicit loop that traverses all 2^S selections, taking as the result the one that sets p_i = 1 for as many sentences as possible.
This method eliminates, to some extent, the severe imbalance in the number of basic speech units that results from selecting adaptation sentences purely by sentence in the first way.
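The exhaustive search over the 2^S selections can be sketched as below. The objective used here, the variance of the per-cluster totals over the selected sentences, is a stand-in chosen for illustration, since the objective function itself is not reproduced in this text. Breaking ties in favour of more selected sentences follows the stated preference for as many p_i = 1 as possible.

```python
from itertools import product


def balance_sentences(counts, clusters):
    """Search p in {0,1}^S for the selection with the most balanced cluster totals.

    `counts[k]` maps cluster name -> occurrences in sentence k (F_k in the text).
    Returns the selection vector p as a tuple, or None if there are no sentences.
    """
    best_key, best_p = None, None
    for p in product((0, 1), repeat=len(counts)):
        if not any(p):
            continue  # the empty selection provides no adaptation data
        totals = [
            sum(pk * c.get(cl, 0) for pk, c in zip(p, counts)) for cl in clusters
        ]
        mean = sum(totals) / len(totals)
        variance = sum((t - mean) ** 2 for t in totals) / len(totals)
        key = (variance, -sum(p))  # lower variance first, then more sentences
        if best_key is None or key < best_key:
            best_key, best_p = key, p
    return best_p
```

The loop is exponential in S, which is tolerable here because one examinee contributes only a handful of read-aloud sentences.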
For adaptation data obtained in the second way, speech unit balancing proceeds as follows:
Let the number of times each cluster occurs in the adaptation sentences be F(V_i) and F(C_i), and let the number of units of each cluster actually selected for adaptation be f(V_i), with f(V_i) ≤ F(V_i), and f(C_i), with f(C_i) ≤ F(C_i). The selection is determined by minimizing an objective function over these counts.
Taking vowels as an example, each cluster has F(V_i) possible choices, so the M vowel clusters together have ∏_{i=1}^{M} F(V_i) combinations, and the total number of possible combinations follows in the same way once the consonant clusters are included. All possible combinations are traversed to find the one that minimizes the objective function; the adaptation speech units selected in that case are balanced.
Step 204: optimize the preset acoustic model according to the adaptation data.
Correspondingly, once the target adaptation sentences have been obtained, the preset acoustic model can be optimized according to them.
Specifically, conventional adaptation methods such as maximum likelihood linear regression (Maximum Likelihood Linear Regression, MLLR) or maximum a posteriori probability (Maximum A Posteriori Linear Regression, MAPLR) can be used to adapt the acoustic model.
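To make the idea of adaptation concrete, the sketch below shows the standard maximum a posteriori update of a single Gaussian mean, which interpolates between the preset model's mean and the adaptation data. It is a generic textbook form, not the patent's specific procedure: the function name, the plain-list representation and the prior weight `tau` are assumptions. MLLR, by contrast, estimates an affine transform shared by many Gaussians and is not shown.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of one Gaussian mean vector.

    prior_mean : mean of the preset (speaker-independent) Gaussian
    frames     : adaptation feature vectors, each the same length as prior_mean
    occupancies: per-frame occupation probability of this Gaussian
    tau        : prior weight (hypothetical default); larger values trust the
                 preset model more
    """
    total = sum(occupancies)
    dims = len(prior_mean)
    weighted = [
        sum(g * x[d] for g, x in zip(occupancies, frames)) for d in range(dims)
    ]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total) for d in range(dims)]
```

With little adaptation data the result stays close to the preset mean, and with more data it moves toward the examinee's own statistics, which is the shift from a speaker-independent to a speaker-dependent model that the text describes.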
Step 205: score each semi-open question using the optimized acoustic model.
Note that in practice the optimized acoustic model can be used not only to score the semi-open questions but also to re-score the read-aloud questions, further improving the scoring accuracy for read-aloud questions.
The method for improving oral evaluation performance provided by embodiments of the present invention extracts valid adaptation data from the examinee's read-aloud speech and uses these data to optimize the acoustic model automatically. The generic acoustic model is thereby customized into an examinee model that matches the examinee's timbre, turning a speaker-independent model into a speaker-dependent one. In other words, by automatically learning the user's pronunciation characteristics, the method improves how well the preset acoustic model matches the user's pronunciation, which greatly improves speech recognition and thus effectively improves the scoring accuracy for semi-open questions and even for the oral evaluation system as a whole.
Correspondingly, embodiments of the present invention also provide a system for improving oral evaluation performance. Fig. 3 is a structural diagram of this system.
In this embodiment, the system includes:
a receiving module 301, for receiving the user's speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;
a read-aloud question scoring module 302, for scoring each read-aloud question according to the read-aloud question speech data;
an adaptation data extraction module 303, for obtaining valid adaptation data from the scoring results output by the read-aloud question scoring module 302;
a model optimization module 304, for optimizing a preset acoustic model according to the valid adaptation data;
a semi-open question scoring module 305, for scoring each semi-open question using the optimized acoustic model.
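The way modules 302 to 305 hand data to one another can be sketched as a small pipeline. The class name, the callable signatures and the returned dictionary are hypothetical; each stage is injected as a function so that the sketch stays independent of any particular recognizer or adaptation toolkit.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OralEvaluationPipeline:
    score_read_aloud: Callable[[Any, Any], Any]    # module 302: (model, audio) -> result
    extract_adaptation_data: Callable[[Any], Any]  # module 303: results -> adaptation data
    optimize_model: Callable[[Any, Any], Any]      # module 304: (model, data) -> new model
    score_semi_open: Callable[[Any, Any], Any]     # module 305: (model, audio) -> score

    def run(self, preset_model, read_aloud_audio, semi_open_audio):
        # Read-aloud questions are scored with the preset model first.
        read_results = [
            self.score_read_aloud(preset_model, audio) for audio in read_aloud_audio
        ]
        adaptation = self.extract_adaptation_data(read_results)
        adapted_model = self.optimize_model(preset_model, adaptation)
        # Semi-open questions are scored only after adaptation.
        semi_scores = [
            self.score_semi_open(adapted_model, audio) for audio in semi_open_audio
        ]
        return {"read_aloud": read_results, "semi_open": semi_scores}
```

The ordering is the essential point: semi-open scoring depends on the adapted model, which in turn depends on the read-aloud results.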
In practice, the read-aloud question scoring module 302 may include:
an alignment unit, for aligning the read-aloud question speech data with the question text of the read-aloud question, to obtain the speech signal segment corresponding to each basic speech unit in the text string;
a likelihood calculation unit, for calculating the likelihood of the speech signal segment corresponding to each basic speech unit;
a posterior probability calculation unit, for computing, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;
a score calculation unit, for calculating the score of each read-aloud question from the posterior probabilities.
In addition, in embodiments of the present invention, the adaptation data extraction module 303 may select, as valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold, or the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
Furthermore, whether adaptation data are selected by sentence confidence or by basic speech unit confidence, the adaptation data may be unevenly distributed across speech units, which can affect the adaptation result. For this reason, the system of embodiments of the present invention may additionally perform speech unit balancing on the valid adaptation data obtained, to improve the robustness of adaptation.
Fig. 4 is another structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Unlike the embodiment shown in Fig. 3, the system in this embodiment further includes:
a first balancing module 401, for performing speech unit balancing on the valid adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.
The first balancing module 401 may include a counting unit and a first determining unit, where:
the counting unit counts, for each item of valid adaptation data, the number of occurrences of each cluster, a cluster being a set of basic speech units with similar pronunciation;
the first determining unit determines the target adaptation sentences by minimizing an objective function over the cluster occurrence counts; for the detailed procedure, see the description in the method embodiment above, which is not repeated here.
Correspondingly, in this embodiment, the model optimization module 304 optimizes the preset acoustic model according to the target adaptation sentences.
Figure 5 is another schematic structural diagram of the system for improving oral evaluation performance according to an embodiment of the present invention.
Unlike the embodiment shown in Figure 3, in this embodiment the system further includes:
A second balancing module 501, configured to perform speech unit balancing on the valid adaptation data before the model optimization module 304 optimizes the preset acoustic model according to the adaptation data.
The second balancing module 501 includes a statistics unit and a second determining unit, wherein:
The statistics unit is configured to count, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
The second determining unit is configured to determine the target adaptation basic speech units by minimizing an objective function over the cluster occurrence counts. For the detailed procedure, refer to the description in the method embodiment above, which is not repeated here.
Correspondingly, in this embodiment the model optimization module 304 optimizes the preset acoustic model according to the target adaptation basic speech units.
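Unit-level balancing can be sketched in a similar way. The example below makes a simplifying assumption that is not taken from the specification: every non-empty cluster is capped at the size of the rarest cluster, and within each cluster the segments with the highest posterior probability are kept. The function name `balance_units` and the `(label, posterior)` input format are likewise hypothetical.

```python
from collections import defaultdict

def balance_units(units, unit_to_cluster):
    """Select a cluster-balanced subset of basic speech units.

    units: list of (unit_label, posterior) pairs that passed the threshold.
    Assumption: each non-empty cluster is capped at the size of the rarest
    cluster, keeping the highest-posterior segments.
    """
    by_cluster = defaultdict(list)
    for label, posterior in units:
        by_cluster[unit_to_cluster[label]].append((label, posterior))
    if not by_cluster:
        return []
    cap = min(len(members) for members in by_cluster.values())
    selected = []
    for cluster in sorted(by_cluster):
        # Rank segments within the cluster by posterior, best first.
        ranked = sorted(by_cluster[cluster], key=lambda item: item[1], reverse=True)
        selected.extend(ranked[:cap])
    return selected

unit_to_cluster = {"a": "V", "e": "V", "p": "C"}
units = [("a", 0.9), ("e", 0.8), ("a", 0.95), ("p", 0.7)]
balanced = balance_units(units, unit_to_cluster)
```

Capping at the rarest cluster equalizes the counts exactly but can discard a lot of data; a softer cap would trade balance for a larger adaptation set.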
The system for improving oral evaluation performance provided by the embodiments of the present invention extracts valid adaptation data from the examinee's read-aloud question speech and uses that data to optimize the acoustic model automatically. The generic acoustic model is thereby customized into an examinee model that matches the examinee's voice characteristics; that is, the speaker-independent model is converted into a speaker-dependent model. This greatly improves speech recognition, and in turn effectively improves the scoring accuracy for semi-open questions and even for the oral evaluation system as a whole.
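To make the conversion from a speaker-independent to a speaker-dependent model more concrete, the following Python sketch applies the standard maximum a posteriori (MAP) update to the mean of a single Gaussian component: the adapted mean is (tau * prior_mean + sum of occupancy-weighted frames) / (tau + total occupancy). The function name `map_adapt_mean`, the list-based feature format, and the value of the prior weight `tau` are assumptions for illustration; a real system would apply such an update across all components of the acoustic model, or use MLLR transforms instead.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP update of one Gaussian mean.

    prior_mean:  mean vector of the speaker-independent model
    frames:      adaptation feature vectors aligned to this Gaussian
    occupancies: posterior occupancy of this Gaussian for each frame
    tau:         prior weight (hypothetical value); a larger tau keeps the
                 result closer to the speaker-independent mean
    """
    dim = len(prior_mean)
    total_occ = sum(occupancies)
    # Occupancy-weighted sum of the adaptation frames, per dimension.
    weighted = [sum(g * x[d] for g, x in zip(occupancies, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total_occ) for d in range(dim)]

adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], tau=2.0)
```

With little adaptation data the result stays near the prior mean, and with more data it moves toward the examinee's own statistics, which is why only reliably scored read-aloud speech should be fed into the update.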
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The system embodiments in particular are described briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: modules or units described as separate components may or may not be physically separate, and components shown as modules or units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the system according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The embodiments of the present invention have been described in detail above. Specific implementations are used herein to illustrate the invention, and the description of the above embodiments is intended only to help in understanding the method and apparatus of the present invention. At the same time, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (14)
1. A method for improving oral evaluation performance, characterized by comprising:
receiving user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;
scoring each read-aloud question according to the read-aloud question speech data;
obtaining valid adaptation data from the scoring results;
performing speech unit balancing on the valid adaptation data to determine target adaptation sentences or target adaptation basic speech units;
optimizing a preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units;
scoring each semi-open question using the optimized acoustic model.
2. The method according to claim 1, characterized in that scoring each read-aloud question according to the read-aloud question speech data comprises:
aligning the read-aloud question speech data with the question text of the read-aloud question (text-to-speech alignment), to obtain the speech signal segment corresponding to each basic speech unit in the text string;
calculating the likelihood of the speech signal segment corresponding to each basic speech unit;
calculating, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;
calculating the score of each read-aloud question according to the posterior probabilities.
3. The method according to claim 2, characterized in that obtaining valid adaptation data from the scoring results comprises:
selecting, as the valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
4. The method according to claim 3, characterized in that performing speech unit balancing on the valid adaptation data to determine the target adaptation sentences comprises:
counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
determining the target adaptation sentences by minimizing an objective function over the cluster occurrence counts.
5. The method according to claim 2, characterized in that obtaining valid adaptation data from the scoring results comprises:
selecting, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
6. The method according to claim 5, characterized in that performing speech unit balancing on the valid adaptation data to determine the target adaptation basic speech units comprises:
counting, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
determining the target adaptation basic speech units by minimizing an objective function over the cluster occurrence counts.
7. The method according to any one of claims 1 to 6, characterized in that optimizing the preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units comprises:
optimizing the preset acoustic model using adaptation based on maximum likelihood linear regression; or
optimizing the preset acoustic model using adaptation based on maximum a posteriori probability.
8. A system for improving oral evaluation performance, characterized by comprising:
a receiving module, configured to receive user speech data to be evaluated, the speech data including read-aloud question speech data and semi-open question speech data;
a read-aloud question scoring module, configured to score each read-aloud question according to the read-aloud question speech data;
an adaptation data extraction module, configured to obtain valid adaptation data from the scoring results output by the read-aloud question scoring module;
a model optimization module, configured to optimize a preset acoustic model according to the valid adaptation data;
a semi-open question scoring module, configured to score each semi-open question using the optimized acoustic model;
the system further comprising a first balancing module or a second balancing module, wherein:
the first balancing module is configured to perform speech unit balancing on the valid adaptation data, before the preset acoustic model is optimized according to the adaptation data, to determine target adaptation sentences;
the second balancing module is configured to perform speech unit balancing on the valid adaptation data, before the preset acoustic model is optimized according to the adaptation data, to determine target adaptation basic speech units;
the model optimization module is specifically configured to optimize the preset acoustic model according to the target adaptation sentences or the target adaptation basic speech units.
9. The system according to claim 8, characterized in that the read-aloud question scoring module comprises:
an alignment unit, configured to align the read-aloud question speech data with the question text of the read-aloud question (text-to-speech alignment), to obtain the speech signal segment corresponding to each basic speech unit in the text string;
a likelihood calculation unit, configured to calculate the likelihood of the speech signal segment corresponding to each basic speech unit;
a posterior probability calculation unit, configured to calculate, from the likelihood, the posterior probability of the speech signal segment corresponding to each basic speech unit;
a score calculation unit, configured to calculate the score of each read-aloud question according to the posterior probabilities.
10. The system according to claim 9, characterized in that
the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data of read-aloud questions whose score is higher than a set first threshold.
11. The system according to claim 10, characterized in that
the first balancing module comprises:
a statistics unit, configured to count, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
a first determining unit, configured to determine the target adaptation sentences by minimizing an objective function over the cluster occurrence counts.
12. The system according to claim 9, characterized in that
the adaptation data extraction module is specifically configured to select, as the valid adaptation data, the speech data corresponding to basic speech units whose posterior probability is higher than a set second threshold.
13. The system according to claim 12, characterized in that the second balancing module comprises:
a statistics unit, configured to count, for each item of valid adaptation data, the number of occurrences of each cluster, where a cluster is a set of basic speech units with similar pronunciation;
a second determining unit, configured to determine the target adaptation basic speech units by minimizing an objective function over the cluster occurrence counts.
14. The system according to any one of claims 8 to 13, characterized in that
the model optimization module is specifically configured to optimize the preset acoustic model using adaptation based on maximum likelihood linear regression, or to optimize the preset acoustic model using adaptation based on maximum a posteriori probability.
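The scoring steps recited in claims 2 and 9 (likelihood, posterior probability, score) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the alignment is assumed to have been done already, the competing units are assumed to have equal priors, the segment posterior is taken as the geometric mean over frames, and the question score is the plain average of segment posteriors. All function names are hypothetical.

```python
import math

def frame_log_posterior(loglikes, target):
    """Log posterior of the target unit in one frame.

    loglikes maps each candidate basic speech unit to its log-likelihood;
    equal priors are assumed, so the posterior is a normalized likelihood.
    """
    peak = max(loglikes.values())
    # Log-sum-exp over all candidate units, shifted by the peak for stability.
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in loglikes.values()))
    return loglikes[target] - log_norm

def segment_posterior(frame_loglikes, target):
    """Geometric mean of the frame posteriors over one aligned segment."""
    mean_log = sum(frame_log_posterior(f, target) for f in frame_loglikes) / len(frame_loglikes)
    return math.exp(mean_log)

def item_score(segment_posteriors):
    """Score of one read-aloud question: average posterior over its segments."""
    return sum(segment_posteriors) / len(segment_posteriors)

frames = [
    {"a": math.log(0.6), "e": math.log(0.2)},
    {"a": math.log(0.3), "e": math.log(0.1)},
]
posterior_a = segment_posterior(frames, "a")
```

Segments or questions whose values exceed the thresholds of claims 3 and 5 would then be kept as valid adaptation data.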
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310553383.0A CN103594087B (en) | 2013-11-08 | 2013-11-08 | Improve the method and system of oral evaluation performance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103594087A CN103594087A (en) | 2014-02-19 |
CN103594087B true CN103594087B (en) | 2016-10-12 |
Family
ID=50084194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310553383.0A Active CN103594087B (en) | 2013-11-08 | 2013-11-08 | Improve the method and system of oral evaluation performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103594087B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104464757B (en) * | 2014-10-28 | 2019-01-18 | 科大讯飞股份有限公司 | Speech evaluating method and speech evaluating device |
CN104464423A (en) * | 2014-12-19 | 2015-03-25 | 科大讯飞股份有限公司 | Calibration optimization method and system for speaking test evaluation |
CN106409291B (en) * | 2016-11-04 | 2019-12-17 | 南京侃侃信息科技有限公司 | Method for implementing voice search list |
CN106652622B (en) * | 2017-02-07 | 2019-09-17 | 广东小天才科技有限公司 | A kind of text training method and device |
CN107240394A (en) * | 2017-06-14 | 2017-10-10 | 北京策腾教育科技有限公司 | A kind of dynamic self-adapting speech analysis techniques for man-machine SET method and system |
CN107808674B (en) * | 2017-09-28 | 2020-11-03 | 上海流利说信息技术有限公司 | Method, medium and device for evaluating voice and electronic equipment |
CN108417207B (en) * | 2018-01-19 | 2020-06-30 | 苏州思必驰信息科技有限公司 | Deep hybrid generation network self-adaption method and system |
TWI683290B (en) * | 2018-06-28 | 2020-01-21 | 吳雲中 | Spoken language teaching auxiliary method and device |
CN109522301A (en) * | 2018-11-07 | 2019-03-26 | 平安医疗健康管理股份有限公司 | A kind of data processing method, electronic equipment and storage medium |
CN110164414B (en) * | 2018-11-30 | 2023-02-14 | 腾讯科技(深圳)有限公司 | Voice processing method and device and intelligent equipment |
CN109658918A (en) * | 2018-12-03 | 2019-04-19 | 广东外语外贸大学 | A kind of intelligence Oral English Practice repetition topic methods of marking and system |
CN110111775B (en) * | 2019-05-17 | 2021-06-22 | 腾讯科技(深圳)有限公司 | Streaming voice recognition method, device, equipment and storage medium |
CN110489756B (en) * | 2019-08-23 | 2020-10-27 | 上海松鼠课堂人工智能科技有限公司 | Conversational human-computer interactive spoken language evaluation system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1139318A1 (en) * | 1999-09-27 | 2001-10-04 | Kojima Co., Ltd. | Pronunciation evaluation system |
CN102034475A (en) * | 2010-12-08 | 2011-04-27 | 中国科学院自动化研究所 | Method for interactively scoring open short conversation by using computer |
CN102354495A (en) * | 2011-08-31 | 2012-02-15 | 中国科学院自动化研究所 | Testing method and system of semi-opened spoken language examination questions |
CN103065626A (en) * | 2012-12-20 | 2013-04-24 | 中国科学院声学研究所 | Automatic grading method and automatic grading equipment for read questions in test of spoken English |
Non-Patent Citations (1)
Title |
---|
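Automatic scoring of retelling questions for large-scale computer-based spoken English tests; Yan Ke et al.; Journal of Tsinghua University (Science and Technology); 2009-12-31; Vol. 49, No. S1; abstract, page 1358 left column lines 1-19, page 1358 right column last 2 lines, Figure 1 * |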
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province 230088. Applicant after: Iflytek Co., Ltd. Address before: No. 666 Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province 230088. Applicant before: Anhui USTC iFLYTEK Co., Ltd. |
COR | Change of bibliographic data | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |