CN109271482A - Implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech - Google Patents
Implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech
- Publication number: CN109271482A (application CN201811030689.7A)
- Authority: CN (China)
- Prior art keywords: voice, vector, tdnn, english, gmm
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/253: Grammatical analysis; style critique (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F40/00: Handling natural language data; G06F40/20: Natural language analysis)
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates (under G06F40/279: Recognition of textual entities)
- G10L15/26: Speech-to-text systems (G10: Musical instruments; acoustics; G10L: Speech analysis or synthesis, speech recognition, speech or voice processing, speech or audio coding or decoding; G10L15/00: Speech recognition)
Abstract
The invention discloses an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech. The platform is intended for postgraduate ESP oral-English teaching, and performs networked speech recognition of keywords and automatic assessment of oral expression within an online "international conference" oral-English teaching system platform. The method comprises the following steps: (1) establishing a dedicated ESP corpus; (2) establishing a database of users' speech-recognition feature information; (3) building the algorithmic model by which international-conference oral-English teaching speech is automatically assessed; (4) analyzing and processing the feature information, and realizing the mining and approach for "keyword" retrieval and identification. The present invention changes the teaching model by combining assessment with artificial intelligence, adapts to and is compatible with international demand, covers assessment elements at the three levels of English grammar, syntax, and context, and has practical significance for improving oral-English teaching and evaluation capability.
Description
Technical field
The invention belongs to the field of online network-based teaching and exchange, oral-English teaching, and speech evaluation systems, and relates to an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech.
Background art
ESP (English for Specific Purposes) is English closely related to a particular profession, discipline, or occupation. ESP is not itself a special linguistic form; it is a language-teaching approach based on learners' needs, with specific instructional objectives. As a course designed around student demand, ESP mainly involves several aspects: needs analysis, course design, teaching methodology, compilation of teaching materials, course evaluation, and teacher development. Owing to differences in teaching demand, level, and purpose, and to the call of the times, postgraduate English teaching is mostly set up as ESP. ESP, grounded in a functionalist view of language, provides the theoretical basis for the restricted sample of postgraduate oral-English teaching considered here, the "international conference", and simplifies and facilitates the networked speech-recognition scheme under new technical conditions. Through the construction of a "keyword" speech-recognition platform, the teaching and assessment of postgraduate spoken English can be brought closer to perfection.
Existing English oral machine-examination systems perform networked speech recognition with DSP speech-recognition devices, mainly suited to speaker-independent recognition in various outdoor settings. Such methods concentrate on improving traditional noise-reduction audio techniques, and thereby propose a speaker-independent networked speech-recognition scheme for English online testing systems. They are limited to national standardized tests; their requirements on the network platform are too high, so they are not suitable for the network platforms of universities across the country, generalize poorly, and lack broad practical significance and popularization.
Summary of the invention
Purpose of the invention: in view of the shortcomings of the above existing methods, the present invention provides an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech, suited to the online oral-English teaching system platforms of universities under their current audio-visual facilities and English-teaching models, improving the teaching quality and intelligent evaluation capability for spoken English.
Technical solution: an implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech, comprising the following steps:
(1) establish a dedicated ESP corpus; the dedicated ESP corpus contains international-conference English grammar and vocabulary;
(2) establish a user identity-information database; the user identity-information database contains speech-recognition information;
(3) build a model of the system that automatically assesses and recognizes international-conference oral-English teaching speech;
(4) analyze and process the feature information; the feature-information analysis and processing includes keyword extraction, retrieval, and oral assessment.
Further, the corpus in step (1) comprises three kinds: an ESP spoken corpus, a basic-discipline corpus, and a specialized-discipline corpus.
Further, the speech recognition used to build the user identity-information database in step (2) includes speech-to-text conversion and direct speech processing, through which each user's characteristic data are extracted.
Further, step (2) includes the following steps:
(21) Preprocessing and feature extraction
First, silence detection is performed with an energy and zero-crossing-rate method, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and framed, linear prediction (LPC) analysis is carried out, and cepstral coefficients are then derived from the LPC coefficients as the feature vectors for speaker recognition.
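The preprocessing chain of step (21), pre-emphasis, framing, LPC analysis, and the LPC-to-cepstrum conversion, can be sketched roughly as follows. The patent gives no frame sizes or coefficients, so the constants (0.97 pre-emphasis, 240-sample frames with an 80-sample shift, order-12 LPC) and all function names are illustrative assumptions; silence detection and spectral subtraction are omitted.

```python
def preemphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frames(signal, size=240, step=80):
    # split into overlapping analysis frames (e.g. 30 ms / 10 ms at 8 kHz)
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]

def autocorrelation(frame, order):
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]

def lpc(frame, order=12):
    # Levinson-Durbin recursion for the predictor A(z) = 1 + sum_k a_k z^{-k}
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j]
             for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err  # coefficients a_1..a_p and residual energy

def lpc_cepstrum(a, n_ceps=12):
    # standard recursion (with the A(z) convention above):
    # c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= len(a) else 0.0
        c[n] = -a_n - sum((k / n) * c[k] * (a[n - k - 1] if n - k <= len(a) else 0.0)
                          for k in range(1, n))
    return c[1:]
```

Each frame then yields a 12-dimensional cepstral vector (13 with an energy term, matching the D = 13 stated later).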
(22) Training
The feature-vector sequence X extracted in step (21) is fed, after delays, into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence. The learning result is then supplied, in the form of residual feature vectors, to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the weight coefficients of the TDNN are updated by a backward inversion method with inertia.
(23) Identification
The output sequence O of the time-delay neural network is subtracted from the feature-vector sequence X, and the resulting residual sequence R is supplied to the GMM model. For a sequence of T residual vectors R = r_1, r_2, ..., r_T, the GMM probability is
p(R | λ) = ∏_{t=1}^{T} ∑_{i=1}^{M} p_i b_i(r_t),
which in the log domain is
log p(R | λ) = ∑_{t=1}^{T} log ∑_{i=1}^{M} p_i b_i(r_t).
Speech recognition is based on Bayes' theorem: among the models of N users, the speaker whose model has the maximum likelihood probability is the target speaker,
k* = argmax_{1≤k≤N} p(R | λ_k).
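The maximum-likelihood decision of step (23) can be sketched as follows, assuming diagonal covariances so the component density factorizes; the patent's full-covariance case differs only inside the density. All names are illustrative.

```python
import math

def log_gmm_prob(seq, model):
    # model: list of (weight, mean_vec, var_vec) diagonal-covariance components
    total = 0.0
    for r in seq:
        frame_p = 0.0
        for w, mean, var in model:
            log_b = -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                               for x, m, v in zip(r, mean, var))
            frame_p += w * math.exp(log_b)
        total += math.log(frame_p + 1e-300)  # guard against underflow
    return total

def identify(seq, models):
    # Bayes / maximum-likelihood decision over N speaker models (equal priors)
    scores = [log_gmm_prob(seq, m) for m in models]
    return max(range(len(models)), key=lambda k: scores[k])
```

With two single-component 1-D models centered at 0 and 5, a residual sequence near 0 is attributed to the first speaker and one near 5 to the second.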
Further, the specific steps of the training in step (22) are as follows:
(a) Determine the GMM model and the TDNN structure
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability-density functions:
p(x_t | λ) = ∑_{i=1}^{M} p_i b_i(x_t),
where x_t is a D-dimensional feature vector (here D = 13); b_i(x_t) is the i-th member density function, a Gaussian with mean vector u_i and covariance matrix Σ_i:
b_i(x_t) = (2π)^{-D/2} |Σ_i|^{-1/2} exp( -(1/2)(x_t - u_i)^T Σ_i^{-1} (x_t - u_i) );
and p_i are the mixture weights, which satisfy the condition ∑_{i=1}^{M} p_i = 1. The complete GMM model parameter set is
λ = { (p_i, u_i, Σ_i), i = 1, 2, ..., M }.
A time-delay neural network without feedback is used. The feature vector x(n), after passing through a linear delay chain, forms the input to the TDNN; the TDNN applies a nonlinear transformation to the input followed by a linear weighting to obtain the output vector, which is then compared with the feature vector x(n) under the minimum mean-square criterion. The ratio of the number of hidden-layer neurons to the number of input-layer neurons is 3:2, and the nonlinear activation is the sigmoid S(y) = 1 / (1 + e^{-y}), where y is the input after weighted summation. During training, the inertia coefficient of the neural network is γ = 0.8.
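A minimal forward pass consistent with the structure just described: delayed input taps, a hidden layer whose neuron count is 3/2 the input-layer count, sigmoid activation, and a linear output layer. The number of delay taps and the weight initialization are assumptions, and training is omitted.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

class TinyTDNN:
    # one hidden layer over delayed input taps; hidden:input neuron ratio 3:2
    def __init__(self, dim, taps=3, seed=0):
        rng = random.Random(seed)
        n_in = dim * taps
        n_hidden = (3 * n_in) // 2
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(dim)]

    def forward(self, window):
        # window: list of `taps` feature vectors x(n), x(n-1), ..., flattened
        x = [v for vec in window for v in vec]
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        # linear output layer produces the prediction o(n) of x(n)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
```

The residual fed to the GMM would then be x(n) minus this output, component by component.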
(b) Set the convergence condition and the maximum number of iterations
The convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations does not exceed 100.
(c) Randomly determine the initial TDNN and GMM model parameters
The initial TDNN coefficients are set to computer-generated pseudorandom numbers; the initial GMM mixture weights are 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the TDNN residual vectors into M clusters with the LBG method and computing the mean and variance of each cluster.
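Step (c)'s LBG initialization can be sketched as binary-splitting vector quantization with Lloyd refinement; the resulting cluster means (and per-cluster variances, computed the same way) seed the GMM. This sketch assumes M is a power of two, and the split factor and iteration count are illustrative.

```python
def lbg(vectors, M, eps=0.01, iters=20):
    # binary-splitting LBG: grow the codebook 1 -> 2 -> 4 -> ... -> M
    dim = len(vectors[0])

    def centroid(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    codebook = [centroid(vectors)]
    while len(codebook) < M:
        # split every codeword into a (1+eps) and a (1-eps) copy
        codebook = [[x * (1 + s) for x in c] for c in codebook for s in (eps, -eps)]
        for _ in range(iters):  # Lloyd refinement of the enlarged codebook
            cells = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
                cells[nearest].append(v)
            codebook = [centroid(c) if c else codebook[i] for i, c in enumerate(cells)]
    return codebook
```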
(d) Input the feature vector x(n) into the TDNN, and subtract the TDNN output feature vector o(n) from the feature vector x(n) that entered the TDNN, obtaining all residual vectors.
(e) Correct the parameters of the GMM model by the expectation-maximization method
Let the residual vector be r_t. First compute the component posterior probability
p(i | r_t, λ) = p_i b_i(r_t) / ∑_{j=1}^{M} p_j b_j(r_t),
then update the mixture weight p̂_i, the mean vector û_i, and the covariance matrix Σ̂_i:
p̂_i = (1/T) ∑_{t=1}^{T} p(i | r_t, λ),
û_i = ∑_{t=1}^{T} p(i | r_t, λ) r_t / ∑_{t=1}^{T} p(i | r_t, λ),
Σ̂_i = ∑_{t=1}^{T} p(i | r_t, λ) (r_t - û_i)(r_t - û_i)^T / ∑_{t=1}^{T} p(i | r_t, λ).
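One EM iteration of step (e) can be written out directly, assuming diagonal covariances for simplicity (the patent states full covariance matrices Σ_i); a small variance floor is added to keep the next E-step well defined.

```python
import math

def gaussian(r, mean, var):
    # diagonal-covariance Gaussian density
    return math.exp(-0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                               for x, m, v in zip(r, mean, var)))

def em_step(residuals, weights, means, vars_):
    M, T, D = len(weights), len(residuals), len(residuals[0])
    # E-step: component posteriors p(i | r_t, lambda)
    post = []
    for r in residuals:
        bs = [weights[i] * gaussian(r, means[i], vars_[i]) for i in range(M)]
        s = sum(bs)
        post.append([b / s for b in bs])
    # M-step: update weight, mean, and variance of each component
    new_w, new_m, new_v = [], [], []
    for i in range(M):
        ni = sum(post[t][i] for t in range(T))
        new_w.append(ni / T)
        mean_i = [sum(post[t][i] * residuals[t][d] for t in range(T)) / ni
                  for d in range(D)]
        var_i = [max(sum(post[t][i] * (residuals[t][d] - mean_i[d]) ** 2
                         for t in range(T)) / ni, 1e-6)
                 for d in range(D)]
        new_m.append(mean_i)
        new_v.append(var_i)
    return new_w, new_m, new_v
```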
(f) Using the weight coefficient, mean vector, and variance of each Gaussian component of the corrected GMM model, substitute the residuals to obtain a likelihood probability, and correct the TDNN parameters by the backward inversion method with inertia. The correction process is as follows.
The TDNN parameters are obtained by maximizing the function
L(X) = ∏_{t=1}^{T} p((x_t - o_t) | λ),
where o_t is the neural-network output and x_t is the input feature vector. Taking the logarithm of the above and then negating it gives
G(X) = -∑_{t=1}^{T} ln p((x_t - o_t) | λ).
G(X) is minimized by the backward inversion method with inertia, whose iterative formula is
w_ij^{(m+1)} = w_ij^{(m)} - α ∂F/∂w_ij^{(m)} + γ (w_ij^{(m)} - w_ij^{(m-1)}),
where w_ij^{(m)} is, at the m-th iteration, the weight coefficient connecting input x_i and output y_j in layer k of the network, α is the iteration step size, F(x) = -ln p((x_t - o_t) | λ), and γ is the inertia coefficient.
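The "backward inversion method with inertia" reads as gradient descent with a momentum term weighted by the inertia coefficient γ. A generic sketch under that interpretation (function name and step sizes are assumptions):

```python
def inertia_descent(grad, w0, alpha=0.05, gamma=0.8, steps=200):
    # w^{m+1} = w^m - alpha * dG/dw + gamma * (w^m - w^{m-1})
    w_prev, w = list(w0), list(w0)
    for _ in range(steps):
        g = grad(w)
        w_next = [wi - alpha * gi + gamma * (wi - pi)
                  for wi, gi, pi in zip(w, g, w_prev)]
        w_prev, w = w, w_next
    return w
```

On a simple quadratic objective the iteration converges to the minimizer, which is the behavior the inertia term is meant to accelerate.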
(g) Judge whether the convergence condition set in step (b) is met or the maximum number of iterations has been reached; if so, stop training; otherwise, return to step (d).
Further, the specific steps of step (3) are as follows:
(31) Speech input: the input speech content comprises the carrier language of an international conference, namely the opening speech, the introduction of keynote speakers, the paper presentations, the questions and answers, and the closing speech;
(32) train, on features based on generalized fluency and with machine-learning methods, an LSTM classification model for abnormal spoken errors, a regression-analysis model for oral scoring, and a rule model for oral diagnosis;
(33) configure the corresponding speech-recognition system according to the script of each topic in the voice data and the gender of the speaker;
(34) quantify the speech rate, coherence, content understanding, advanced vocabulary, and reconstruction markers in the voice data, and extract the fluency features from the voice data;
(35) based on Gaussian-process regression-fitting analysis, detect, score, and diagnose abnormal spoken-fluency errors.
Further, step (4) includes a reasoning algorithm based on Gaussian mixture models that obtains the optimal-adjacency result.
Beneficial effects: compared with the prior art, the present invention has notable advantages. On one hand, combining the characteristics and rules of English teaching, and taking the "international conference" as the entry theme, it covers assessment content at the three levels of English grammar, syntax, and context, and thus generalizes the change of teaching model across national postgraduate education. On the other hand, it is practical: according to the speech features of users and speakers, the system evaluation is brought closer to real standards, improving the capability for oral-English teaching and international-conference expression.
Detailed description of the invention
Fig. 1 is a schematic diagram of the processing flow of the invention;
Fig. 2 is a schematic diagram of the construction flow of the dedicated international-conference oral-English corpus of the invention;
Fig. 3 is a schematic diagram of the ESP international-conference spoken-language recognition method and system structure;
Fig. 4 is a schematic diagram of the processing flow of feature-information analysis and keyword retrieval.
Specific embodiment
To describe the disclosed technical solution in detail, it is further elaborated below with reference to the accompanying drawings and specific embodiments.
The implementation method of the automatic evaluation platform for postgraduate oral-English teaching speech provided by the present invention comprises the following steps:
(1) establish the dedicated ESP "international conference" corpus;
(2) establish the "specific-person" speech-recognition information database, where "specific person" means all members using the system, including users and speakers;
(3) build the platform framework by which "international conference" oral-English teaching speech is automatically assessed;
(4) realize the mining and approach for "keyword" retrieval and identification.
The dedicated international-conference corpus in step (1) is divided into three levels: the dedicated ESP oral-English corpus, the postgraduate-discipline corpus, and the postgraduate-specialty corpus. The corpus also supports the extraction of users' speech features: at registration, each user reads aloud an article covering the 48 English phonetic symbols, and the speech feature parameters of each user are extracted.
The "specific-person" speech-recognition information database in step (2) is built on automatic speaker identification (ASI) technology, comprising the following steps:
(2-1) Preprocessing and feature extraction
First, silence detection is performed with an energy and zero-crossing-rate method, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and framed, linear prediction (LPC) analysis is carried out, and cepstral coefficients are then derived from the LPC coefficients as the feature vectors for speaker identification.
(2-2) Training
During training, the extracted feature vectors are fed, after delays, into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence. The learning result is then supplied, in the form of residual feature vectors, to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the TDNN weight coefficients are updated by the backward inversion method with inertia. The specific training process is as follows:
(a) Determine the GMM model and the TDNN structure:
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability-density functions, and can be expressed as
p(x_t | λ) = ∑_{i=1}^{M} p_i b_i(x_t),
where x_t is a D-dimensional feature vector (here D = 13); b_i(x_t) is the i-th member density function, a Gaussian with mean vector u_i and covariance matrix Σ_i:
b_i(x_t) = (2π)^{-D/2} |Σ_i|^{-1/2} exp( -(1/2)(x_t - u_i)^T Σ_i^{-1} (x_t - u_i) );
and p_i are the mixture weights, which satisfy the condition ∑_{i=1}^{M} p_i = 1. The complete GMM model parameter set is
λ = { (p_i, u_i, Σ_i), i = 1, 2, ..., M }.
Here, a TDNN without feedback is used. The feature vector x(n), after the linear delay chain, forms the input to the TDNN; the TDNN applies a nonlinear transformation to the input followed by a linear weighting to obtain the output vector, which is then compared with the feature vector; the criterion usually used is the minimum mean-square-error criterion (MMSE). The ratio of the number of hidden-layer neurons of the TDNN to the number of input-layer neurons is 3:2, and the nonlinear activation is the sigmoid S(y) = 1 / (1 + e^{-y}), where y is the input after weighted summation. During training, the inertia coefficient of the neural network is γ = 0.8.
(b) Set the convergence condition and the maximum number of iterations. Specifically, the convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations usually does not exceed 100.
(c) Randomly determine the initial TDNN and GMM model parameters. The initial TDNN coefficients are set to computer-generated pseudorandom numbers; the initial GMM mixture weights can be taken as 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the TDNN residual vectors into M clusters with the LBG (Linde, Buzo, Gray) method and computing the mean and variance of each of the M clusters.
(d) Input the feature vector x(n) into the TDNN network, and subtract the TDNN output feature vector o(n) from the feature vector x(n) that entered the TDNN, obtaining all residual vectors.
(e) Correct the parameters of the GMM model by the expectation-maximization method.
Let the residual vector be r_t. First compute the component posterior probability
p(i | r_t, λ) = p_i b_i(r_t) / ∑_{j=1}^{M} p_j b_j(r_t),
then update the mixture weight p̂_i, the mean vector û_i, and the covariance matrix Σ̂_i:
p̂_i = (1/T) ∑_{t=1}^{T} p(i | r_t, λ),
û_i = ∑_{t=1}^{T} p(i | r_t, λ) r_t / ∑_{t=1}^{T} p(i | r_t, λ),
Σ̂_i = ∑_{t=1}^{T} p(i | r_t, λ) (r_t - û_i)(r_t - û_i)^T / ∑_{t=1}^{T} p(i | r_t, λ).
(f) Using the weight coefficient, mean vector, and variance of each Gaussian component of the corrected GMM model, substitute the residuals to obtain a likelihood probability, and correct the TDNN parameters by the backward inversion method with inertia.
The TDNN parameters are obtained by maximizing the function
L(X) = ∏_{t=1}^{T} p((x_t - o_t) | λ),
where o_t is the neural-network output and x_t is the input feature vector. Taking the logarithm of the above and then negating it gives
G(X) = -∑_{t=1}^{T} ln p((x_t - o_t) | λ).
G(X) is minimized by the backward inversion method with inertia, whose iterative formula is
w_ij^{(m+1)} = w_ij^{(m)} - α ∂F/∂w_ij^{(m)} + γ (w_ij^{(m)} - w_ij^{(m-1)}),
where w_ij^{(m)} is, at the m-th iteration, the weight coefficient connecting input x_i and output y_j in layer k of the network, α is the iteration step size, F(x) = -ln p((x_t - o_t) | λ), and γ is the inertia coefficient.
(g) Judge whether the convergence condition set in step (b) is met or the maximum number of iterations has been reached; if so, stop training; otherwise, return to step (d).
(2-3) Identification
During identification, the feature-vector sequence X is input to the TDNN after delays; the output sequence O of the TDNN is then subtracted from X, and the resulting residual sequence R is supplied to the GMM model. For a sequence of T residual vectors R = r_1, r_2, ..., r_T, the GMM probability can be written
p(R | λ) = ∏_{t=1}^{T} ∑_{i=1}^{M} p_i b_i(r_t),
which in the log domain is
log p(R | λ) = ∑_{t=1}^{T} log ∑_{i=1}^{M} p_i b_i(r_t).
Bayes' theorem is used at identification: among the models of N unknown speakers, the speaker whose model has the maximum likelihood probability is the target speaker,
k* = argmax_{1≤k≤N} p(R | λ_k).
The characteristic of this technology is that a test sentence or keyword is spoken by a preset speaker (the specific person); information related to the speaker's features is extracted from it and compared with the stored reference model to make a correct judgment.
Further, the platform framework in step (3) by which "international conference" oral-English teaching speech is automatically assessed is implemented as follows:
(3-1) using a voice-input device, acquire the ESP "international conference" carrier language consisting of the Opening Speech, the Introduction to Key-note Speakers, the Essay Speech (paper presentation), the Questions and Answers, and the Closing Speech;
(3-2) train, on features based on generalized fluency and with machine-learning methods, an LSTM classification model for abnormal spoken errors, a regression-analysis model for oral scoring, and a rule model for oral diagnosis;
(3-3) configure the corresponding speech-recognition system according to the script of each topic in the voice data and the gender of the speaker;
(3-4) quantify the speech rate, coherence, content understanding, advanced vocabulary, and reconstruction markers in the voice data; the computer automatically extracts comprehensive fluency features from the voice data from the perspective of expert evaluation;
(3-5) using a method based on Gaussian-process regression-fitting analysis, realize the detection, scoring, and diagnosis of abnormal spoken-fluency errors.
The construction process of the platform framework by which "international conference" oral-English teaching speech is automatically assessed is shown in Fig. 1 and has two features: first, "specific-person" speech recognition is established on the basis of matched acoustic models, associating recognition with speech through matching to achieve automatic assessment; second, "specific-person" speech recognition first converts speech to text and then establishes, step by step, a self-built composition-correction system for accurate assessment. "Keyword" retrieval and identification mining require certain means of realization, characterized as follows: starting from (linguistic) semantics, grammar, and discourse-association theory, and combining big-data mining theory, an optimal-adjacency algorithm is found. If similar contexts can be arranged by language family and their distribution law determined, the "keyword" retrieval method is selected in setting up the dedicated international-conference oral-English corpus, and supra-sentential structural analysis is carried out together with analysis of language functions, language variation, and the situations of their use.
The present invention improves the ESP classification of Hutchinson and Waters to adapt to the actual discipline classification of current Chinese postgraduate education, so that effective discipline-specific corpus classifications can be established, as shown in Fig. 2. The dedicated ESP international-conference oral-English corpus should be set up on the basis of the current discipline classification of Chinese universities, so that ESP teaching comes closer to the professions and the teaching effect tends toward the ideal.
The dedicated ESP international-conference oral-English corpus is classified into three large discipline modules, namely natural science, engineering science, and social science, as shown in Fig. 3, which substantially cover the disciplines of Chinese universities. The feature of this classification is that it adjusts and supplements the Hutchinson-and-Waters ESP classification: English for science and technology is modified, the Business English category is removed, and natural science and engineering science are substituted. This provides the possibility and practicality of subdividing the major specialized disciplines under the three macro-classification modules of natural science, engineering science, and social science. Under the three large modules, discipline-refinement sub-corpora are established according to the respective specific specialized disciplines: natural science divides into sub-corpora such as the physics, mathematics, and chemistry discipline corpora; engineering science divides into sub-corpora such as the information-science, energy, and mechanical discipline corpora; social science divides into sub-corpora such as the philosophy, art, and law discipline corpora. The establishment of the dedicated ESP international-conference oral-English corpus facilitates using the computer to match and compare students' voice documents for oral assessment, and establishes the means of realization for the identification and assessment methods of the ESP international conference.
Overview of the ESP international-conference spoken-language recognition method and system structure of the present invention. First, the test sentence or keyword spoken by the postgraduate is preprocessed, information related to the speaker's features is extracted from it, and it is then compared with the corresponding English spoken-language material to make a correct judgment.
For the method and system structure of ESP international-conference spoken-language recognition, a spoken-word recognition system for the postgraduate ESP international conference is first established and applied. By building finely subdivided discipline-specific spoken corpora, this system extracts the personal features of a postgraduate of a given discipline from a segment of that student's spoken audio, and, through the analysis and recognition of these personal features, achieves the identification or confirmation of the postgraduate. The whole process consists of several main parts: preprocessing, feature extraction, pattern matching, and decision.
The method and system of ESP international-conference spoken-language recognition are implemented in two stages: the training (registration) stage and the recognition stage. In the training stage, each postgraduate in the system reads aloud an article covering the 48 English phonetic symbols, forming training corpora for each specific discipline; from these training corpora, the system learns and establishes a template or model-parameter reference set for each user.
In the recognition stage, feature parameters are extracted from the corpus of each discipline-specific postgraduate and compared with the reference parameter set or model template obtained for each individual postgraduate during training; matching and decision are made according to a certain similarity criterion, achieving the purpose of assessment.
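The two-stage flow (registration, then recognition) can be drastically simplified to an enrollment step that averages a user's feature vectors into a template, and a recognition step that picks the nearest stored template. The real system uses the TDNN/GMM models; this only illustrates the stage structure, and all names are invented.

```python
def enroll(training_frames):
    # registration stage: average the user's feature vectors into a template
    dim = len(training_frames[0])
    return [sum(f[d] for f in training_frames) / len(training_frames)
            for d in range(dim)]

def recognize(test_frames, templates):
    # recognition stage: nearest template under squared Euclidean distance
    probe = enroll(test_frames)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(templates, key=lambda name: dist2(probe, templates[name]))
```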
Fig. 4 shows the platform framework by which postgraduate "international conference" oral-English teaching is automatically assessed, and illustrates the operational means of realization:
"Keyword" retrieval and identification. For the ESP "international conference", by retrieving the keywords of the carrier language of the Opening Speech, the Introduction to Key-note Speakers, the Essay Speech (paper presentation), the Questions and Answers, and the Closing Speech, the dedicated "international conference" oral-English corpus (comprising text signals and audio signals) is established. The material in this dedicated corpus is associated with the three large discipline modules of natural science, engineering science, and social science.
The platform framework covers the three major language points of oral-English teaching and testing, namely statement, dialogue, and answering questions, and resolves the association of grammar, syntax, and context and the unity of structure and function. For grammar, analysis and assessment rules are created from the dictionary and from syntactic rules; for syntax, analysis and assessment rules are created from the dynamics of knowledge, so that words are well chosen; for context, analysis and assessment rules are created from contextual logic.
The assessment and processing on the platform mainly proceed from supra-sentential structure, language functions, language variation, and context. Supra-sentential structure mainly resolves the type of a sentence, correctly judging the difficult cases of inverted, imperative, transitional, and parallel sentences, analyzing sentence type from cue words such as "but" and "that is"; language-function processing handles grammar, to resolve linguistic levels and logical relations; language-variation processing handles the speaker's emotion, capturing information from intonation and textual context to identify the speaker's attitude; context processing handles the position of a sentence in the whole article and the environment of the speaker, resolving coherence problems.
The platform framework that postgraduate " international conference " oral English teaching is assessed and realized automatically divides in computer identification process
For text signal and the big technical treatment module of audio signal two.
Text signal is that keyword recognition and matching comparison are carried out in the case where speech processes are converted into the technical support of text.It is first
First, postgraduate is by related " international conference " content typing, and having special-purpose software, (for example incision Iflytek voice conversion text is soft
Part) generate word or file;Secondly, generating document carries out keyword spotting;Again, it identifies effective keyword, is confirmed as grinding
Study carefully keyword used in " international conference " of raw specific profession;Spoken automatic assessment is finally carried out, is confirmed.
The audio-signal module performs keyword recognition and matching with speech-processing technology. First, the postgraduate records the relevant "international conference" content as an audio file. Second, the generated audio file is analyzed to produce a "keyword" spectrogram. Third, this "keyword" spectrogram is matched against the "keyword audio spectrograms" in the dedicated "international conference" spoken-English corpus; valid keywords are identified and confirmed as the keywords used in the "international conference" of the postgraduate's specific discipline. Because the "keyword audio spectrogram" is relatively complex, parallel recognition and matching algorithms can be configured during spectrogram recognition and matching to ensure reliable comparison. For example, when matching LPC spectra, two parallel representations can be used: the LPC spectral estimate and the LPC cepstrum. In regions of high signal energy, near the spectral peaks, the LPC spectrum closely approximates the signal spectrum, while the LPC cepstrum requires little computation. This "dual" matching ensures recognition reliability. Finally, the automatic spoken-English assessment is carried out and confirmed.
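The two parallel LPC representations named above can both be derived from one LPC analysis. A minimal sketch, assuming the autocorrelation (Levinson-Durbin) method for the LPC coefficients and the standard LPC-to-cepstrum recursion; all function names are hypothetical:

```python
import numpy as np

def levinson_durbin(r, order):
    """LPC polynomial A(z) coefficients (a[0] = 1) from autocorrelations r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                 # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # residual prediction-error power
    return a, err

def lpc_power_spectrum(a, gain, n_fft=512):
    """LPC spectral estimate gain / |A(e^jw)|^2, close to the signal spectrum near its peaks."""
    A = np.fft.rfft(a, n_fft)
    return gain / np.abs(A) ** 2

def lpc_cepstrum(a, n_ceps):
    """Cepstral coefficients of 1/A(z) via the standard recursion (cheap to compute)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

Here `lpc_power_spectrum` is the representation that tracks the signal spectrum near its energy peaks, while `lpc_cepstrum` is the low-cost representation; "dual" matching would require a candidate keyword to agree with the corpus template under both.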
Claims (7)
1. A method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech, characterized by comprising the following steps:
(1) establishing a dedicated ESP corpus, the dedicated ESP corpus including international-conference English grammar and vocabulary;
(2) establishing a user identity information library, the user identity information library including voice recognition information;
(3) building an automatic assessment and recognition system model for international-conference spoken-English teaching speech;
(4) analyzing and processing characteristic information, the characteristic-information analysis including keyword extraction, keyword retrieval, and spoken-English assessment.
2. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the corpus in step (1) comprises three kinds: an ESP spoken corpus, a basic-subject corpus, and a professional-subject corpus.
3. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the speech recognition used to establish the user identity information library in step (2) extracts user characteristic data both from speech converted into text and from direct speech processing.
4. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that step (2) comprises the following steps:
(21) preprocessing and feature extraction:
first, silence detection is performed based on energy and the zero-crossing rate, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and divided into frames, linear prediction analysis is carried out, and cepstral coefficients are then computed from the resulting LPC coefficients as the feature vectors for speaker recognition;
(22) training:
the feature-vector sequence X extracted in step (21) is delayed and fed into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence; the learning result, in the form of residual feature vectors, is then supplied to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the weight coefficients of the TDNN are updated by back-propagation with inertia;
(23) recognition:
the output sequence O of the TDNN is subtracted from the feature-vector sequence X, and the resulting residual sequence R is supplied to the GMM model; for the sequence of T residual vectors R = r1, r2, ..., rT, the GMM probability is expressed as:
p(R|λ) = ∏t=1..T p(rt|λ)
which in the log domain is expressed as:
log p(R|λ) = ∑t=1..T log p(rt|λ)
Speaker recognition is based on Bayes' theorem: among the models of the N users, the target speaker is the one whose model yields the maximum likelihood probability, calculated as:
k* = argmax 1≤k≤N log p(R|λk)
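A minimal sketch of the log-domain GMM scoring and Bayes decision of step (23); diagonal covariance matrices are assumed here for compactness (the claim itself does not restrict the covariance form), and all function names are illustrative:

```python
import numpy as np

def gmm_log_likelihood(R, weights, means, covs):
    """Total log-likelihood of residual vectors R (T x D) under a diagonal-covariance GMM."""
    T, D = R.shape
    M = len(weights)
    log_probs = np.empty((T, M))
    for i in range(M):
        diff = R - means[i]
        # log of weight times D-dimensional Gaussian with diagonal covariance covs[i]
        log_probs[:, i] = (np.log(weights[i])
                           - 0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(covs[i])))
                           - 0.5 * np.sum(diff ** 2 / covs[i], axis=1))
    # log-sum-exp over mixture components, then sum over frames
    m = log_probs.max(axis=1, keepdims=True)
    return float(np.sum(m.ravel() + np.log(np.sum(np.exp(log_probs - m), axis=1))))

def identify_speaker(R, models):
    """Bayes decision with equal priors: index of the model with maximum likelihood."""
    scores = [gmm_log_likelihood(R, *mdl) for mdl in models]
    return int(np.argmax(scores))
```

Each entry of `models` is a `(weights, means, covs)` tuple for one enrolled user, so `identify_speaker` implements the argmax over the N user models described in the claim.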
5. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 4, characterized in that the specific steps of the training in step (22) are as follows:
(a) determining the GMM model and TDNN structure
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability density functions, calculated as:
p(xt|λ) = ∑i=1..M pi bi(xt)
where xt is a D-dimensional feature vector (here D = 13); bi(xt) is a member density function, a Gaussian with mean vector ui and covariance matrix ∑i; and pi are the mixture weights, satisfying ∑i=1..M pi = 1. The complete GMM parameter set is:
λ = { (pi, ui, ∑i), i = 1, 2, ..., M }
A feed-forward time-delay neural network without feedback is used: the feature vector x(n), after passing through a linear delay block, serves as the input of the time-spread network; the TDNN applies a nonlinear transformation to the input followed by linear weighting to produce the output vector, which is compared with the feature vector x(n) under the least-mean-square criterion. The ratio of the number of hidden-layer neurons to input-layer neurons is 3:2; the nonlinear activation is the sigmoid function f(y) = 1/(1 + e^(−y)), where y is the weighted sum of the inputs; during training, the inertia coefficient of the neural network is γ = 0.8;
(b) setting the convergence condition and the maximum number of iterations
The convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations does not exceed 100;
(c) randomly determining the initial TDNN and GMM model parameters
The initial TDNN coefficients are set to computer-generated pseudo-random numbers; the initial mixture weights of the GMM are 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the residual vectors of the TDNN into M clusters with the LBG method and computing the mean and variance of each cluster;
(d) feeding the feature vector x(n) into the TDNN and subtracting the TDNN output feature vector o(n) from the feature vector x(n) at the TDNN input to obtain all residual vectors;
(e) correcting the GMM model parameters by the expectation-maximization method
Let the residual vector be rt; the class posterior probability is computed first as:
p(i|rt, λ) = pi bi(rt) / ∑k=1..M pk bk(rt)
and the mixture weights p̂i, mean vectors ûi, and covariance matrices ∑̂i are then updated as:
p̂i = (1/T) ∑t=1..T p(i|rt, λ)
ûi = ∑t=1..T p(i|rt, λ) rt / ∑t=1..T p(i|rt, λ)
∑̂i = ∑t=1..T p(i|rt, λ) (rt − ûi)(rt − ûi)ᵀ / ∑t=1..T p(i|rt, λ)
(f) using the weight coefficient, mean vector, and variance of each Gaussian component in the corrected GMM model, substituting the residuals to obtain a likelihood probability, and correcting the TDNN parameters by back-propagation with inertia; the correction process is as follows:
the TDNN parameters are obtained by maximizing
∏t=1..T p((xt − ot)|λ)
where ot is the neural-network output and xt is the input feature vector; taking the logarithm and then negating gives:
G(X) = −∑t=1..T ln p((xt − ot)|λ)
G(X) is minimized by back-propagation with inertia, with the iterative formula:
wij^k(m+1) = wij^k(m) − α ∂F/∂wij^k + γ (wij^k(m) − wij^k(m−1))
where wij^k(m) is, at the m-th iteration, the weight coefficient connecting input xi with output yj; k is the layer number of the neural network; α is the iteration step size; F(x) = −ln p((xt − ot)|λ); and γ is the inertia coefficient;
(g) judging whether the convergence condition set in step (b) is satisfied or the maximum number of iterations is reached; if so, training stops; otherwise, the process jumps back to step (d).
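Step (e) above, the expectation-maximization update of the GMM on the TDNN residuals, can be sketched as follows; diagonal covariances are an assumption made here for brevity, and the function name is illustrative:

```python
import numpy as np

def em_step(R, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM on residual vectors R (T x D)."""
    T, D = R.shape
    M = len(weights)
    # E-step: posterior p(i | r_t, lambda) for each frame t and component i
    log_p = np.empty((T, M))
    for i in range(M):
        diff = R - means[i]
        log_p[:, i] = (np.log(weights[i])
                       - 0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances[i])))
                       - 0.5 * np.sum(diff ** 2 / variances[i], axis=1))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)      # T x M posterior matrix
    # M-step: re-estimate mixture weights, means, and variances
    n_i = post.sum(axis=0)                        # effective counts per component
    new_weights = n_i / T
    new_means = (post.T @ R) / n_i[:, None]
    new_vars = np.empty_like(new_means)
    for i in range(M):
        diff = R - new_means[i]
        new_vars[i] = (post[:, i] @ (diff ** 2)) / n_i[i]
    return new_weights, new_means, new_vars
```

Iterating `em_step` until the parameter change falls below the 0.0001 threshold of step (b), alternating with the TDNN weight correction of step (f), mirrors the training loop of the claim.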
6. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the specific steps of step (3) are as follows:
(31) speech input: the input speech content includes the opening speech of an international conference, introductions of keynote speakers, paper presentation and promotion, the carrier language of questions and answers, and the closing speech;
(32) training, by machine-learning methods on generalized-fluency features, an LSTM classification model for abnormal spoken errors, a spoken-score regression analysis model, and a spoken diagnostic rule model;
(33) distinguishing the topic and the speaker's gender according to the script of the speech data, and configuring the corresponding speech recognition system;
(34) quantifying the speech-rate, coherence, content-understanding, advanced-skill, and reconstruction markers in the speech data, and extracting the fluency features of the speech data;
(35) detecting, scoring, and diagnosing abnormal spoken-fluency errors based on Gaussian-process regression-fit analysis.
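Step (35) rests on Gaussian-process regression fitting. A from-scratch sketch follows; the RBF kernel, the noise level, and the idea of mapping fluency-feature vectors to human scores are illustrative assumptions, not details fixed by the claim:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.clip(d2, 0.0, None) / length_scale ** 2)

def gp_fit_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean and standard deviation of a GP regressor.

    In this setting X would hold fluency-feature vectors (speech rate,
    pauses, ...) and y the reference fluency scores; a large posterior
    deviation flags an utterance for diagnosis.
    """
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = rbf_kernel(X_test, X_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

With a small noise term the regressor interpolates the training scores, and the predictive standard deviation provides the confidence measure that a detection/diagnosis rule could threshold.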
7. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that step (4) includes a reasoning algorithm based on Gaussian mixture models to obtain the best adjacent channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811030689.7A CN109271482A (en) | 2018-09-05 | 2018-09-05 | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271482A true CN109271482A (en) | 2019-01-25 |
Family
ID=65187842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811030689.7A Pending CN109271482A (en) | 2018-09-05 | 2018-09-05 | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271482A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101241699A (en) * | 2008-03-14 | 2008-08-13 | 北京交通大学 | A speaker identification system for remote Chinese teaching |
CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114495615A (en) * | 2021-12-31 | 2022-05-13 | 江苏师范大学 | Evaluation system for teaching ability of teacher and schoolchild |
CN114495615B (en) * | 2021-12-31 | 2023-12-01 | 江苏师范大学 | Evaluation system with real-time feedback function for evaluating teaching ability of teachers and students |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | End-to-end neural network based automated speech scoring | |
CN105741832B (en) | Spoken language evaluation method and system based on deep learning | |
CN101201980B (en) | Remote Chinese language teaching system based on voice affection identification | |
CN101739867A (en) | Method for scoring interpretation quality by using computer | |
CN106803422A (en) | A kind of language model re-evaluation method based on memory network in short-term long | |
Cheng et al. | Automatic assessment of the speech of young English learners | |
CN110060657A (en) | Multi-to-multi voice conversion method based on SN | |
CN110136686A (en) | Multi-to-multi voice conversion method based on STARGAN Yu i vector | |
Shao et al. | AI-based Arabic Language and Speech Tutor | |
CN113887883A (en) | Course teaching evaluation implementation method based on voice recognition technology | |
CN109119064A (en) | A kind of implementation method suitable for overturning the Oral English Teaching system in classroom | |
CN109271482A (en) | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice | |
Wang | Research on open oral English scoring system based on neural network | |
CN112233655A (en) | Neural network training method for improving voice command word recognition performance | |
Chandel et al. | Sensei: Spoken language assessment for call center agents | |
Shi et al. | Construction of English Pronunciation Judgment and Detection Model Based on Deep Learning Neural Networks Data Stream Fusion | |
CN108629024A (en) | A kind of teaching Work attendance method based on voice recognition | |
Suzuki et al. | Automatic evaluation system of English prosody based on word importance factor | |
Lee et al. | Affective effects of speech-enabled robots for language learning | |
Greenberg | Deep Language Learning | |
Kang et al. | AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation | |
Li et al. | Improvement and Optimization Method of College English Teaching Level Based on Convolutional Neural Network Model in an Embedded Systems Context | |
Wang | English Speech Recognition and Pronunciation Quality Evaluation Model Based on Neural Network | |
Gerosa et al. | Investigating automatic assessment of reading comprehension in young children | |
Yang | Machine learning for English teaching: a novel evaluation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190125 |