CN109271482A - Implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech - Google Patents
Implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech
- Publication number: CN109271482A (application CN201811030689.7A)
- Authority: CN (China)
- Prior art keywords: voice, vector, tdnn, english, gmm
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/253: Grammatical analysis; style critique (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F40/00: Handling natural language data; G06F40/20: Natural language analysis)
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates (under G06F40/279: Recognition of textual entities)
- G10L15/26: Speech-to-text systems (G10: Musical instruments; acoustics; G10L: Speech analysis or synthesis, speech recognition, speech or voice processing, speech or audio coding or decoding; G10L15/00: Speech recognition)
Abstract
The invention discloses an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech. The platform is intended for postgraduate ESP oral-English teaching, and performs networked speech recognition of keywords and automatic assessment of oral expression within an online "international conference" oral-English teaching system platform. The method comprises the following steps: (1) establishing a dedicated ESP corpus; (2) establishing a database of users' speech-recognition feature information; (3) building the algorithmic model by which international-conference oral-English teaching speech is automatically assessed; (4) analyzing and processing the feature information, and realizing the mining and approach for "keyword" retrieval and identification. The present invention changes the teaching model by combining assessment with artificial intelligence, adapts to and is compatible with international demand, covers assessment elements at the three levels of English grammar, syntax, and context, and has practical significance for improving oral-English teaching and evaluation capability.
Description
Technical field
The invention belongs to the field of online network-based teaching and exchange, oral-English teaching, and speech evaluation systems, and relates to an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech.
Background art
ESP (English for Specific Purposes) is English closely related to a particular profession, discipline, or occupation. ESP is not itself a special linguistic form; it is a language-teaching approach based on learners' needs, with specific instructional objectives. As a course designed around student demand, ESP mainly involves several aspects: needs analysis, course design, teaching methodology, compilation of teaching materials, course evaluation, and teacher development. Owing to differences in teaching demand, level, and purpose, and to the call of the times, postgraduate English teaching is mostly set up as ESP. ESP, grounded in a functionalist view of language, provides the theoretical basis for the restricted sample of postgraduate oral-English teaching considered here, the "international conference", and simplifies and facilitates the networked speech-recognition scheme under new technical conditions. Through the construction of a "keyword" speech-recognition platform, the teaching and assessment of postgraduate spoken English can be brought closer to perfection.
Existing English oral machine-examination systems perform networked speech recognition with DSP speech-recognition devices, mainly suited to speaker-independent recognition in various outdoor settings. Such methods concentrate on improving traditional noise-reduction audio techniques, and thereby propose a speaker-independent networked speech-recognition scheme for English online testing systems. They are limited to national standardized tests; their requirements on the network platform are too high, so they are not suitable for the network platforms of universities across the country, generalize poorly, and lack broad practical significance and popularization.
Summary of the invention
Purpose of the invention: in view of the shortcomings of the above existing methods, the present invention provides an implementation method for an automatic evaluation platform for postgraduate oral-English teaching speech, suited to the online oral-English teaching system platforms of universities under their current audio-visual facilities and English-teaching models, improving the teaching quality and intelligent evaluation capability for spoken English.
Technical solution: an implementation method of an automatic evaluation platform for postgraduate oral-English teaching speech, comprising the following steps:
(1) establish a dedicated ESP corpus; the dedicated ESP corpus contains international-conference English grammar and vocabulary;
(2) establish a user identity-information database; the user identity-information database contains speech-recognition information;
(3) build a model of the system that automatically assesses and recognizes international-conference oral-English teaching speech;
(4) analyze and process the feature information; the feature-information analysis and processing includes keyword extraction, retrieval, and oral assessment.
Further, the corpus in step (1) comprises three kinds: an ESP spoken corpus, a basic-discipline corpus, and a specialized-discipline corpus.
Further, the speech recognition used to build the user identity-information database in step (2) includes speech-to-text conversion and direct speech processing, through which each user's characteristic data are extracted.
Further, step (2) includes the following steps:
(21) Preprocessing and feature extraction
First, silence detection is performed with an energy and zero-crossing-rate method, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and framed, linear prediction (LPC) analysis is carried out, and cepstral coefficients are then derived from the LPC coefficients as the feature vectors for speaker recognition.
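The preprocessing chain of step (21), pre-emphasis, framing, LPC analysis, and the LPC-to-cepstrum conversion, can be sketched roughly as follows. The patent gives no frame sizes or coefficients, so the constants (0.97 pre-emphasis, 240-sample frames with an 80-sample shift, order-12 LPC) and all function names are illustrative assumptions; silence detection and spectral subtraction are omitted.

```python
def preemphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frames(signal, size=240, step=80):
    # split into overlapping analysis frames (e.g. 30 ms / 10 ms at 8 kHz)
    return [signal[i:i + size]
            for i in range(0, len(signal) - size + 1, step)]

def autocorrelation(frame, order):
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(order + 1)]

def lpc(frame, order=12):
    # Levinson-Durbin recursion for the predictor A(z) = 1 + sum_k a_k z^{-k}
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        a = [a[j] + k * a[i - j] if 1 <= j < i else a[j]
             for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err  # coefficients a_1..a_p and residual energy

def lpc_cepstrum(a, n_ceps=12):
    # standard recursion (with the A(z) convention above):
    # c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        a_n = a[n - 1] if n <= len(a) else 0.0
        c[n] = -a_n - sum((k / n) * c[k] * (a[n - k - 1] if n - k <= len(a) else 0.0)
                          for k in range(1, n))
    return c[1:]
```

Each frame then yields a 12-dimensional cepstral vector (13 with an energy term, matching the D = 13 stated later).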
(22) Training
The feature-vector sequence X extracted in step (21) is fed, after delays, into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence. The learning result is then supplied, in the form of residual feature vectors, to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the weight coefficients of the TDNN are updated by a backward inversion method with inertia.
(23) Identification
The output sequence O of the time-delay neural network is subtracted from the feature-vector sequence X, and the resulting residual sequence R is supplied to the GMM model. For a sequence of T residual vectors R = r_1, r_2, ..., r_T, the GMM probability is
p(R | λ) = ∏_{t=1}^{T} ∑_{i=1}^{M} p_i b_i(r_t),
which in the log domain is
log p(R | λ) = ∑_{t=1}^{T} log ∑_{i=1}^{M} p_i b_i(r_t).
Speech recognition is based on Bayes' theorem: among the models of N users, the speaker whose model has the maximum likelihood probability is the target speaker,
k* = argmax_{1≤k≤N} p(R | λ_k).
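The maximum-likelihood decision of step (23) can be sketched as follows, assuming diagonal covariances so the component density factorizes; the patent's full-covariance case differs only inside the density. All names are illustrative.

```python
import math

def log_gmm_prob(seq, model):
    # model: list of (weight, mean_vec, var_vec) diagonal-covariance components
    total = 0.0
    for r in seq:
        frame_p = 0.0
        for w, mean, var in model:
            log_b = -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                               for x, m, v in zip(r, mean, var))
            frame_p += w * math.exp(log_b)
        total += math.log(frame_p + 1e-300)  # guard against underflow
    return total

def identify(seq, models):
    # Bayes / maximum-likelihood decision over N speaker models (equal priors)
    scores = [log_gmm_prob(seq, m) for m in models]
    return max(range(len(models)), key=lambda k: scores[k])
```

With two single-component 1-D models centered at 0 and 5, a residual sequence near 0 is attributed to the first speaker and one near 5 to the second.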
Further, the specific steps of the training in step (22) are as follows:
(a) Determine the GMM model and the TDNN structure
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability-density functions:
p(x_t | λ) = ∑_{i=1}^{M} p_i b_i(x_t),
where x_t is a D-dimensional feature vector (here D = 13); b_i(x_t) is the i-th member density function, a Gaussian with mean vector u_i and covariance matrix Σ_i:
b_i(x_t) = (2π)^{-D/2} |Σ_i|^{-1/2} exp( -(1/2)(x_t - u_i)^T Σ_i^{-1} (x_t - u_i) );
and p_i are the mixture weights, which satisfy the condition ∑_{i=1}^{M} p_i = 1. The complete GMM model parameter set is
λ = { (p_i, u_i, Σ_i), i = 1, 2, ..., M }.
A time-delay neural network without feedback is used. The feature vector x(n), after passing through a linear delay chain, forms the input to the TDNN; the TDNN applies a nonlinear transformation to the input followed by a linear weighting to obtain the output vector, which is then compared with the feature vector x(n) under the minimum mean-square criterion. The ratio of the number of hidden-layer neurons to the number of input-layer neurons is 3:2, and the nonlinear activation is the sigmoid S(y) = 1 / (1 + e^{-y}), where y is the input after weighted summation. During training, the inertia coefficient of the neural network is γ = 0.8.
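A minimal forward pass consistent with the structure just described: delayed input taps, a hidden layer whose neuron count is 3/2 the input-layer count, sigmoid activation, and a linear output layer. The number of delay taps and the weight initialization are assumptions, and training is omitted.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

class TinyTDNN:
    # one hidden layer over delayed input taps; hidden:input neuron ratio 3:2
    def __init__(self, dim, taps=3, seed=0):
        rng = random.Random(seed)
        n_in = dim * taps
        n_hidden = (3 * n_in) // 2
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(dim)]

    def forward(self, window):
        # window: list of `taps` feature vectors x(n), x(n-1), ..., flattened
        x = [v for vec in window for v in vec]
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        # linear output layer produces the prediction o(n) of x(n)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]
```

The residual fed to the GMM would then be x(n) minus this output, component by component.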
(b) Set the convergence condition and the maximum number of iterations
The convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations does not exceed 100.
(c) Randomly determine the initial TDNN and GMM model parameters
The initial TDNN coefficients are set to computer-generated pseudorandom numbers; the initial GMM mixture weights are 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the TDNN residual vectors into M clusters with the LBG method and computing the mean and variance of each cluster.
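Step (c)'s LBG initialization can be sketched as binary-splitting vector quantization with Lloyd refinement; the resulting cluster means (and per-cluster variances, computed the same way) seed the GMM. This sketch assumes M is a power of two, and the split factor and iteration count are illustrative.

```python
def lbg(vectors, M, eps=0.01, iters=20):
    # binary-splitting LBG: grow the codebook 1 -> 2 -> 4 -> ... -> M
    dim = len(vectors[0])

    def centroid(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    codebook = [centroid(vectors)]
    while len(codebook) < M:
        # split every codeword into a (1+eps) and a (1-eps) copy
        codebook = [[x * (1 + s) for x in c] for c in codebook for s in (eps, -eps)]
        for _ in range(iters):  # Lloyd refinement of the enlarged codebook
            cells = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
                cells[nearest].append(v)
            codebook = [centroid(c) if c else codebook[i] for i, c in enumerate(cells)]
    return codebook
```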
(d) Input the feature vector x(n) into the TDNN, and subtract the TDNN output feature vector o(n) from the feature vector x(n) that entered the TDNN, obtaining all residual vectors.
(e) Correct the parameters of the GMM model by the expectation-maximization method
Let the residual vector be r_t. First compute the component posterior probability
p(i | r_t, λ) = p_i b_i(r_t) / ∑_{j=1}^{M} p_j b_j(r_t),
then update the mixture weight p̂_i, the mean vector û_i, and the covariance matrix Σ̂_i:
p̂_i = (1/T) ∑_{t=1}^{T} p(i | r_t, λ),
û_i = ∑_{t=1}^{T} p(i | r_t, λ) r_t / ∑_{t=1}^{T} p(i | r_t, λ),
Σ̂_i = ∑_{t=1}^{T} p(i | r_t, λ) (r_t - û_i)(r_t - û_i)^T / ∑_{t=1}^{T} p(i | r_t, λ).
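One EM iteration of step (e) can be written out directly, assuming diagonal covariances for simplicity (the patent states full covariance matrices Σ_i); a small variance floor is added to keep the next E-step well defined.

```python
import math

def gaussian(r, mean, var):
    # diagonal-covariance Gaussian density
    return math.exp(-0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                               for x, m, v in zip(r, mean, var)))

def em_step(residuals, weights, means, vars_):
    M, T, D = len(weights), len(residuals), len(residuals[0])
    # E-step: component posteriors p(i | r_t, lambda)
    post = []
    for r in residuals:
        bs = [weights[i] * gaussian(r, means[i], vars_[i]) for i in range(M)]
        s = sum(bs)
        post.append([b / s for b in bs])
    # M-step: update weight, mean, and variance of each component
    new_w, new_m, new_v = [], [], []
    for i in range(M):
        ni = sum(post[t][i] for t in range(T))
        new_w.append(ni / T)
        mean_i = [sum(post[t][i] * residuals[t][d] for t in range(T)) / ni
                  for d in range(D)]
        var_i = [max(sum(post[t][i] * (residuals[t][d] - mean_i[d]) ** 2
                         for t in range(T)) / ni, 1e-6)
                 for d in range(D)]
        new_m.append(mean_i)
        new_v.append(var_i)
    return new_w, new_m, new_v
```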
(f) Using the weight coefficient, mean vector, and variance of each Gaussian component of the corrected GMM model, substitute the residuals to obtain a likelihood probability, and correct the TDNN parameters by the backward inversion method with inertia. The correction process is as follows.
The TDNN parameters are obtained by maximizing the function
L(X) = ∏_{t=1}^{T} p((x_t - o_t) | λ),
where o_t is the neural-network output and x_t is the input feature vector. Taking the logarithm of the above and then negating it gives
G(X) = -∑_{t=1}^{T} ln p((x_t - o_t) | λ).
G(X) is minimized by the backward inversion method with inertia, whose iterative formula is
w_ij^{(m+1)} = w_ij^{(m)} - α ∂F/∂w_ij^{(m)} + γ (w_ij^{(m)} - w_ij^{(m-1)}),
where w_ij^{(m)} is, at the m-th iteration, the weight coefficient connecting input x_i and output y_j in layer k of the network, α is the iteration step size, F(x) = -ln p((x_t - o_t) | λ), and γ is the inertia coefficient.
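The "backward inversion method with inertia" reads as gradient descent with a momentum term weighted by the inertia coefficient γ. A generic sketch under that interpretation (function name and step sizes are assumptions):

```python
def inertia_descent(grad, w0, alpha=0.05, gamma=0.8, steps=200):
    # w^{m+1} = w^m - alpha * dG/dw + gamma * (w^m - w^{m-1})
    w_prev, w = list(w0), list(w0)
    for _ in range(steps):
        g = grad(w)
        w_next = [wi - alpha * gi + gamma * (wi - pi)
                  for wi, gi, pi in zip(w, g, w_prev)]
        w_prev, w = w, w_next
    return w
```

On a simple quadratic objective the iteration converges to the minimizer, which is the behavior the inertia term is meant to accelerate.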
(g) Judge whether the convergence condition set in step (b) is met or the maximum number of iterations has been reached; if so, stop training; otherwise, return to step (d).
Further, the specific steps of step (3) are as follows:
(31) Speech input: the input speech content comprises the carrier language of an international conference, namely the opening speech, the introduction of keynote speakers, the paper presentations, the questions and answers, and the closing speech;
(32) train, on features based on generalized fluency and with machine-learning methods, an LSTM classification model for abnormal spoken errors, a regression-analysis model for oral scoring, and a rule model for oral diagnosis;
(33) configure the corresponding speech-recognition system according to the script of each topic in the voice data and the gender of the speaker;
(34) quantify the speech rate, coherence, content understanding, advanced vocabulary, and reconstruction markers in the voice data, and extract the fluency features from the voice data;
(35) based on Gaussian-process regression-fitting analysis, detect, score, and diagnose abnormal spoken-fluency errors.
Further, step (4) includes a reasoning algorithm based on Gaussian mixture models that obtains the optimal-adjacency result.
Beneficial effects: compared with the prior art, the present invention has notable advantages. On one hand, combining the characteristics and rules of English teaching, and taking the "international conference" as the entry theme, it covers assessment content at the three levels of English grammar, syntax, and context, and thus generalizes the change of teaching model across national postgraduate education. On the other hand, it is practical: according to the speech features of users and speakers, the system evaluation is brought closer to real standards, improving the capability for oral-English teaching and international-conference expression.
Detailed description of the invention
Fig. 1 is a schematic diagram of the processing flow of the invention;
Fig. 2 is a schematic diagram of the construction flow of the dedicated international-conference oral-English corpus of the invention;
Fig. 3 is a schematic diagram of the ESP international-conference spoken-language recognition method and system structure;
Fig. 4 is a schematic diagram of the processing flow of feature-information analysis and keyword retrieval.
Specific embodiment
To describe the disclosed technical solution in detail, it is further elaborated below with reference to the accompanying drawings and specific embodiments.
The implementation method of the automatic evaluation platform for postgraduate oral-English teaching speech provided by the present invention comprises the following steps:
(1) establish the dedicated ESP "international conference" corpus;
(2) establish the "specific-person" speech-recognition information database, where "specific person" means all members using the system, including users and speakers;
(3) build the platform framework by which "international conference" oral-English teaching speech is automatically assessed;
(4) realize the mining and approach for "keyword" retrieval and identification.
The dedicated international-conference corpus in step (1) is divided into three levels: the dedicated ESP oral-English corpus, the postgraduate-discipline corpus, and the postgraduate-specialty corpus. The corpus also supports the extraction of users' speech features: at registration, each user reads aloud an article covering the 48 English phonetic symbols, and the speech feature parameters of each user are extracted.
The "specific-person" speech-recognition information database in step (2) is built on automatic speaker identification (ASI) technology, comprising the following steps:
(2-1) Preprocessing and feature extraction
First, silence detection is performed with an energy and zero-crossing-rate method, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and framed, linear prediction (LPC) analysis is carried out, and cepstral coefficients are then derived from the LPC coefficients as the feature vectors for speaker identification.
(2-2) Training
During training, the extracted feature vectors are fed, after delays, into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence. The learning result is then supplied, in the form of residual feature vectors, to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the TDNN weight coefficients are updated by the backward inversion method with inertia. The specific training process is as follows:
(a) Determine the GMM model and the TDNN structure:
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability-density functions, and can be expressed as
p(x_t | λ) = ∑_{i=1}^{M} p_i b_i(x_t),
where x_t is a D-dimensional feature vector (here D = 13); b_i(x_t) is the i-th member density function, a Gaussian with mean vector u_i and covariance matrix Σ_i:
b_i(x_t) = (2π)^{-D/2} |Σ_i|^{-1/2} exp( -(1/2)(x_t - u_i)^T Σ_i^{-1} (x_t - u_i) );
and p_i are the mixture weights, which satisfy the condition ∑_{i=1}^{M} p_i = 1. The complete GMM model parameter set is
λ = { (p_i, u_i, Σ_i), i = 1, 2, ..., M }.
Here, a TDNN without feedback is used. The feature vector x(n), after the linear delay chain, forms the input to the TDNN; the TDNN applies a nonlinear transformation to the input followed by a linear weighting to obtain the output vector, which is then compared with the feature vector; the criterion usually used is the minimum mean-square-error criterion (MMSE). The ratio of the number of hidden-layer neurons of the TDNN to the number of input-layer neurons is 3:2, and the nonlinear activation is the sigmoid S(y) = 1 / (1 + e^{-y}), where y is the input after weighted summation. During training, the inertia coefficient of the neural network is γ = 0.8.
(b) Set the convergence condition and the maximum number of iterations. Specifically, the convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations usually does not exceed 100.
(c) Randomly determine the initial TDNN and GMM model parameters. The initial TDNN coefficients are set to computer-generated pseudorandom numbers; the initial GMM mixture weights can be taken as 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the TDNN residual vectors into M clusters with the LBG (Linde, Buzo, Gray) method and computing the mean and variance of each of the M clusters.
(d) Input the feature vector x(n) into the TDNN network, and subtract the TDNN output feature vector o(n) from the feature vector x(n) that entered the TDNN, obtaining all residual vectors.
(e) Correct the parameters of the GMM model by the expectation-maximization method.
Let the residual vector be r_t. First compute the component posterior probability
p(i | r_t, λ) = p_i b_i(r_t) / ∑_{j=1}^{M} p_j b_j(r_t),
then update the mixture weight p̂_i, the mean vector û_i, and the covariance matrix Σ̂_i:
p̂_i = (1/T) ∑_{t=1}^{T} p(i | r_t, λ),
û_i = ∑_{t=1}^{T} p(i | r_t, λ) r_t / ∑_{t=1}^{T} p(i | r_t, λ),
Σ̂_i = ∑_{t=1}^{T} p(i | r_t, λ) (r_t - û_i)(r_t - û_i)^T / ∑_{t=1}^{T} p(i | r_t, λ).
(f) Using the weight coefficient, mean vector, and variance of each Gaussian component of the corrected GMM model, substitute the residuals to obtain a likelihood probability, and correct the TDNN parameters by the backward inversion method with inertia.
The TDNN parameters are obtained by maximizing the function
L(X) = ∏_{t=1}^{T} p((x_t - o_t) | λ),
where o_t is the neural-network output and x_t is the input feature vector. Taking the logarithm of the above and then negating it gives
G(X) = -∑_{t=1}^{T} ln p((x_t - o_t) | λ).
G(X) is minimized by the backward inversion method with inertia, whose iterative formula is
w_ij^{(m+1)} = w_ij^{(m)} - α ∂F/∂w_ij^{(m)} + γ (w_ij^{(m)} - w_ij^{(m-1)}),
where w_ij^{(m)} is, at the m-th iteration, the weight coefficient connecting input x_i and output y_j in layer k of the network, α is the iteration step size, F(x) = -ln p((x_t - o_t) | λ), and γ is the inertia coefficient.
(g) Judge whether the convergence condition set in step (b) is met or the maximum number of iterations has been reached; if so, stop training; otherwise, return to step (d).
(2-3) Identification
During identification, the feature-vector sequence X is input to the TDNN after delays; the output sequence O of the TDNN is then subtracted from X, and the resulting residual sequence R is supplied to the GMM model. For a sequence of T residual vectors R = r_1, r_2, ..., r_T, the GMM probability can be written
p(R | λ) = ∏_{t=1}^{T} ∑_{i=1}^{M} p_i b_i(r_t),
which in the log domain is
log p(R | λ) = ∑_{t=1}^{T} log ∑_{i=1}^{M} p_i b_i(r_t).
Bayes' theorem is used at identification: among the models of N unknown speakers, the speaker whose model has the maximum likelihood probability is the target speaker,
k* = argmax_{1≤k≤N} p(R | λ_k).
The characteristic of this technology is that a test sentence or keyword is spoken by a preset speaker (the specific person); information related to the speaker's features is extracted from it and compared with the stored reference model to make a correct judgment.
Further, the platform framework in step (3) by which "international conference" oral-English teaching speech is automatically assessed is implemented as follows:
(3-1) using a voice-input device, acquire the ESP "international conference" carrier language consisting of the Opening Speech, the Introduction to Key-note Speakers, the Essay Speech (paper presentation), the Questions and Answers, and the Closing Speech;
(3-2) train, on features based on generalized fluency and with machine-learning methods, an LSTM classification model for abnormal spoken errors, a regression-analysis model for oral scoring, and a rule model for oral diagnosis;
(3-3) configure the corresponding speech-recognition system according to the script of each topic in the voice data and the gender of the speaker;
(3-4) quantify the speech rate, coherence, content understanding, advanced vocabulary, and reconstruction markers in the voice data; the computer automatically extracts comprehensive fluency features from the voice data from the perspective of expert evaluation;
(3-5) using a method based on Gaussian-process regression-fitting analysis, realize the detection, scoring, and diagnosis of abnormal spoken-fluency errors.
The construction process of the platform framework by which "international conference" oral-English teaching speech is automatically assessed is shown in Fig. 1 and has two features: first, "specific-person" speech recognition is established on the basis of matched acoustic models, associating recognition with speech through matching to achieve automatic assessment; second, "specific-person" speech recognition first converts speech to text and then establishes, step by step, a self-built composition-correction system for accurate assessment. "Keyword" retrieval and identification mining require certain means of realization, characterized as follows: starting from (linguistic) semantics, grammar, and discourse-association theory, and combining big-data mining theory, an optimal-adjacency algorithm is found. If similar contexts can be arranged by language family and their distribution law determined, the "keyword" retrieval method is selected in setting up the dedicated international-conference oral-English corpus, and supra-sentential structural analysis is carried out together with analysis of language functions, language variation, and the situations of their use.
The present invention improves the ESP classification of Hutchinson and Waters to adapt to the actual discipline classification of current Chinese postgraduate education, so that effective discipline-specific corpus classifications can be established, as shown in Fig. 2. The dedicated ESP international-conference oral-English corpus should be set up on the basis of the current discipline classification of Chinese universities, so that ESP teaching comes closer to the professions and the teaching effect tends toward the ideal.
The dedicated ESP international-conference oral-English corpus is classified into three large discipline modules, namely natural science, engineering science, and social science, as shown in Fig. 3, which substantially cover the disciplines of Chinese universities. The feature of this classification is that it adjusts and supplements the Hutchinson-and-Waters ESP classification: English for science and technology is modified, the Business English category is removed, and natural science and engineering science are substituted. This provides the possibility and practicality of subdividing the major specialized disciplines under the three macro-classification modules of natural science, engineering science, and social science. Under the three large modules, discipline-refinement sub-corpora are established according to the respective specific specialized disciplines: natural science divides into sub-corpora such as the physics, mathematics, and chemistry discipline corpora; engineering science divides into sub-corpora such as the information-science, energy, and mechanical discipline corpora; social science divides into sub-corpora such as the philosophy, art, and law discipline corpora. The establishment of the dedicated ESP international-conference oral-English corpus facilitates using the computer to match and compare students' voice documents for oral assessment, and establishes the means of realization for the identification and assessment methods of the ESP international conference.
Overview of the ESP international-conference spoken-language recognition method and system structure of the present invention. First, the test sentence or keyword spoken by the postgraduate is preprocessed, information related to the speaker's features is extracted from it, and it is then compared with the corresponding English spoken-language material to make a correct judgment.
For the method and system structure of ESP international-conference spoken-language recognition, a spoken-word recognition system for the postgraduate ESP international conference is first established and applied. By building finely subdivided discipline-specific spoken corpora, this system extracts the personal features of a postgraduate of a given discipline from a segment of that student's spoken audio, and, through the analysis and recognition of these personal features, achieves the identification or confirmation of the postgraduate. The whole process consists of several main parts: preprocessing, feature extraction, pattern matching, and decision.
The method and system of ESP international-conference spoken-language recognition are implemented in two stages: the training (registration) stage and the recognition stage. In the training stage, each postgraduate in the system reads aloud an article covering the 48 English phonetic symbols, forming training corpora for each specific discipline; from these training corpora, the system learns and establishes a template or model-parameter reference set for each user.
In the recognition stage, feature parameters are extracted from the corpus of each discipline-specific postgraduate and compared with the reference parameter set or model template obtained for each individual postgraduate during training; matching and decision are made according to a certain similarity criterion, achieving the purpose of assessment.
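The two-stage flow (registration, then recognition) can be drastically simplified to an enrollment step that averages a user's feature vectors into a template, and a recognition step that picks the nearest stored template. The real system uses the TDNN/GMM models; this only illustrates the stage structure, and all names are invented.

```python
def enroll(training_frames):
    # registration stage: average the user's feature vectors into a template
    dim = len(training_frames[0])
    return [sum(f[d] for f in training_frames) / len(training_frames)
            for d in range(dim)]

def recognize(test_frames, templates):
    # recognition stage: nearest template under squared Euclidean distance
    probe = enroll(test_frames)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(templates, key=lambda name: dist2(probe, templates[name]))
```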
Fig. 4 shows the platform framework by which postgraduate "international conference" oral-English teaching is automatically assessed, and illustrates the operational means of realization:
"Keyword" retrieval and identification. For the ESP "international conference", by retrieving the keywords of the carrier language of the Opening Speech, the Introduction to Key-note Speakers, the Essay Speech (paper presentation), the Questions and Answers, and the Closing Speech, the dedicated "international conference" oral-English corpus (comprising text signals and audio signals) is established. The material in this dedicated corpus is associated with the three large discipline modules of natural science, engineering science, and social science.
The platform framework covers the three major language points of oral-English teaching and testing, namely statement, dialogue, and answering questions, and resolves the association of grammar, syntax, and context and the unity of structure and function. For grammar, analysis and assessment rules are created from the dictionary and from syntactic rules; for syntax, analysis and assessment rules are created from the dynamics of knowledge, so that words are well chosen; for context, analysis and assessment rules are created from contextual logic.
The assessment and processing on the platform mainly proceed from supra-sentential structure, language functions, language variation, and context. Supra-sentential structure mainly resolves the type of a sentence, correctly judging the difficult cases of inverted, imperative, transitional, and parallel sentences, analyzing sentence type from cue words such as "but" and "that is"; language-function processing handles grammar, to resolve linguistic levels and logical relations; language-variation processing handles the speaker's emotion, capturing information from intonation and textual context to identify the speaker's attitude; context processing handles the position of a sentence in the whole article and the environment of the speaker, resolving coherence problems.
The platform framework that postgraduate " international conference " oral English teaching is assessed and realized automatically divides in computer identification process
For text signal and the big technical treatment module of audio signal two.
Text signal is that keyword recognition and matching comparison are carried out in the case where speech processes are converted into the technical support of text.It is first
First, postgraduate is by related " international conference " content typing, and having special-purpose software, (for example incision Iflytek voice conversion text is soft
Part) generate word or file;Secondly, generating document carries out keyword spotting;Again, it identifies effective keyword, is confirmed as grinding
Study carefully keyword used in " international conference " of raw specific profession;Spoken automatic assessment is finally carried out, is confirmed.
The audio-signal module performs keyword recognition and matching with speech-processing technology. First, the postgraduate records the relevant "international conference" content as an audio file. Second, the generated audio file is analyzed to produce a "keyword" spectrogram. Third, this "keyword" spectrogram is matched against the "keyword audio spectrograms" in the dedicated "international conference" spoken-English corpus; valid keywords are identified and confirmed as the keywords used in the "international conference" of the postgraduate's specific discipline. Because the "keyword audio spectrogram" is relatively complex, parallel recognition and matching algorithms can be configured during spectrogram recognition and matching to ensure reliable comparison. For example, when matching LPC spectra, two parallel representations can be used: the LPC spectral estimate and the LPC cepstrum. In regions of high signal energy, near the spectral peaks, the LPC spectrum closely approximates the signal spectrum, while the LPC cepstrum requires little computation. This "dual" matching ensures recognition reliability. Finally, the automatic spoken-English assessment is carried out and confirmed.
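The two parallel LPC representations named above can both be derived from one LPC analysis. A minimal sketch, assuming the autocorrelation (Levinson-Durbin) method for the LPC coefficients and the standard LPC-to-cepstrum recursion; all function names are hypothetical:

```python
import numpy as np

def levinson_durbin(r, order):
    """LPC polynomial A(z) coefficients (a[0] = 1) from autocorrelations r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                 # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k             # residual prediction-error power
    return a, err

def lpc_power_spectrum(a, gain, n_fft=512):
    """LPC spectral estimate gain / |A(e^jw)|^2, close to the signal spectrum near its peaks."""
    A = np.fft.rfft(a, n_fft)
    return gain / np.abs(A) ** 2

def lpc_cepstrum(a, n_ceps):
    """Cepstral coefficients of 1/A(z) via the standard recursion (cheap to compute)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

Here `lpc_power_spectrum` is the representation that tracks the signal spectrum near its energy peaks, while `lpc_cepstrum` is the low-cost representation; "dual" matching would require a candidate keyword to agree with the corpus template under both.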
Claims (7)
1. A method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech, characterized by comprising the following steps:
(1) establishing a dedicated ESP corpus, the dedicated ESP corpus including international-conference English grammar and vocabulary;
(2) establishing a user identity information library, the user identity information library including voice recognition information;
(3) building an automatic assessment and recognition system model for international-conference spoken-English teaching speech;
(4) analyzing and processing characteristic information, the characteristic-information analysis including keyword extraction, keyword retrieval, and spoken-English assessment.
2. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the corpus in step (1) comprises three kinds: an ESP spoken corpus, a basic-subject corpus, and a professional-subject corpus.
3. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the speech recognition used to establish the user identity information library in step (2) extracts user characteristic data both from speech converted into text and from direct speech processing.
4. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that step (2) comprises the following steps:
(21) preprocessing and feature extraction:
first, silence detection is performed based on energy and the zero-crossing rate, and noise is removed by spectral subtraction; the speech signal is pre-emphasized and divided into frames, linear prediction analysis is carried out, and cepstral coefficients are then computed from the resulting LPC coefficients as the feature vectors for speaker recognition;
(22) training:
the feature-vector sequence X extracted in step (21) is delayed and fed into a time-delay neural network (TDNN); the TDNN learns the structure of the feature vectors and extracts the temporal information of the sequence; the learning result, in the form of residual feature vectors, is then supplied to a Gaussian mixture model (GMM); the GMM is trained by the expectation-maximization method, and the weight coefficients of the TDNN are updated by back-propagation with inertia;
(23) recognition:
the output sequence O of the TDNN is subtracted from the feature-vector sequence X, and the resulting residual sequence R is supplied to the GMM model; for the sequence of T residual vectors R = r1, r2, ..., rT, the GMM probability is expressed as:
p(R|λ) = ∏t=1..T p(rt|λ)
which in the log domain is expressed as:
log p(R|λ) = ∑t=1..T log p(rt|λ)
Speaker recognition is based on Bayes' theorem: among the models of the N users, the target speaker is the one whose model yields the maximum likelihood probability, calculated as:
k* = argmax 1≤k≤N log p(R|λk)
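A minimal sketch of the log-domain GMM scoring and Bayes decision of step (23); diagonal covariance matrices are assumed here for compactness (the claim itself does not restrict the covariance form), and all function names are illustrative:

```python
import numpy as np

def gmm_log_likelihood(R, weights, means, covs):
    """Total log-likelihood of residual vectors R (T x D) under a diagonal-covariance GMM."""
    T, D = R.shape
    M = len(weights)
    log_probs = np.empty((T, M))
    for i in range(M):
        diff = R - means[i]
        # log of weight times D-dimensional Gaussian with diagonal covariance covs[i]
        log_probs[:, i] = (np.log(weights[i])
                           - 0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(covs[i])))
                           - 0.5 * np.sum(diff ** 2 / covs[i], axis=1))
    # log-sum-exp over mixture components, then sum over frames
    m = log_probs.max(axis=1, keepdims=True)
    return float(np.sum(m.ravel() + np.log(np.sum(np.exp(log_probs - m), axis=1))))

def identify_speaker(R, models):
    """Bayes decision with equal priors: index of the model with maximum likelihood."""
    scores = [gmm_log_likelihood(R, *mdl) for mdl in models]
    return int(np.argmax(scores))
```

Each entry of `models` is a `(weights, means, covs)` tuple for one enrolled user, so `identify_speaker` implements the argmax over the N user models described in the claim.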
5. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 4, characterized in that the specific steps of the training in step (22) are as follows:
(a) determining the GMM model and TDNN structure
The probability density function of an M-order GMM is a weighted sum of M Gaussian probability density functions, calculated as:
p(xt|λ) = ∑i=1..M pi bi(xt)
where xt is a D-dimensional feature vector (here D = 13); bi(xt) is a member density function, a Gaussian with mean vector ui and covariance matrix ∑i; and pi are the mixture weights, satisfying ∑i=1..M pi = 1. The complete GMM parameter set is:
λ = { (pi, ui, ∑i), i = 1, 2, ..., M }
A feed-forward time-delay neural network without feedback is used: the feature vector x(n), after passing through a linear delay block, serves as the input of the time-spread network; the TDNN applies a nonlinear transformation to the input followed by linear weighting to produce the output vector, which is compared with the feature vector x(n) under the least-mean-square criterion. The ratio of the number of hidden-layer neurons to input-layer neurons is 3:2; the nonlinear activation is the sigmoid function f(y) = 1/(1 + e^(−y)), where y is the weighted sum of the inputs; during training, the inertia coefficient of the neural network is γ = 0.8;
(b) setting the convergence condition and the maximum number of iterations
The convergence condition is that the Euclidean distance between two successive sets of GMM coefficients and TDNN weight coefficients is less than 0.0001; the maximum number of iterations does not exceed 100;
(c) randomly determining the initial TDNN and GMM model parameters
The initial TDNN coefficients are set to computer-generated pseudo-random numbers; the initial mixture weights of the GMM are 1/M, where M is the number of GMM mixture components; the initial GMM means and variances are obtained by clustering the residual vectors of the TDNN into M clusters with the LBG method and computing the mean and variance of each cluster;
(d) feeding the feature vector x(n) into the TDNN and subtracting the TDNN output feature vector o(n) from the feature vector x(n) at the TDNN input to obtain all residual vectors;
(e) correcting the GMM model parameters by the expectation-maximization method
Let the residual vector be rt; the class posterior probability is computed first as:
p(i|rt, λ) = pi bi(rt) / ∑k=1..M pk bk(rt)
and the mixture weights p̂i, mean vectors ûi, and covariance matrices ∑̂i are then updated as:
p̂i = (1/T) ∑t=1..T p(i|rt, λ)
ûi = ∑t=1..T p(i|rt, λ) rt / ∑t=1..T p(i|rt, λ)
∑̂i = ∑t=1..T p(i|rt, λ) (rt − ûi)(rt − ûi)ᵀ / ∑t=1..T p(i|rt, λ)
(f) using the weight coefficient, mean vector, and variance of each Gaussian component in the corrected GMM model, substituting the residuals to obtain a likelihood probability, and correcting the TDNN parameters by back-propagation with inertia; the correction process is as follows:
the TDNN parameters are obtained by maximizing
∏t=1..T p((xt − ot)|λ)
where ot is the neural-network output and xt is the input feature vector; taking the logarithm and then negating gives:
G(X) = −∑t=1..T ln p((xt − ot)|λ)
G(X) is minimized by back-propagation with inertia, with the iterative formula:
wij^k(m+1) = wij^k(m) − α ∂F/∂wij^k + γ (wij^k(m) − wij^k(m−1))
where wij^k(m) is, at the m-th iteration, the weight coefficient connecting input xi with output yj; k is the layer number of the neural network; α is the iteration step size; F(x) = −ln p((xt − ot)|λ); and γ is the inertia coefficient;
(g) judging whether the convergence condition set in step (b) is satisfied or the maximum number of iterations is reached; if so, training stops; otherwise, the process jumps back to step (d).
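Step (e) above, the expectation-maximization update of the GMM on the TDNN residuals, can be sketched as follows; diagonal covariances are an assumption made here for brevity, and the function name is illustrative:

```python
import numpy as np

def em_step(R, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM on residual vectors R (T x D)."""
    T, D = R.shape
    M = len(weights)
    # E-step: posterior p(i | r_t, lambda) for each frame t and component i
    log_p = np.empty((T, M))
    for i in range(M):
        diff = R - means[i]
        log_p[:, i] = (np.log(weights[i])
                       - 0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances[i])))
                       - 0.5 * np.sum(diff ** 2 / variances[i], axis=1))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)      # T x M posterior matrix
    # M-step: re-estimate mixture weights, means, and variances
    n_i = post.sum(axis=0)                        # effective counts per component
    new_weights = n_i / T
    new_means = (post.T @ R) / n_i[:, None]
    new_vars = np.empty_like(new_means)
    for i in range(M):
        diff = R - new_means[i]
        new_vars[i] = (post[:, i] @ (diff ** 2)) / n_i[i]
    return new_weights, new_means, new_vars
```

Iterating `em_step` until the parameter change falls below the 0.0001 threshold of step (b), alternating with the TDNN weight correction of step (f), mirrors the training loop of the claim.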
6. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that the specific steps of step (3) are as follows:
(31) speech input: the input speech content includes the opening speech of an international conference, introductions of keynote speakers, paper presentation and promotion, the carrier language of questions and answers, and the closing speech;
(32) training, by machine-learning methods on generalized-fluency features, an LSTM classification model for abnormal spoken errors, a spoken-score regression analysis model, and a spoken diagnostic rule model;
(33) distinguishing the topic and the speaker's gender according to the script of the speech data, and configuring the corresponding speech recognition system;
(34) quantifying the speech-rate, coherence, content-understanding, advanced-skill, and reconstruction markers in the speech data, and extracting the fluency features of the speech data;
(35) detecting, scoring, and diagnosing abnormal spoken-fluency errors based on Gaussian-process regression-fit analysis.
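Step (35) rests on Gaussian-process regression fitting. A from-scratch sketch follows; the RBF kernel, the noise level, and the idea of mapping fluency-feature vectors to human scores are illustrative assumptions, not details fixed by the claim:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.clip(d2, 0.0, None) / length_scale ** 2)

def gp_fit_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean and standard deviation of a GP regressor.

    In this setting X would hold fluency-feature vectors (speech rate,
    pauses, ...) and y the reference fluency scores; a large posterior
    deviation flags an utterance for diagnosis.
    """
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = rbf_kernel(X_test, X_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

With a small noise term the regressor interpolates the training scores, and the predictive standard deviation provides the confidence measure that a detection/diagnosis rule could threshold.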
7. The method for implementing an automatic assessment platform for postgraduate spoken-English teaching speech according to claim 1, characterized in that step (4) includes a reasoning algorithm based on Gaussian mixture models to obtain the best adjacent channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811030689.7A CN109271482A (en) | 2018-09-05 | 2018-09-05 | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109271482A true CN109271482A (en) | 2019-01-25 |
Family
ID=65187842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811030689.7A Pending CN109271482A (en) | 2018-09-05 | 2018-09-05 | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271482A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101241699A (en) * | 2008-03-14 | 2008-08-13 | 北京交通大学 | A speaker identification system for remote Chinese teaching |
CN101740024A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for automatic evaluation based on generalized fluent spoken language fluency |
CN102034472A (en) * | 2009-09-28 | 2011-04-27 | 戴红霞 | Speaker recognition method based on Gaussian mixture model embedded with time delay neural network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114495615A (en) * | 2021-12-31 | 2022-05-13 | 江苏师范大学 | Evaluation system for teaching ability of teacher and schoolchild |
CN114495615B (en) * | 2021-12-31 | 2023-12-01 | 江苏师范大学 | Evaluation system with real-time feedback function for evaluating teaching ability of teachers and students |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | End-to-end neural network based automated speech scoring | |
CN105741832B (en) | Spoken language evaluation method and system based on deep learning | |
CN101201980B (en) | Remote Chinese language teaching system based on voice affection identification | |
CN101739867A (en) | Method for scoring interpretation quality by using computer | |
CN106803422A (en) | A kind of language model re-evaluation method based on memory network in short-term long | |
Cheng et al. | Automatic assessment of the speech of young English learners | |
CN110060657A (en) | Multi-to-multi voice conversion method based on SN | |
CN110136686A (en) | Multi-to-multi voice conversion method based on STARGAN Yu i vector | |
Shao et al. | AI-based Arabic Language and Speech Tutor | |
CN113887883A (en) | Course teaching evaluation implementation method based on voice recognition technology | |
CN109119064A (en) | A kind of implementation method suitable for overturning the Oral English Teaching system in classroom | |
CN109271482A (en) | A kind of implementation method of the automatic Evaluation Platform of postgraduates'english oral teaching voice | |
Wang | Research on open oral English scoring system based on neural network | |
CN112233655A (en) | Neural network training method for improving voice command word recognition performance | |
Chandel et al. | Sensei: Spoken language assessment for call center agents | |
Shi et al. | Construction of English Pronunciation Judgment and Detection Model Based on Deep Learning Neural Networks Data Stream Fusion | |
CN108629024A (en) | A kind of teaching Work attendance method based on voice recognition | |
Suzuki et al. | Automatic evaluation system of English prosody based on word importance factor | |
Lee et al. | Affective effects of speech-enabled robots for language learning | |
Greenberg | Deep Language Learning | |
Kang et al. | AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation | |
Li et al. | Improvement and Optimization Method of College English Teaching Level Based on Convolutional Neural Network Model in an Embedded Systems Context | |
Wang | English Speech Recognition and Pronunciation Quality Evaluation Model Based on Neural Network | |
Gerosa et al. | Investigating automatic assessment of reading comprehension in young children | |
Yang | Machine learning for English teaching: a novel evaluation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190125 |