CN109300339A - A kind of exercising method and system of Oral English Practice - Google Patents
- Publication number
- CN109300339A CN109300339A CN201811376417.2A CN201811376417A CN109300339A CN 109300339 A CN109300339 A CN 109300339A CN 201811376417 A CN201811376417 A CN 201811376417A CN 109300339 A CN109300339 A CN 109300339A
- Authority
- CN
- China
- Prior art keywords
- audio
- testing
- standard
- standard audio
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
Abstract
The invention discloses a method and system for practicing spoken English. The method comprises the following steps: receiving a spoken test audio and converting the test audio into a computer text file; converting the computer text file into a standard audio; extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, calculating the phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determining a percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score. The above method and system can effectively output the standard audio corresponding to the input audio while also providing a reasonable score for the input audio, which can effectively improve the user's spoken English ability and has high practicability.
Description
Technical field
The present invention relates to the field of language learning, and in particular to a method and system for practicing spoken English.
Background technique
With the deepening of economic globalization and the rise of China's comprehensive national strength, exchanges between China and the rest of the world are becoming increasingly frequent, and the demand for knowledge of international languages is soaring. Meanwhile, as information technology advances rapidly, computer-assisted language learning has matured, making it possible to learn spoken language online. However, existing terminal-based teaching still habitually follows the traditional teaching pattern, which mostly stresses the study of vocabulary and grammar, while the few available spoken-language practice applications offer only simulated-conversation read-aloud or follow-along reading functions and cannot fundamentally improve the user's ability to use English.
Summary of the invention
In view of the above technical problems, the present invention provides a method and system for practicing spoken English that can effectively output the standard audio corresponding to the received audio, provide a corresponding score at the same time, and effectively improve the user's spoken English ability.
In order to solve the above technical problems, the technical solution adopted by the present invention is to provide a method for practicing spoken English, which comprises the following steps:
Receiving a spoken test audio and converting the test audio into a computer text file;
Converting the computer text file into a standard audio;
Extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, calculating the phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determining the percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
With the above technical scheme, the invention achieves the following technical effects: the method for practicing spoken English provided by the invention can effectively output the corresponding computer text file from the test audio and output the corresponding standard audio from the computer text file; by extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the test audio and the standard audio can be effectively determined, and the score of the test audio relative to the standard audio is determined from the phoneme posterior probabilities while the standard audio is output. The above method can effectively output the standard audio corresponding to the input audio while also providing a reasonable score for the input audio, which can effectively improve the user's spoken English ability and has high practicability.
Preferably, in the above technical scheme, converting the test audio into a computer text file specifically comprises the following steps:
Converting the test audio into a speech waveform signal, performing spectrum or cepstrum analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
Establishing, through the acoustic model, the association between the acoustic feature values and the sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
Decomposing, through the language model, a complete sentence into single words according to the chain rule, and determining the probability of each current word;
Outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word.
Preferably, in the above technical scheme, before extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, and after converting the computer text file into the standard audio, the method further comprises the following steps:
Compensating, for the speech power spectra corresponding to the standard audio and the test audio respectively, the inherent decline and the high-frequency portion suppressed by the articulatory system;
Performing framing on the compensated standard audio and test audio.
Preferably, in the above technical scheme, extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively comprises the following steps:
Converting the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
Extracting the component frequency features of each frame respectively;
Applying a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
Additionally provide a kind of exercise system of Oral English Practice, comprising:
The testing audio is translated to computer literal herein for receiving spoken testing audio by audio conversion module
Part;
Text conversion module, for described computer literal this document to be translated to standard audio;
Audio comparison module, for extracting the mel-frequency cepstrum coefficient for stating testing audio and the standard audio respectively,
The phoneme posterior probability that the testing audio Yu the standard audio are calculated according to hidden Markov model, after the phoneme
It tests the opposite percentage with the standard audio of testing audio described in determine the probability to score, and exports the standard audio and institute
State percentage scoring.
Preferably, in the above technical scheme, the concrete operations performed by the audio conversion module to convert the test audio into a computer text file are as follows:
Converting the test audio into a speech waveform signal, performing spectrum or cepstrum analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
Establishing, through the acoustic model, the association between the acoustic feature values and the sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
Decomposing, through the language model, a complete sentence into single words according to the chain rule, and determining the probability of each current word;
Outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word.
Preferably, in the above technical scheme, the audio comparison module is also used to compensate, for the speech power spectra corresponding to the standard audio and the test audio respectively, the inherent decline and the high-frequency portion suppressed by the articulatory system, and to perform framing on the compensated standard audio and test audio.
Preferably, in the above technical scheme, the audio comparison module is also used to convert the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation, extract the component frequency features of each frame respectively, and apply a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
With the above technical scheme, the invention achieves the following technical effects: the system for practicing spoken English provided by the invention can effectively output the corresponding computer text file from the test audio and output the corresponding standard audio from the computer text file; by extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the test audio and the standard audio can be effectively determined, and the score of the test audio relative to the standard audio is determined from the phoneme posterior probabilities while the standard audio is output. The above system can effectively output the standard audio corresponding to the input audio while also providing a reasonable score for the input audio, which can effectively improve the user's spoken English ability and has high practicability.
A storage medium is also provided, on which program instructions are stored; when the program instructions are executed by a processor, the method described above is implemented.
Brief description of the drawings
The present invention will be further explained below with reference to the accompanying drawings:
Fig. 1 is a schematic flow chart of the method for practicing spoken English provided by the invention;
Fig. 2 is a schematic flow chart of the audio-to-text conversion provided by the invention;
Fig. 3 is a schematic block diagram of the system for practicing spoken English provided by the invention.
Specific embodiment
In order to effectively improve the user's spoken English ability, the present invention provides a method for practicing spoken English, shown in Fig. 1, which is a schematic flow chart of the method. The method specifically comprises the following steps:
Step S10: receiving a spoken test audio and converting the test audio into a computer text file;
Step S20: converting the computer text file into a standard audio;
Step S30: extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, calculating the phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determining the percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
The above method can output the corresponding computer text file from the test audio and output the corresponding standard audio from the computer text file; by extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the test audio and the standard audio can be effectively determined, and the score of the test audio relative to the standard audio is determined from the phoneme posterior probabilities while the standard audio is output. This enables users to improve their own speaking ability according to a concrete score, effectively improves the user's spoken English ability, and has high practicability.
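As a rough orientation, steps S10 to S30 above can be sketched as a pipeline of three functions. This is only an illustrative skeleton: every function body below is a stand-in (a real system would call an ASR engine, a speech synthesizer, and the HMM-based scoring described later in this specification), and all names are invented for the example.

```python
def audio_to_text(test_audio):
    # Step S10: speech recognition (placeholder for a real ASR engine).
    return "hello world"

def text_to_standard_audio(text):
    # Step S20: synthesize a reference ("standard") pronunciation
    # (placeholder: a real system would return a TTS waveform).
    return [0.0] * 100

def percentage_score(test_audio, standard_audio):
    # Step S30: compare phoneme posteriors of the two audios
    # (placeholder value; the real computation is described below).
    return 85.0

def practice_session(test_audio):
    text = audio_to_text(test_audio)
    standard = text_to_standard_audio(text)
    score = percentage_score(test_audio, standard)
    # The system outputs both the standard audio and the percentage score.
    return standard, score
```

The skeleton only fixes the data flow between the three steps; each stage is elaborated in the embodiments below.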
The embodiment corresponding to Fig. 1 can be further refined. See Fig. 2, which is a schematic flow chart of the audio-to-text conversion provided by the invention. The conversion specifically comprises the following steps:
Step S11: converting the test audio into a speech waveform signal, performing spectrum or cepstrum analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
Step S12: establishing, through the acoustic model, the association between the acoustic feature values and the sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
Step S13: decomposing, through the language model, a complete sentence into single words according to the chain rule, and determining the probability of each current word;
Step S14: outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word.
The above technical scheme can effectively output the optimal text sequence from the test audio, which serves as the basis for the subsequent generation of the standard speech. Through this series of operations on the test audio, the accuracy and uniqueness of the output text sequence are ensured, providing data support for the subsequent scoring and the output of the standard audio.
On the basis of the embodiment corresponding to Fig. 2, a further improvement is made to ensure that the system achieves a good recognition effect. Specifically, before the mel-frequency cepstral coefficients of the test audio and the standard audio are extracted, and after the computer text file has been converted into the standard audio, the method further comprises the following steps:
Compensating, for the speech power spectra corresponding to the standard audio and the test audio respectively, the inherent decline and the high-frequency portion suppressed by the articulatory system;
Performing framing on the compensated standard audio and test audio.
The above processing of the test audio and the standard audio effectively guarantees the subsequent extraction of the mel-frequency cepstral coefficients of both audios and improves the efficiency of audio recognition.
Preferably, in the above technical scheme, extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively comprises the following steps:
Converting the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
Extracting the component frequency features of each frame respectively;
Applying a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
The frequency-domain conversion of the standard audio and the test audio and the extraction of the component frequency features of each frame effectively guarantee the extraction of the mel-frequency cepstral coefficients and ensure that this extraction is both accurate and efficient.
On the basis of the method embodiment corresponding to Fig. 1, the present invention also provides a system for practicing spoken English. See Fig. 3, which is a schematic block diagram of the system. The system comprises:
An audio conversion module, for receiving a spoken test audio and converting the test audio into a computer text file;
A text conversion module, for converting the computer text file into a standard audio;
An audio comparison module, for extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, calculating the phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determining the percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
When scoring against the standard audio, the scoring must be built upon the concrete points of articulation that the user is able to produce; at the same time, the result fed back by the software must be close to the auditory judgment of a native English speaker.
The audio comparison module is based on an automatic speech recognition (ASR) module. The text output by the ASR system in the earlier speech recognition stage is converted into audio, which serves as the standard model and is compared with the practitioner's test audio in order to give the practitioner's test audio a score.
Preprocessing: first, the test speech and the standard reference pronunciation are preprocessed separately. Preprocessing of the pronunciation includes: 1) Pre-emphasis: compensating the inherent decline of the speech power spectrum and the high-frequency portion suppressed by the articulatory system, so as to reduce the influence of noise on the subsequent endpoint detection and feature extraction modules. 2) Framing and windowing: cutting the long, non-stationary speech into short 20-50 millisecond "frames" so as to satisfy the conditions of the Fourier transform. 3) Endpoint detection: dividing a segment of preprocessed speech into independent words as far as possible. The purpose of preprocessing is to guarantee that the system achieves a good recognition effect.
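The three preprocessing operations described above (pre-emphasis, framing with windowing, and endpoint detection) can be sketched in simplified form as follows. The pre-emphasis coefficient 0.97 and the energy-threshold endpoint rule are common textbook choices, not values taken from the patent.

```python
import math

def preemphasis(signal, alpha=0.97):
    # 1) Pre-emphasis: y[n] = x[n] - alpha * x[n-1] lifts the high
    #    frequencies that the speech power spectrum inherently loses.
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len, hop):
    # 2) Framing + Hamming window: cut the non-stationary signal into
    #    short quasi-stationary frames before the Fourier transform.
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)))
                       for i, x in enumerate(frame)])
    return frames

def voiced_frames(frames, threshold):
    # 3) Crude endpoint detection: keep frames whose short-time energy
    #    exceeds a threshold, approximating word boundaries.
    return [f for f in frames if sum(x * x for x in f) > threshold]
```

At an 8 kHz sample rate, a frame length of 160 samples with a hop of 80 corresponds to 20 ms frames with 50% overlap, which matches the 20-50 ms range stated above.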
Extracting the MFCCs: the speech to be tested is first preprocessed. Each frame is transformed from a time-domain waveform into the frequency domain by a fast Fourier transform (FFT); the component frequency features of the frame are obtained through a mel filter bank designed according to the auditory properties of the human ear; the MFCCs are then obtained after a discrete cosine transform (DCT).
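A minimal, pure-Python sketch of this MFCC pipeline (a naive DFT instead of an optimized FFT, a small triangular mel filter bank, then a DCT-II) might look as follows; the filter count, coefficient count, and sample rate are arbitrary illustrative choices, not values from the patent.

```python
import cmath
import math

def dft_magnitude(frame):
    # |DFT| of one frame, keeping only the non-negative frequencies.
    N = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * n / N)
                    for n, x in enumerate(frame)))
            for k in range(N // 2 + 1)]

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    # Triangular filters spaced evenly on the mel scale (human-ear model).
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    points = [low + i * (high - low) / (n_filters + 1)
              for i in range(n_filters + 2)]
    bins = [int((n_bins - 1) * mel_to_hz(m) / (sample_rate / 2.0))
            for m in points]
    banks = []
    for j in range(1, n_filters + 1):
        bank = [0.0] * n_bins
        for k in range(bins[j - 1], bins[j]):
            if bins[j] != bins[j - 1]:
                bank[k] = (k - bins[j - 1]) / (bins[j] - bins[j - 1])
        for k in range(bins[j], bins[j + 1]):
            if bins[j + 1] != bins[j]:
                bank[k] = (bins[j + 1] - k) / (bins[j + 1] - bins[j])
        banks.append(bank)
    return banks

def dct2(values, n_coeffs):
    # DCT-II decorrelates log filterbank energies into cepstral coefficients.
    N = len(values)
    return [sum(v * math.cos(math.pi * c * (n + 0.5) / N)
                for n, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(frame, sample_rate=8000, n_filters=12, n_coeffs=6):
    power = [s * s for s in dft_magnitude(frame)]
    energies = [sum(p * w for p, w in zip(power, bank))
                for bank in mel_filterbank(n_filters, len(power), sample_rate)]
    log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
    return dct2(log_energies, n_coeffs)
```

Production systems would use an FFT library and typically 20-40 filters with 12-13 coefficients; the structure (spectrum, mel filter bank, log, DCT) is the point of the sketch.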
HMM model: the hidden Markov model (Hidden Markov Model, HMM), as a statistical model of the speech signal, is the major technical model in every field of current speech processing. An HMM comprises five basic elements and three basic algorithms, among which the decoding algorithm, Viterbi, is also the basis of the pronunciation scoring algorithm in spoken English learning. For a given observation sequence and model λ = (A, B, π), the Viterbi algorithm can not only find a sufficiently good state sequence Q = q1 q2 ... qt to explain the observation sequence, but can also obtain the output probability corresponding to that path.
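A standard implementation of the Viterbi decoding step described above, which returns both a good state sequence and the probability of that path for a discrete-observation HMM λ = (A, B, π), could look like this; the toy parameters in the usage are invented, not a real speech model.

```python
def viterbi(obs, pi, A, B):
    # obs: observation indices; pi: initial state distribution;
    # A[i][j]: transition probability i -> j; B[i][o]: emission probability.
    n_states = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    back = []
    for o in obs[1:]:
        psi, new_delta = [], []
        for j in range(n_states):
            # Best predecessor state for state j at this time step.
            best_i = max(range(n_states), key=lambda i: delta[i] * A[i][j])
            psi.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        back.append(psi)
        delta = new_delta
    last = max(range(n_states), key=lambda s: delta[s])
    # Trace the backpointers to recover the best state sequence.
    path = [last]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    path.reverse()
    return path, delta[last]
```

For long utterances, real decoders work in the log domain to avoid numerical underflow; the plain-probability form is kept here for readability.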
After the above processing, the phoneme posterior probabilities of the test speech against the standard reference model are available. If the process being carried out is the generation of grading parameters, experts are needed to give empirical marks to the pronunciations; from the correspondence between the phoneme posterior probabilities and the expert scores, the adaptive scoring parameters x and y can be trained, and the score function used for pronunciation scoring is then determined. If the operation being carried out is pronunciation scoring, the system substitutes the phoneme posterior probabilities of the test speech into the score function and finally obtains the pronunciation score.
Scoring algorithm:
The scoring process can be regarded as a pattern recognition process based on the HMM model. After feature extraction, let the output observation sequence of the speech to be scored be O = (O1, O2, ..., Ot), and let λ = (A, B, π) denote the standard reference HMM model, where π is the initial state distribution, A is the state transition probability matrix from S(t-1) to S(t), and B is the output probability matrix of the observation sequence corresponding to the state sequence. The model also contains a hidden state sequence S = (s1, s2, ..., st). Speech assessment is then the operation of obtaining the probability P(O | λ) of the input speech observation sequence O when the standard reference HMM model λ is known. The Viterbi algorithm is applied to the phonemes in the feature sequence to perform segmentation and alignment and to obtain the hidden state sequence S most likely to correspond to the observation sequence O. The HMM model is trained repeatedly and its parameters are updated, and the optimal probability P*(O | λ) of the HMM model matching the observation sequence is output; this optimal probability is the posterior probability score.
For each frame Ot, the posterior probability P(qi | Ot) of phoneme qi is calculated as:
P(qi | Ot) = P(Ot | qi) P(qi) / Σq P(Ot | q) P(q)
where P(Ot | qi) is the probability distribution of the observation vector Ot given phoneme qi, P(qi) is the prior probability of phoneme qi, and the denominator sums the probability of obtaining the observation Ot over the phonemes of all text positions. Taking the logarithm of the posterior probability of phoneme qi at each frame of segment i and accumulating the results yields the posterior probability score of phoneme qi over the i-th speech segment.
The posterior probability score of the entire sentence is then the average of the phoneme scores:
Score(O) = (1/N) Σ Score(qi)
where N is the number of phonemes in the sentence.
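Under the definitions above, the frame-level posterior, the per-phoneme log-posterior accumulation, and the sentence average can be sketched as follows; the likelihoods and priors passed in are toy values, not trained model outputs.

```python
import math

def phoneme_posterior(likelihoods, priors, phoneme):
    # Bayes rule: P(q_i | O_t) = P(O_t | q_i) P(q_i) / sum_q P(O_t | q) P(q)
    denom = sum(likelihoods[q] * priors[q] for q in priors)
    return likelihoods[phoneme] * priors[phoneme] / denom

def segment_score(frame_likelihoods, priors, phoneme):
    # Log posterior of the phoneme accumulated over the frames of its segment.
    return sum(math.log(phoneme_posterior(lk, priors, phoneme))
               for lk in frame_likelihoods)

def sentence_score(segment_scores):
    # Sentence score: average over the N phoneme segment scores.
    return sum(segment_scores) / len(segment_scores)
```

In a full system the per-frame likelihoods would come from the HMM emission distributions after Viterbi alignment, and a trained score function would map this raw posterior score to the percentage score output to the user.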
Considering that speaking rate is also an index for judging spoken proficiency, the pronunciation rate should be included in the judging standard, and a duration score can finally be defined for each phoneme in terms of a normalization function f(di), where di is the duration of the i-th segment corresponding to phoneme qi. The normalization makes the measure independent of the text and of the speaker: the speech duration is normalized by the rate of speech (ROS), i.e. the number of phonemes per unit duration within a sentence or over all of a speaker's pronunciations. Usually f(di) = ROS · di is taken.
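The ROS normalization described above can be illustrated as follows; the duration values in the usage are invented for the example.

```python
def rate_of_speech(n_phonemes, total_duration):
    # ROS: number of phonemes per unit of duration, measured over the
    # sentence (or over all of the speaker's pronunciations).
    return n_phonemes / total_duration

def normalized_durations(durations, ros):
    # f(d_i) = ROS * d_i: a duration measure independent of the text
    # and of the speaker's overall tempo.
    return [ros * d for d in durations]
```

A fast speaker (high ROS) and a slow speaker (low ROS) producing the same relative phoneme lengths then receive the same normalized durations, which is the point of the normalization.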
The above method can output the corresponding computer text file from the test audio and output the corresponding standard audio from the computer text file; by extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the test audio and the standard audio can be effectively determined, and the score of the test audio relative to the standard audio is determined from the phoneme posterior probabilities while the standard audio is output. This enables users to improve their own speaking ability according to a concrete score, effectively improves the user's spoken English ability, and has high practicability.
Preferably, in the above technical scheme, the concrete operations performed by the audio conversion module to convert the test audio into a computer text file are as follows:
Converting the test audio into a speech waveform signal, performing spectrum or cepstrum analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
Establishing, through the acoustic model, the association between the acoustic feature values and the sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
Decomposing, through the language model, a complete sentence into single words according to the chain rule, and determining the probability of each current word;
Outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word.
Specifically, the audio conversion module is mainly composed of four major modules: front-end processing, an acoustic model, a language model, and a decoder.
The front-end processing module mainly takes the received speech waveform signal through preprocessing, performs spectrum or cepstrum analysis on the speech signal, and extracts the corresponding acoustic feature values for the recognition training of the models; the quality of feature extraction directly affects the precision of recognition.
The task of the acoustic model is to calculate p(X | W), i.e. the probability of producing a given segment of speech given the text sequence. The acoustic model is the major part of an automatic speech recognition system; it occupies most of the computing cost and decides the performance of the system. The acoustic model connects the observed features of the speech signal with the pronunciation modeling units of the sentence. Traditional speech recognition systems generally adopt an acoustic model based on GMM-HMM (Gaussian mixture hidden Markov model). In 2011, Dong Yu, Li Deng and others at Microsoft Research put forward an acoustic model based on context-dependent (Context Dependent, CD) deep neural networks and hidden Markov models (CD-DNN-HMM), which brought a qualitative improvement in the accuracy of speech recognition.
The language model (Language Model, LM) predicts the probability p(W) of generating a character (word) sequence. The language model generally uses the chain rule to decompose the probability of a sentence into the product of the probabilities of each of its words. If W is composed of w1, w2, ..., wn, then P(W) can be split into:
P(W) = P(w1) P(w2 | w1) P(w3 | w1, w2) ... P(wn | w1, w2, ..., wn-1)
Each factor is the probability of the current word conditioned on all preceding words. To improve efficiency, the most common approach is to assume that the probability distribution of each word depends only on the last few words of the history. Such a language model is known as an n-gram model: in an n-gram model, the probability distribution of each word depends only on the preceding n-1 words. For example, in a 2-gram (bigram) model, the probability is split into the following form:
P(W) = P(w1) P(w2 | w1) P(w3 | w2) ... P(wn | wn-1)
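Under the bigram factorization above, the probability of a sentence can be computed as in this small sketch; the probability tables are toy values, and a real language model would additionally smooth unseen word pairs.

```python
def bigram_prob(words, unigram, bigram):
    # P(W) = P(w1) * product over k of P(w_k | w_{k-1})
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= bigram[(prev, cur)]
    return p
```

The same loop with a window of n-1 previous words generalizes to any n-gram order.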
Usually the training of the language model and that of the acoustic model are relatively independent. After each model has been trained, the two need to be combined through a decoding stage, for example by a formula of the form:
W* = argmax over W of p(X | W) p(W)
The final purpose of decoding is to combine the language model and the acoustic model and to obtain an optimal output sequence through search. The Viterbi algorithm (Viterbi Algorithm) is generally used in current mainstream decoders. In practice, these four modules run and condition each other simultaneously, pruning insufficiently promising hypotheses at every moment, so that an optimal solution is finally found within an acceptable time.
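The decoding step that combines the two models can be sketched as a simple argmax over candidate sentences in the log domain; the language-model weight and all scores below are illustrative values, and a real decoder searches a lattice rather than a fixed candidate list.

```python
def decode(candidates, acoustic_logp, lm_logp, lm_weight=1.0):
    # W* = argmax_W [ log p(X|W) + w * log p(W) ]: combine the acoustic
    # model score and the language model score, pick the best hypothesis.
    return max(candidates,
               key=lambda w: acoustic_logp[w] + lm_weight * lm_logp[w])
```

Note how the language model can overturn the acoustic model's preference: a hypothesis that sounds slightly closer to the audio may still lose to one that forms a far more probable word sequence.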
The above technical scheme can effectively output the optimal text sequence from the test audio, which serves as the basis for the subsequent generation of the standard speech. Through this series of operations on the test audio, the accuracy and uniqueness of the output text sequence are ensured, providing data support for the subsequent scoring and the output of the standard audio.
Preferably, in the above technical scheme, the audio comparison module is also used to compensate, for the speech power spectra corresponding to the standard audio and the test audio respectively, the inherent decline and the high-frequency portion suppressed by the articulatory system, and to perform framing on the compensated standard audio and test audio.
The above processing of the test audio and the standard audio effectively guarantees the subsequent extraction of the mel-frequency cepstral coefficients of both audios and improves the efficiency of audio recognition.
Preferably, in the above technical scheme, the audio comparison module is also used to convert the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation, extract the component frequency features of each frame respectively, and apply a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
The frequency-domain conversion of the standard audio and the test audio and the extraction of the component frequency features of each frame effectively guarantee the extraction of the mel-frequency cepstral coefficients and ensure that this extraction is both accurate and efficient.
A storage medium is also provided, on which program instructions are stored; when the program instructions are executed by a processor, the method described above is implemented.
The reader should understand that, in the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic statements need not refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the features of the different embodiments or examples described in this specification.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device and units described above may refer to the corresponding processes in the foregoing method embodiment, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be realized in other ways. For example, the device embodiments described above are merely exemplary; the division into units is only a division by logical function, and there may be other ways of division in actual implementation: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
The above embodiments are intended to illustrate the present invention for implementation or use by those skilled in the art. Modifications to the above embodiments will be readily apparent to such persons; therefore, the present invention includes, but is not limited to, the above embodiments. Any method, process or product that conforms to the claims or the description and accords with the principles and the novel and inventive features disclosed herein falls within the scope of protection of the present invention.
Claims (9)
1. A method for practising spoken English, characterised by comprising the following steps:
receiving spoken test audio, and translating the test audio into a computer text file;
translating the computer text file into standard audio;
extracting the mel-frequency cepstral coefficients of the test audio and of the standard audio respectively, calculating the phoneme posterior probabilities of the test audio and of the standard audio according to a hidden Markov model, determining a percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
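The claim does not specify how the phoneme posterior probabilities are mapped to a percentage score, so the averaging-and-ratio formula below, and the function name `percentage_score`, are illustrative assumptions only — a minimal sketch of one way the scoring step could work:

```python
import numpy as np

def percentage_score(test_posteriors, standard_posteriors):
    """Map frame-level phoneme posterior probabilities of the test audio
    and the standard audio to a 0-100 score. Inputs are arrays of
    per-frame posteriors of the aligned phonemes, such as an HMM forced
    alignment would produce. The formula is illustrative, not the patent's."""
    test_ll = np.mean(np.log(np.clip(test_posteriors, 1e-10, 1.0)))
    std_ll = np.mean(np.log(np.clip(standard_posteriors, 1e-10, 1.0)))
    if test_ll == 0.0:          # all test posteriors are 1.0: perfect match
        return 100.0
    # The standard audio sets the ceiling; a test log-posterior further
    # below it yields a proportionally lower score.
    return round(100.0 * min(1.0, std_ll / test_ll), 1)
```

A test utterance whose posteriors match the standard audio scores 100; less confident posteriors score proportionally lower.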
2. The method for practising spoken English according to claim 1, characterised in that translating the test audio into a computer text file specifically comprises the following steps:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, and performing model recognition training on the acoustic feature values to determine the corresponding acoustic model and language model;
creating, via the acoustic model, the association between the acoustic feature values and the sentence-pronunciation modelling units, and determining the probability that a given text produces the corresponding speech;
decomposing, via the language model, the complete sentence into single words according to the chain rule, and determining the probability of each current word occurring;
outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word occurring.
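The chain-rule language model and its combination with the acoustic probability in claim 2 can be sketched as follows. The bigram approximation, the 1e-6 back-off floor, and the helper names are assumptions for illustration; the claim only requires decomposing the sentence into words by the chain rule and combining each word's probability with the acoustic probability to output the optimal text sequence:

```python
import math

def chain_rule_lm(sentence, bigram_probs):
    """Chain-rule decomposition of a complete sentence into single words,
    approximated here with bigrams: P(w1..wn) = prod P(wi | w(i-1))."""
    logp = 0.0
    prev = "<s>"                       # sentence-start token (assumed)
    for word in sentence.lower().split():
        p = bigram_probs.get((prev, word), 1e-6)  # back-off floor (assumed)
        logp += math.log(p)
        prev = word
    return logp

def best_text(candidates, acoustic_logp, bigram_probs):
    """Output the optimal text sequence: the candidate maximising
    acoustic log-probability plus language-model log-probability."""
    return max(candidates,
               key=lambda s: acoustic_logp[s] + chain_rule_lm(s, bigram_probs))
```

With equal acoustic scores, the language model breaks the tie in favour of the more probable word sequence.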
3. The method for practising spoken English according to claim 1, characterised in that, before the extracting of the mel-frequency cepstral coefficients of the test audio and the standard audio, and after the translating of the computer text file into standard audio, the method further comprises the following steps:
supplementing, for the standard audio and the test audio respectively, the high-frequency portion attenuated by the inherent decline of the speech power spectrum and suppressed by the articulatory system;
performing framing processing on the supplemented standard audio and test audio.
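The "supplementing the attenuated high-frequency portion" of claim 3 is conventionally done with a first-order pre-emphasis filter, followed by framing. A sketch under that assumption — the 0.97 coefficient and the 25 ms / 10 ms frame parameters are common defaults, not values from the claim:

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """Boost the high frequencies attenuated by the speech power
    spectrum's inherent decline and by the articulatory system,
    via the standard first-order filter y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    """Split the emphasised signal into overlapping frames, e.g.
    25 ms frames with a 10 ms hop at a 16 kHz sampling rate."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len]
                     for i in range(n_frames)])
```

Both audios (standard and test) would pass through the same two steps before feature extraction.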
4. The method for practising spoken English according to claim 3, characterised in that the extracting of the mel-frequency cepstral coefficients of the test audio and the standard audio specifically comprises the following steps:
converting the time-domain waveform of each frame of the supplemented standard audio and test audio into a frequency-domain representation;
extracting the component frequency features of each frame of audio respectively;
obtaining the mel-frequency cepstral coefficients after performing a discrete cosine transform on the component frequency features.
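Claim 4's per-frame pipeline (time domain → frequency domain → component frequency features → discrete cosine transform → MFCC) can be sketched as below, assuming the "component frequency features" are mel filterbank energies, as is standard in MFCC extraction; the filter count, FFT size, and cepstral order are illustrative defaults:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """One frame: windowed FFT power spectrum (frequency domain),
    triangular mel filterbank energies ('component frequency features'),
    log, then a type-II DCT to obtain the MFCC vector."""
    n_fft = 512
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular filters evenly spaced on the mel scale up to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    log_energies = np.log(fbank @ spec + 1e-10)
    # Type-II DCT decorrelates the log filterbank energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return basis @ log_energies
```

Applied to every frame of both the standard audio and the test audio, this yields the MFCC sequences whose phoneme posteriors the HMM then compares.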
5. A system for practising spoken English, characterised by comprising:
an audio conversion module, configured to receive spoken test audio and translate the test audio into a computer text file;
a text conversion module, configured to translate the computer text file into standard audio;
an audio comparison module, configured to extract the mel-frequency cepstral coefficients of the test audio and of the standard audio respectively, calculate the phoneme posterior probabilities of the test audio and of the standard audio according to a hidden Markov model, determine a percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and output the standard audio and the percentage score.
6. The system for practising spoken English according to claim 5, characterised in that the specific operations performed by the audio conversion module to translate the test audio into a computer text file are:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, and performing model recognition training on the acoustic feature values to determine the corresponding acoustic model and language model;
creating, via the acoustic model, the association between the acoustic feature values and the sentence-pronunciation modelling units, and determining the probability that a given text produces the corresponding speech;
decomposing, via the language model, the complete sentence into single words according to the chain rule, and determining the probability of each current word occurring;
outputting the optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of each current word occurring.
7. The system for practising spoken English according to claim 5, characterised in that the audio comparison module is further configured to:
supplement, for the standard audio and the test audio respectively, the high-frequency portion attenuated by the inherent decline of the speech power spectrum and suppressed by the articulatory system; and
perform framing processing on the supplemented standard audio and test audio.
8. The system for practising spoken English according to claim 7, characterised in that the audio comparison module is further configured to:
convert the time-domain waveform of each frame of the supplemented standard audio and test audio into a frequency-domain representation;
extract the component frequency features of each frame of audio respectively; and
obtain the mel-frequency cepstral coefficients after performing a discrete cosine transform on the component frequency features.
9. A storage medium on which program instructions are stored, characterised in that, when the program instructions are executed by a processor, the method according to any one of claims 1 to 4 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811376417.2A CN109300339A (en) | 2018-11-19 | 2018-11-19 | A kind of exercising method and system of Oral English Practice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109300339A true CN109300339A (en) | 2019-02-01 |
Family
ID=65144144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811376417.2A Pending CN109300339A (en) | 2018-11-19 | 2018-11-19 | A kind of exercising method and system of Oral English Practice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109300339A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013057735A (en) * | 2011-09-07 | 2013-03-28 | National Institute Of Information & Communication Technology | Hidden markov model learning device for voice synthesis and voice synthesizer |
CN104517606A (en) * | 2013-09-30 | 2015-04-15 | 腾讯科技(深圳)有限公司 | Method and device for recognizing and testing speech |
CN103985392A (en) * | 2014-04-16 | 2014-08-13 | 柳超 | Phoneme-level low-power consumption spoken language assessment and defect diagnosis method |
CN103985391A (en) * | 2014-04-16 | 2014-08-13 | 柳超 | Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation |
CN104732977A (en) * | 2015-03-09 | 2015-06-24 | 广东外语外贸大学 | On-line spoken language pronunciation quality evaluation method and system |
CN104810017A (en) * | 2015-04-08 | 2015-07-29 | 广东外语外贸大学 | Semantic analysis-based oral language evaluating method and system |
CN107886968A (en) * | 2017-12-28 | 2018-04-06 | 广州讯飞易听说网络科技有限公司 | Speech evaluating method and system |
CN108305639A (en) * | 2018-05-11 | 2018-07-20 | 南京邮电大学 | Speech-emotion recognition method, computer readable storage medium, terminal |
Non-Patent Citations (1)
Title |
---|
涂惠燕 (Tu Huiyan): "Speech Recognition Technology for Spoken English Learning on Mobile Device Platforms", China Masters' Theses Full-text Database, Information Science and Technology Series * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640452A (en) * | 2019-03-01 | 2020-09-08 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN109979257A (en) * | 2019-04-27 | 2019-07-05 | 深圳市数字星河科技有限公司 | A method of partition operation is carried out based on reading English auto-scoring and is precisely corrected |
CN109979257B (en) * | 2019-04-27 | 2021-01-08 | 深圳市数字星河科技有限公司 | Method for performing accurate splitting operation correction based on English reading automatic scoring |
CN110797049A (en) * | 2019-10-17 | 2020-02-14 | 科大讯飞股份有限公司 | Voice evaluation method and related device |
CN112837679A (en) * | 2020-12-31 | 2021-05-25 | 北京策腾教育科技集团有限公司 | Language learning method and system |
CN115346421A (en) * | 2021-05-12 | 2022-11-15 | 北京猿力未来科技有限公司 | Spoken language fluency scoring method, computing device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107221318B (en) | English spoken language pronunciation scoring method and system | |
CN103928023B (en) | A kind of speech assessment method and system | |
Shobaki et al. | The OGI kids’ speech corpus and recognizers | |
CN101751919B (en) | Spoken Chinese stress automatic detection method | |
CN109300339A (en) | A kind of exercising method and system of Oral English Practice | |
WO2006034200A2 (en) | Method and system for the automatic generation of speech features for scoring high entropy speech | |
CN107886968B (en) | Voice evaluation method and system | |
Yin et al. | Automatic cognitive load detection from speech features | |
CN106653002A (en) | Literal live broadcasting method and platform | |
Mohammed et al. | Quranic verses verification using speech recognition techniques | |
Shah et al. | Effectiveness of PLP-based phonetic segmentation for speech synthesis | |
JP2001166789A (en) | Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
Sinha et al. | Acoustic-phonetic feature based dialect identification in Hindi Speech | |
Chauhan et al. | Emotion recognition using LP residual | |
Hämäläinen et al. | Improving speech recognition through automatic selection of age group–specific acoustic models | |
CN110853669B (en) | Audio identification method, device and equipment | |
Gültekin et al. | Turkish dialect recognition using acoustic and phonotactic features in deep learning architectures | |
Hanani et al. | Palestinian Arabic regional accent recognition | |
Fatima et al. | Vowel-category based short utterance speaker recognition | |
Cahyaningtyas et al. | HMM-based indonesian speech synthesis system with declarative and question sentences intonation | |
Hanani et al. | Speech-based identification of social groups in a single accent of British English by humans and computers | |
Li et al. | English sentence pronunciation evaluation using rhythm and intonation | |
Rai et al. | An efficient online examination system using speech recognition | |
Lachhab et al. | Improving the recognition of pathological voice using the discriminant HLDA transformation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190201 |