CN109300339A - Method and system for practicing spoken English - Google Patents

Method and system for practicing spoken English

Info

Publication number
CN109300339A
CN109300339A (application CN201811376417.2A)
Authority
CN
China
Prior art keywords
audio
testing
standard
standard audio
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811376417.2A
Other languages
Chinese (zh)
Inventor
王泓懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201811376417.2A
Publication of CN109300339A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/04 - Electrically-operated educational appliances with audible presentation of the material to be studied
    • G09B19/00 - Teaching not covered by other main groups of this subclass
    • G09B19/06 - Foreign languages

Abstract

The invention discloses a method and system for practicing spoken English. The method comprises the following steps: receiving spoken test audio and converting the test audio into a computer text file; converting the computer text file into standard audio; extracting mel-frequency cepstral coefficients from the test audio and the standard audio respectively, computing phoneme posterior probabilities for the test audio and the standard audio with a hidden Markov model, determining a percentage score for the test audio relative to the standard audio from the phoneme posterior probabilities, and outputting the standard audio and the percentage score. The method and system can output the standard audio corresponding to the input audio while providing a reasonable score for that audio, effectively improving the user's spoken English; they have high practical value.

Description

Method and system for practicing spoken English
Technical field
The present invention relates to the field of language learning, and in particular to a method and system for practicing spoken English.
Background art
With deepening economic globalization and the growth of China's comprehensive national strength, exchanges between China and the rest of the world grow ever more frequent, and demand for competence in international languages is soaring. Meanwhile, as information technology advances rapidly, computer-assisted language learning has matured to the point where learning spoken language online is feasible. Existing terminal-based teaching, however, still habitually follows traditional teaching patterns and mostly stresses vocabulary and grammar; the few spoken-language practice applications available offer only limited read-aloud or repeat-after functions that simulate communication and cannot fundamentally improve a user's practical command of English.
Summary of the invention
In view of the above technical problems, the present invention provides a method and system for practicing spoken English that can output standard audio corresponding to the received audio, provide a corresponding score, and effectively improve the user's spoken English ability.
To solve the above technical problems, the technical solution adopted by the present invention is to provide a method for practicing spoken English, comprising the following steps:
receiving spoken test audio and converting the test audio into a computer text file;
converting the computer text file into standard audio;
extracting mel-frequency cepstral coefficients from the test audio and the standard audio respectively, computing phoneme posterior probabilities for the test audio and the standard audio with a hidden Markov model, determining a percentage score for the test audio relative to the standard audio from the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
With the above technical scheme, the technical effect achieved is as follows: the method for practicing spoken English provided by the invention can output the computer text file corresponding to the test audio and, from that text file, output the corresponding standard audio. From the extracted mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the two can be determined effectively; the score of the test audio relative to the standard audio follows from those posterior probabilities, and the standard audio is output at the same time. The method can thus output the standard audio corresponding to the input audio while providing a reasonable score for it, effectively improving the user's spoken English; it has high practical value.
Preferably, in the above technical scheme, converting the test audio into a computer text file specifically comprises the following steps:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
creating, through the acoustic model, the link between the acoustic feature values and the sentence-pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
having the language model decompose a complete sentence into single words according to the chain rule and determine the probability of each current word occurring;
outputting the optimal text sequence from the probability that the given text produces the corresponding speech and the probabilities of the current words occurring.
Preferably, in the above technical scheme, before extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, and after converting the computer text file into standard audio, the method further comprises the following steps:
compensating, for the standard audio and the test audio respectively, the inherent decay of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
performing framing on the compensated standard audio and test audio.
Preferably, in the above technical scheme, extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively specifically comprises the following steps:
converting the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
extracting the component frequency features of each frame;
applying a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
A system for practicing spoken English is also provided, comprising:
an audio conversion module for receiving spoken test audio and converting the test audio into a computer text file;
a text conversion module for converting the computer text file into standard audio;
an audio comparison module for extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, computing the phoneme posterior probabilities of the test audio and the standard audio with a hidden Markov model, determining the percentage score of the test audio relative to the standard audio from the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
Preferably, in the above technical scheme, the audio conversion module converts the test audio into a computer text file through the following specific operations:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
creating, through the acoustic model, the link between the acoustic feature values and the sentence-pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
having the language model decompose a complete sentence into single words according to the chain rule and determine the probability of each current word occurring;
outputting the optimal text sequence from the probability that the given text produces the corresponding speech and the probabilities of the current words occurring.
Preferably, in the above technical scheme, the audio comparison module is further used to compensate, for the standard audio and the test audio respectively, the inherent decay of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
and to perform framing on the compensated standard audio and test audio.
Preferably, in the above technical scheme, the audio comparison module is further used to convert the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
to extract the component frequency features of each frame;
and to apply a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
With the above technical scheme, the technical effect achieved is as follows: the system for practicing spoken English provided by the invention can output the computer text file corresponding to the test audio and, from that text file, output the corresponding standard audio. From the extracted mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the two can be determined effectively; the score of the test audio relative to the standard audio follows from those posterior probabilities, and the standard audio is output at the same time. The system can thus output the standard audio corresponding to the input audio while providing a reasonable score for it, effectively improving the user's spoken English; it has high practical value.
A storage medium is also provided, on which program instructions are stored; when the program instructions are executed by a processor, the method described above is carried out.
Brief description of the drawings
The present invention will be further explained below with reference to the attached drawings:
Fig. 1 is a schematic flow chart of the method for practicing spoken English provided by the invention;
Fig. 2 is a schematic flow chart of the audio-to-text conversion provided by the invention;
Fig. 3 is a schematic block diagram of the system for practicing spoken English provided by the invention.
Detailed description of the embodiments
To effectively improve a user's spoken English, the present invention provides a method for practicing spoken English, detailed in Fig. 1, a schematic flow chart of the method. It specifically comprises the following steps:
Step S10: receive spoken test audio and convert the test audio into a computer text file;
Step S20: convert the computer text file into standard audio;
Step S30: extract the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, compute the phoneme posterior probabilities of the test audio and the standard audio with a hidden Markov model, determine the percentage score of the test audio relative to the standard audio from the phoneme posterior probabilities, and output the standard audio and the percentage score.
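The three steps above can be sketched as a minimal pipeline skeleton. Every function name, return value, and the fixed score below are illustrative stubs invented for this sketch, not part of the patent; the real S10 and S30 internals are described later in the specification.

```python
def speech_to_text(test_audio):
    """Step S10: convert spoken test audio into a computer text file.

    A real implementation would run the ASR front end, acoustic model,
    language model, and decoder; stubbed here for illustration.
    """
    return "hello world"

def text_to_standard_audio(text):
    """Step S20: synthesize the standard (reference) audio for the text."""
    return f"<standard audio for '{text}'>"

def score_against_standard(test_audio, standard_audio):
    """Step S30: MFCC extraction plus HMM phoneme posteriors would yield
    a percentage score; stubbed with a fixed value.
    """
    return 87.5

def practice(test_audio):
    text = speech_to_text(test_audio)
    standard_audio = text_to_standard_audio(text)
    score = score_against_standard(test_audio, standard_audio)
    return standard_audio, score  # both are output to the user

audio, score = practice("<spoken test audio>")
print(audio, score)
```

The point of the skeleton is the data flow: the test audio is only ever compared against a reference synthesized from its own recognized text.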
The above method can output the computer text file corresponding to the test audio and, from that text file, output the corresponding standard audio. From the extracted mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the two can be determined effectively, the score of the test audio relative to the standard audio follows from those posterior probabilities, and the standard audio is output at the same time. Users can then improve their own oral ability according to the specific score; the method effectively improves the user's spoken English and has high practical value.
The embodiment corresponding to Fig. 1 is further improved upon. See Fig. 2, a schematic flow chart of the audio-to-text conversion provided by the invention. It specifically comprises the following steps:
Step S11: convert the test audio into a speech waveform signal, perform spectral or cepstral analysis on the speech waveform signal, extract the acoustic feature values corresponding to the speech waveform signal, perform model recognition training on the acoustic feature values, and determine the corresponding acoustic model and language model;
Step S12: create, through the acoustic model, the link between the acoustic feature values and the sentence-pronunciation modeling units, and determine the probability that a given text produces the corresponding speech;
Step S13: have the language model decompose a complete sentence into single words according to the chain rule and determine the probability of each current word occurring;
Step S14: output the optimal text sequence from the probability that the given text produces the corresponding speech and the probabilities of the current words occurring.
The above technical scheme can output the optimal text sequence from the test audio; that sequence is the basis on which the subsequent standard speech is generated. The series of operations on the test audio ensures the accuracy and uniqueness of the output text sequence and provides data support for the subsequent scoring and the output of the standard audio.
Building on the embodiment corresponding to Fig. 2, a further improvement guarantees good recognition performance. Specifically, before extracting the mel-frequency cepstral coefficients of the test audio and the standard audio, and after converting the computer text file into standard audio, the method further comprises the following steps:
compensating, for the standard audio and the test audio respectively, the inherent decay of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
performing framing on the compensated standard audio and test audio.
This processing of the test audio and the standard audio effectively secures the subsequent extraction of their mel-frequency cepstral coefficients and improves the efficiency of audio recognition.
Preferably, in the above technical scheme, extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively specifically comprises the following steps:
converting the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
extracting the component frequency features of each frame;
applying a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
Converting the standard audio and the test audio to the frequency domain and extracting the component frequency features of each frame effectively secures the extraction of the mel-frequency cepstral coefficients and guarantees its accuracy and efficiency.
On the basis of the method embodiment corresponding to Fig. 1, the present invention also provides a system for practicing spoken English; see Fig. 3, a schematic block diagram of the system. The system comprises:
an audio conversion module for receiving spoken test audio and converting the test audio into a computer text file;
a text conversion module for converting the computer text file into standard audio;
an audio comparison module for extracting the mel-frequency cepstral coefficients of the test audio and the standard audio respectively, computing the phoneme posterior probabilities of the test audio and the standard audio with a hidden Markov model, determining the percentage score of the test audio relative to the standard audio from the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
When scoring against the standard audio, the score must be built on the specific articulation basis that the user is actually able to produce; at the same time, the result the software feeds back must resemble the auditory judgment of a native English speaker.
The audio comparison module is based on the automatic speech recognition (ASR) module: the text output by the preceding speech-recognition stage is converted back into audio, which serves as the standard model against which the practitioner's test speech is compared in order to score that test speech.
Pre-processing: first, the test speech and the canonical reference pronunciation are each pre-processed. Pre-processing comprises: 1) pre-emphasis: compensating the inherent decay of the speech power spectrum and the high-frequency portion suppressed by the articulatory system, to reduce the influence of noise on the later endpoint-detection and feature-extraction modules; 2) framing and windowing: cutting the long, non-stationary speech into short 20-50 ms frames so that it satisfies the conditions of the Fourier transform; 3) endpoint detection: dividing a stretch of pre-processed speech into separate words as far as possible. The purpose of pre-processing is to guarantee that the system recognizes well.
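Steps 1) and 2) of the pre-processing can be sketched as follows. The parameter values (0.97 pre-emphasis coefficient, 25 ms frames with a 10 ms hop, Hamming window) are common defaults assumed for illustration, not values taken from the patent:

```python
import numpy as np

def preprocess(signal, sample_rate=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis followed by framing with a Hamming window."""
    # 1) Pre-emphasis: boost the high frequencies attenuated by the
    #    articulatory system: y[n] = x[n] - alpha * x[n-1].
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # 2) Framing: cut the non-stationary signal into short quasi-stationary
    #    frames (20-50 ms) and window each one to reduce spectral leakage.
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
frames = preprocess(sig)
print(frames.shape)  # one row per windowed 25 ms frame
```

Endpoint detection (step 3) is omitted here; energy-threshold methods are a common choice but the patent does not specify one.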
Extracting the MFCCs: the speech under test is first pre-processed. Each frame is transformed from its time-domain waveform into the frequency domain by a fast Fourier transform (FFT); according to the auditory properties of the human ear, a mel filter bank captures the component frequency features of the frame, and a discrete cosine transform (DCT) then yields the MFCCs.
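The FFT, mel filter bank, and DCT chain can be sketched for a single pre-processed frame. Filter and coefficient counts (26 filters, 13 coefficients, 512-point FFT) are common defaults assumed here, not taken from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sample_rate=16000, n_filters=26, n_ceps=13, n_fft=512):
    """MFCC for one frame: FFT -> mel filter bank -> log -> DCT-II."""
    # Power spectrum of the frame (time domain -> frequency domain).
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # Triangular filters equally spaced on the mel scale, mimicking the
    # frequency resolution of the human ear.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    energies = np.log(fbank @ spectrum + 1e-10)
    # DCT-II decorrelates the log filter-bank energies; the first n_ceps
    # coefficients form the MFCC vector.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return basis @ energies

frame = np.random.default_rng(0).standard_normal(400)  # one 25 ms frame
coeffs = mfcc(frame)
print(coeffs.shape)
```

In the patent's pipeline this extraction runs over every frame of both the test audio and the standard audio.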
The HMM: the hidden Markov model (HMM), a statistical model of the speech signal, is the main technical model across every field of speech processing today. An HMM comprises five basic elements and three basic algorithms; among them the Viterbi decoding algorithm is also the basis of the pronunciation-scoring algorithm in spoken-English learning. Given an observation sequence and a model λ = (A, B, π), the Viterbi algorithm not only finds a sufficiently good state sequence Q = q1 q2 ... qT to explain the observation sequence, but also yields the output probability of that path.
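A compact Viterbi implementation for a discrete HMM λ = (A, B, π) illustrates both products the algorithm delivers: the best state path and its log probability. The toy model parameters are invented for illustration:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path and its log probability.

    pi: initial state distribution, A: state transition matrix,
    B: observation probability matrix, obs: observed symbol indices.
    """
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # best log-prob of any path ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # back-pointers
    logA, logB = np.log(A), np.log(B)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + logA[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + logB[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())

pi = np.array([0.6, 0.4])                    # two hidden states
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # two observation symbols
path, logp = viterbi(pi, A, B, [0, 0, 1])
print(path)
```

In the scoring pipeline the same dynamic program aligns the feature sequence to phoneme states and yields the path probability used for the posterior score.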
After the above processing, the phoneme posterior probabilities of the test speech against the standard reference model are available. If a score-parameter generation pass is being run, experts must grade the pronunciation by experience; from several correspondences between phoneme posterior probability and expert score, the adaptive parameters x and y to be scored with can be trained and the score function for pronunciation scoring determined. If a pronunciation-scoring pass is being run, the system substitutes the phoneme posterior probabilities of the test speech into the score function and obtains the pronunciation score.
Scoring algorithm:
Scoring can be regarded as a pattern-recognition process based on the HMM. After feature extraction, let the output observation sequence of the speech to be scored be O = (O1, O2, ..., OT), and let λ = (π, A, B) denote the canonical reference HMM, where π is the initial state distribution, A is the matrix of transition probabilities from state S(t-1) to state S(t), and B is the matrix of observation output probabilities in each state. The model admits many hidden state sequences S = (s1, s2, ..., sT), so assessing the speech, with the reference HMM λ known, is the operation of obtaining the probability P(O|λ) of the input observation sequence O. The Viterbi algorithm segments and aligns the phonemes in the feature sequence and obtains the hidden state sequence S most likely to correspond to the observation sequence O. The HMM is trained repeatedly and its parameters updated; the optimal probability max P(O|λ) of the HMM matching the observation sequence is output, and that optimal probability is the posterior-probability score.
For each frame Ot, the posterior probability P(qi|Ot) of phoneme qi is computed as

    P(qi|Ot) = P(Ot|qi) P(qi) / Σj P(Ot|qj) P(qj)

where P(Ot|qi) is the probability distribution of the observation vector Ot given phoneme qi, P(qi) is the prior probability of phoneme qi, and the denominator sums, over the phonemes at all positions of the text, the probabilities of obtaining the observation Ot. Taking the logarithm of the posterior probability of phoneme qi in each frame of segment i and accumulating yields the posterior-probability score of phoneme qi over segment i.
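The Bayes step and the log accumulation can be sketched directly. The likelihood and prior values below are invented for illustration; in the real system the likelihoods come from the acoustic model:

```python
import numpy as np

def phoneme_posteriors(likelihoods, priors):
    """Frame-level posteriors P(q_i | O_t) by Bayes' rule.

    likelihoods[i] is P(O_t | q_i) from the acoustic model and
    priors[i] is P(q_i); the denominator normalizes over all
    candidate phonemes, as in the formula above.
    """
    joint = likelihoods * priors   # P(O_t | q_i) * P(q_i)
    return joint / joint.sum()     # divide by sum over phonemes

def segment_score(frame_posteriors):
    """Log-posterior score for one phoneme segment: sum of per-frame logs."""
    return float(np.sum(np.log(frame_posteriors)))

like = np.array([0.5, 0.3, 0.2])    # P(O_t | q_i), hypothetical
prior = np.array([0.2, 0.5, 0.3])   # P(q_i), hypothetical
post = phoneme_posteriors(like, prior)
print(post)  # sums to 1 across the candidate phonemes
```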
The posterior-probability score of the entire sentence is then

    Score = (1/N) Σ(i=1..N) Score(qi)

where N is the number of phonemes in the sentence.
Since speaking rate is also an index by which qualified spoken language is judged, the pronunciation rate should be included in the judging standard, and the phoneme-duration score can finally be defined as

    Score_dur = (1/N) Σ(i=1..N) log f(di)

where di is the duration of the i-th segment, corresponding to phoneme qi, and f(di) is a normalizing function. To make the score independent of the text and the speaker, the speech duration is normalized by the rate of speech (ROS), the number of phonemes per unit time over a sentence or over all of a speaker's pronunciations. Usually f(di) = ROS * di is taken.
The above method can output the computer text file corresponding to the test audio and, from that text file, output the corresponding standard audio. From the extracted mel-frequency cepstral coefficients of the test audio and the standard audio, the phoneme posterior probabilities of the two can be determined effectively, the score of the test audio relative to the standard audio follows from those posterior probabilities, and the standard audio is output at the same time. Users can then improve their own oral ability according to the specific score; the method effectively improves the user's spoken English and has high practical value.
Preferably, in the above technical scheme, the audio conversion module converts the test audio into a computer text file through the following specific operations:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, performing model recognition training on the acoustic feature values, and determining the corresponding acoustic model and language model;
creating, through the acoustic model, the link between the acoustic feature values and the sentence-pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
having the language model decompose a complete sentence into single words according to the chain rule and determine the probability of each current word occurring;
outputting the optimal text sequence from the probability that the given text produces the corresponding speech and the probabilities of the current words occurring.
Concretely, the audio conversion module consists of four main modules: front-end processing, the acoustic model, the language model, and the decoder.
Front end processing block is mainly that the speech waveform signal that will be received passes through pretreatment, carries out frequency spectrum to voice signal Or cepstral analysis, extract corresponding acoustic feature value to carry out the recognition training of model, the quality of feature extraction is by direct shadow Ring the precision to identification.
The task of the acoustic model is to compute p(X|W), that is, the probability of producing a given stretch of speech from a given text sequence. The acoustic model is the major part of an automatic speech recognition system: it consumes most of the computational cost and decides the system's performance. It links the observed features of the speech signal with the pronunciation modeling units of the sentence. Traditional speech recognition systems generally use an acoustic model based on GMM-HMM (Gaussian mixture model - hidden Markov model). The context-dependent (CD) deep neural network - hidden Markov model acoustic model (CD-DNN-HMM) put forward in 2011 by Yu Dong, Deng Li, and others at Microsoft Research brought a qualitative improvement in the accuracy of speech recognition.
The language model (LM) predicts the probability p(W) of generating a character (word) sequence. A language model generally uses the chain rule to decompose the probability of a sentence into the product of the probabilities of its words. If W consists of w1, w2, ..., wn, then P(W) can be split into:
P(W) = P(w1) P(w2|w1) P(w3|w1,w2) ... P(wn|w1,w2,...,wn-1)
Each factor is the probability of the current word conditioned on all the words before it. To improve efficiency, the most common approach is to assume that the probability distribution of each word depends only on the last few words of the history. Such a language model is called an n-gram model: in an n-gram model, the probability distribution of each word depends only on the preceding n-1 words. In a 2-gram model, for example, the product splits into the following form:
P(W) = P(w1) P(w2|w1) P(w3|w2) ... P(wn|wn-1)
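A toy maximum-likelihood 2-gram model makes the factorization concrete. The corpus below is invented for illustration, and a real language model would add smoothing for unseen bigrams:

```python
from collections import Counter

def bigram_prob(sentence, corpus):
    """P(W) = P(w1) * prod over k of P(w_k | w_{k-1}), with probabilities
    estimated by maximum likelihood from raw counts in the corpus.
    """
    words = [w for line in corpus for w in line.split()]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    tokens = sentence.split()
    p = unigrams[tokens[0]] / len(words)            # P(w1)
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]  # P(cur | prev)
    return p

corpus = ["i like tea", "i like coffee"]
print(bigram_prob("i like tea", corpus))
```

P("i") = 2/6, P("like"|"i") = 1, P("tea"|"like") = 1/2, so the product is 1/6.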
The language model and the acoustic model are usually trained independently. After each model has been trained, the two must be combined in a decoding stage, as in the formula:

    W* = argmax over W of P(W) p(X|W)
The final purpose of decoding is to combine the language model and the acoustic model and obtain an optimal output sequence by search. Mainstream decoders currently use the Viterbi algorithm in general.
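Under these assumptions, the combination step can be sketched as an argmax over candidate texts, each carrying a language-model log probability log P(W) and an acoustic-model log probability log p(X|W). The hypothesis texts and all numbers below are invented for illustration:

```python
import math

def decode(hypotheses, lm_weight=1.0):
    """Return W* = argmax over W of lm_weight*log P(W) + log p(X|W).

    hypotheses maps each candidate text to a pair
    (lm_logprob, am_logprob). A real decoder searches this space with
    the Viterbi algorithm instead of enumerating it.
    """
    return max(
        hypotheses,
        key=lambda w: lm_weight * hypotheses[w][0] + hypotheses[w][1],
    )

candidates = {
    "recognise speech": (math.log(0.02), math.log(0.40)),
    "wreck a nice beach": (math.log(0.001), math.log(0.55)),
}
print(decode(candidates))  # the LM prior outweighs the slightly better AM score
```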
In practice, these four modules run and condition each other simultaneously, pruning insufficiently good candidates at any time, and finally find the optimal solution within an acceptable time.
The above technical scheme can output the optimal text sequence from the test audio; that sequence is the basis on which the subsequent standard speech is generated. The series of operations on the test audio ensures the accuracy and uniqueness of the output text sequence and provides data support for the subsequent scoring and the output of the standard audio.
Preferably, in the above technical scheme, the audio comparison module is further used to compensate, for the standard audio and the test audio respectively, the inherent decay of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
and to perform framing on the compensated standard audio and test audio.
This processing of the test audio and the standard audio effectively secures the subsequent extraction of their mel-frequency cepstral coefficients and improves the efficiency of audio recognition.
Preferably, in the above technical scheme, the audio comparison module is further used to convert the time-domain waveform of each frame of the compensated standard audio and test audio into a frequency-domain representation;
to extract the component frequency features of each frame;
and to apply a discrete cosine transform to the component frequency features to obtain the mel-frequency cepstral coefficients.
Converting the standard audio and the test audio to the frequency domain and extracting the component frequency features of each frame effectively secures the extraction of the mel-frequency cepstral coefficients and guarantees its accuracy and efficiency.
A storage medium is also provided, on which program instructions are stored; when the program instructions are executed by a processor, the method described above is carried out.
Reader should be understood that in the description of this specification reference term " one embodiment ", " is shown " some embodiments " The description of example ", " specific example " or " some examples " etc. mean specific features described in conjunction with this embodiment or example, structure, Material or feature are included at least one embodiment or example of the invention.In the present specification, above-mentioned term is shown The statement of meaning property need not be directed to identical embodiment or example.Moreover, particular features, structures, materials, or characteristics described It may be combined in any suitable manner in any one or more of the embodiments or examples.In addition, without conflicting with each other, this The technical staff in field can be by the spy of different embodiments or examples described in this specification and different embodiments or examples Sign is combined.
Those skilled in the art will appreciate that, for convenience and brevity of description, the specific working processes of the devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended to enable those skilled in the art to implement or use the present invention. Modifications to these embodiments will be readily apparent to those skilled in the art; the present invention therefore includes, but is not limited to, the above embodiments. Any method, process, or product that conforms to the claims or the description of this specification and is consistent with the principles and the novel and inventive features disclosed herein falls within the scope of protection of the present invention.

Claims (9)

1. A method for practicing spoken English, characterized by comprising the following steps:
receiving spoken test audio and translating the test audio into a computer text file;
translating the computer text file into standard audio;
extracting mel-frequency cepstrum coefficients of the test audio and of the standard audio respectively, calculating phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determining a percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and outputting the standard audio and the percentage score.
2. The method for practicing spoken English according to claim 1, characterized in that translating the test audio into a computer text file specifically comprises the following steps:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, and performing model recognition training on the acoustic feature values to determine a corresponding acoustic model and language model;
creating, through the acoustic model, the association between the acoustic feature values and sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
decomposing, by the language model, a complete sentence into individual words according to the chain rule, and determining the probability of occurrence of the current word;
outputting an optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of occurrence of the current word.
3. The method for practicing spoken English according to claim 1, characterized in that, before extracting the mel-frequency cepstrum coefficients of the test audio and the standard audio respectively, and after translating the computer text file into the standard audio, the method further comprises the following steps:
supplementing, for the speech corresponding to the standard audio and the test audio respectively, the intrinsic roll-off of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
performing framing on the supplemented standard audio and test audio.
4. The method for practicing spoken English according to claim 3, characterized in that extracting the mel-frequency cepstrum coefficients of the test audio and the standard audio respectively specifically comprises the following steps:
converting the time-domain waveform of each frame of the supplemented standard audio and test audio into a frequency-domain representation;
extracting the component frequency features of each frame of audio respectively;
performing a discrete cosine transform on the component frequency features to obtain the mel-frequency cepstrum coefficients.
5. A system for practicing spoken English, characterized by comprising:
an audio conversion module, configured to receive spoken test audio and translate the test audio into a computer text file;
a text conversion module, configured to translate the computer text file into standard audio;
an audio comparison module, configured to extract mel-frequency cepstrum coefficients of the test audio and of the standard audio respectively, calculate phoneme posterior probabilities of the test audio and the standard audio according to a hidden Markov model, determine a percentage score of the test audio relative to the standard audio according to the phoneme posterior probabilities, and output the standard audio and the percentage score.
6. The system for practicing spoken English according to claim 5, characterized in that the audio conversion module translates the test audio into a computer text file by performing the following specific operations:
converting the test audio into a speech waveform signal, performing spectral or cepstral analysis on the speech waveform signal, extracting acoustic feature values corresponding to the speech waveform signal, and performing model recognition training on the acoustic feature values to determine a corresponding acoustic model and language model;
creating, through the acoustic model, the association between the acoustic feature values and sentence pronunciation modeling units, and determining the probability that a given text produces the corresponding speech;
decomposing, by the language model, a complete sentence into individual words according to the chain rule, and determining the probability of occurrence of the current word;
outputting an optimal text sequence according to the probability that the given text produces the corresponding speech and the probability of occurrence of the current word.
7. The system for practicing spoken English according to claim 5, characterized in that the audio comparison module is further configured to supplement, for the speech corresponding to the standard audio and the test audio respectively, the intrinsic roll-off of the speech power spectrum and the high-frequency portion suppressed by the articulatory system;
and to perform framing on the supplemented standard audio and test audio.
8. The system for practicing spoken English according to claim 7, characterized in that the audio comparison module is further configured to convert the time-domain waveform of each frame of the supplemented standard audio and test audio into a frequency-domain representation;
extract the component frequency features of each frame of audio respectively;
and perform a discrete cosine transform on the component frequency features to obtain the mel-frequency cepstrum coefficients.
9. A storage medium on which program instructions are stored, characterized in that, when the program instructions are executed by a processor, the method of any one of claims 1 to 4 is implemented.
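The scoring step of claims 1 and 5, mapping the phoneme posterior probabilities of the test audio and the standard audio to a relative percentage score, could look roughly like the following. The patent does not specify the mapping; the per-phoneme ratio used here is only one plausible, goodness-of-pronunciation-style choice, and the posterior values in the example are hypothetical.

```python
import numpy as np

def percentage_score(test_posteriors, ref_posteriors):
    """Compare per-phoneme posterior probabilities of the learner's audio
    against those of the synthesized standard audio and map the result
    to 0-100. Illustrative mapping only: the patent states that a
    relative percentage score is produced, not how."""
    test = np.asarray(test_posteriors, float)
    ref = np.asarray(ref_posteriors, float)
    # Per-phoneme ratio, clipped so the learner cannot exceed the reference
    ratios = np.clip(test / np.maximum(ref, 1e-10), 0.0, 1.0)
    return 100.0 * ratios.mean()

# Hypothetical HMM phoneme posteriors for a five-phoneme utterance
ref = [0.95, 0.90, 0.97, 0.93, 0.96]
test = [0.90, 0.45, 0.97, 0.80, 0.60]
print(round(percentage_score(test, ref), 1))  # → 78.7
```

A learner whose posteriors match the reference exactly would score 100; weak phonemes (here the second and fifth) pull the percentage down proportionally.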
CN201811376417.2A 2018-11-19 2018-11-19 A kind of exercising method and system of Oral English Practice Pending CN109300339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811376417.2A CN109300339A (en) 2018-11-19 2018-11-19 A kind of exercising method and system of Oral English Practice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811376417.2A CN109300339A (en) 2018-11-19 2018-11-19 A kind of exercising method and system of Oral English Practice

Publications (1)

Publication Number Publication Date
CN109300339A true CN109300339A (en) 2019-02-01

Family

ID=65144144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811376417.2A Pending CN109300339A (en) 2018-11-19 2018-11-19 A kind of exercising method and system of Oral English Practice

Country Status (1)

Country Link
CN (1) CN109300339A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109979257A (en) * 2019-04-27 2019-07-05 深圳市数字星河科技有限公司 A method of partition operation is carried out based on reading English auto-scoring and is precisely corrected
CN110797049A (en) * 2019-10-17 2020-02-14 科大讯飞股份有限公司 Voice evaluation method and related device
CN111640452A (en) * 2019-03-01 2020-09-08 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN112837679A (en) * 2020-12-31 2021-05-25 北京策腾教育科技集团有限公司 Language learning method and system
CN115346421A (en) * 2021-05-12 2022-11-15 北京猿力未来科技有限公司 Spoken language fluency scoring method, computing device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013057735A (en) * 2011-09-07 2013-03-28 National Institute Of Information & Communication Technology Hidden markov model learning device for voice synthesis and voice synthesizer
CN103985392A (en) * 2014-04-16 2014-08-13 柳超 Phoneme-level low-power consumption spoken language assessment and defect diagnosis method
CN103985391A (en) * 2014-04-16 2014-08-13 柳超 Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation
CN104517606A (en) * 2013-09-30 2015-04-15 腾讯科技(深圳)有限公司 Method and device for recognizing and testing speech
CN104732977A (en) * 2015-03-09 2015-06-24 广东外语外贸大学 On-line spoken language pronunciation quality evaluation method and system
CN104810017A (en) * 2015-04-08 2015-07-29 广东外语外贸大学 Semantic analysis-based oral language evaluating method and system
CN107886968A (en) * 2017-12-28 2018-04-06 广州讯飞易听说网络科技有限公司 Speech evaluating method and system
CN108305639A (en) * 2018-05-11 2018-07-20 南京邮电大学 Speech-emotion recognition method, computer readable storage medium, terminal

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013057735A (en) * 2011-09-07 2013-03-28 National Institute Of Information & Communication Technology Hidden markov model learning device for voice synthesis and voice synthesizer
CN104517606A (en) * 2013-09-30 2015-04-15 腾讯科技(深圳)有限公司 Method and device for recognizing and testing speech
CN103985392A (en) * 2014-04-16 2014-08-13 柳超 Phoneme-level low-power consumption spoken language assessment and defect diagnosis method
CN103985391A (en) * 2014-04-16 2014-08-13 柳超 Phonetic-level low power consumption spoken language evaluation and defect diagnosis method without standard pronunciation
CN104732977A (en) * 2015-03-09 2015-06-24 广东外语外贸大学 On-line spoken language pronunciation quality evaluation method and system
CN104810017A (en) * 2015-04-08 2015-07-29 广东外语外贸大学 Semantic analysis-based oral language evaluating method and system
CN107886968A (en) * 2017-12-28 2018-04-06 广州讯飞易听说网络科技有限公司 Speech evaluating method and system
CN108305639A (en) * 2018-05-11 2018-07-20 南京邮电大学 Speech-emotion recognition method, computer readable storage medium, terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tu Huiyan (涂惠燕): "Speech recognition technology in spoken English learning on mobile device platforms", China Masters' Theses Full-text Database, Information Science and Technology series *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640452A (en) * 2019-03-01 2020-09-08 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN109979257A (en) * 2019-04-27 2019-07-05 深圳市数字星河科技有限公司 A method of partition operation is carried out based on reading English auto-scoring and is precisely corrected
CN109979257B (en) * 2019-04-27 2021-01-08 深圳市数字星河科技有限公司 Method for performing accurate splitting operation correction based on English reading automatic scoring
CN110797049A (en) * 2019-10-17 2020-02-14 科大讯飞股份有限公司 Voice evaluation method and related device
CN112837679A (en) * 2020-12-31 2021-05-25 北京策腾教育科技集团有限公司 Language learning method and system
CN115346421A (en) * 2021-05-12 2022-11-15 北京猿力未来科技有限公司 Spoken language fluency scoring method, computing device and storage medium

Similar Documents

Publication Publication Date Title
CN107221318B (en) English spoken language pronunciation scoring method and system
CN103928023B (en) A kind of speech assessment method and system
Shobaki et al. The OGI kids’ speech corpus and recognizers
CN101751919B (en) Spoken Chinese stress automatic detection method
CN109300339A (en) A kind of exercising method and system of Oral English Practice
WO2006034200A2 (en) Method and system for the automatic generation of speech features for scoring high entropy speech
CN107886968B (en) Voice evaluation method and system
Yin et al. Automatic cognitive load detection from speech features
CN106653002A (en) Literal live broadcasting method and platform
Mohammed et al. Quranic verses verification using speech recognition techniques
Shah et al. Effectiveness of PLP-based phonetic segmentation for speech synthesis
JP2001166789A (en) Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Sinha et al. Acoustic-phonetic feature based dialect identification in Hindi Speech
Chauhan et al. Emotion recognition using LP residual
Hämäläinen et al. Improving speech recognition through automatic selection of age group–specific acoustic models
CN110853669B (en) Audio identification method, device and equipment
Gültekin et al. Turkish dialect recognition using acoustic and phonotactic features in deep learning architectures
Hanani et al. Palestinian Arabic regional accent recognition
Fatima et al. Vowel-category based short utterance speaker recognition
Cahyaningtyas et al. HMM-based indonesian speech synthesis system with declarative and question sentences intonation
Hanani et al. Speech-based identification of social groups in a single accent of British English by humans and computers
Li et al. English sentence pronunciation evaluation using rhythm and intonation
Rai et al. An efficient online examination system using speech recognition
Lachhab et al. Improving the recognition of pathological voice using the discriminant HLDA transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190201