CN102237083A - Portable interpretation system based on WinCE platform and language recognition method thereof - Google Patents
- Publication number: CN102237083A (application CN201010160521A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to a portable interpretation system based on the WinCE platform, comprising a voice collector, a voice preprocessing module, a voice feature extraction and modeling module, a model base, a recognition module, a corpus base, and an interpretation and voice synthesis module, all built on an embedded platform. The voice collector is connected to the voice preprocessing module; the voice preprocessing module is connected to the voice feature extraction and modeling module; the voice feature extraction and modeling module is connected to either the model base or the recognition module: when the training state is selected it is connected to the model base, and when the recognition state is selected it is connected to the recognition module. The recognition module is connected to the interpretation and voice synthesis module, which in turn is connected to the corpus base. The system is characterized by efficient and accurate voice recognition, high portability, and two-way interpretation.
Description
Technical field
The present invention relates to the field of speech recognition technology, and in particular to a portable oral translation system based on the WinCE platform that converts a recognized human voice signal into the corresponding translation result. The invention also relates to the speech recognition method of this translation system.
Background technology
Speech recognition technology enables a machine, through recognition and understanding, to convert human voice signals into the corresponding text or commands; it is gradually becoming a key technology of the human-machine interface in information technology. In recent years, with the rapid development of embedded devices, consumer electronics products have penetrated many areas of daily life; being portable and low-cost, they have been widely adopted, so embedded speech recognition systems have a very large consumer market. Traditional speech recognition systems, such as Microsoft's SPEECH SDK 5.1 and Cambridge's HTK, are speech recognition engines built for PC operating systems and cannot be used in an embedded operating system.
Summary of the invention
The objective of the invention is to design a portable oral translation system based on the WinCE platform that can, under the resource constraints of an embedded system, recognize a large vocabulary with a high recognition rate, and that performs two-way spoken translation from Chinese to English and from English to Chinese.
Another objective of the invention is to provide the speech recognition method of this translation system.
To achieve the above objectives, the invention includes the following technical features. A portable oral translation system based on the WinCE platform is characterized by comprising a voice collector, a voice preprocessing module, a voice feature extraction and modeling module, a model base, a recognition module, a corpus, and a translation and speech synthesis module, all built on an embedded platform. The voice collector is connected to the voice preprocessing module; the voice preprocessing module is connected to the voice feature extraction and modeling module; the voice feature extraction and modeling module is connected to either the model base (when the training state is selected) or the recognition module (when the recognition state is selected). The recognition module is connected to the translation and speech synthesis module, which is connected to the corpus. After the recognition module obtains the optimal result through decision judgment, the translation and speech synthesis module translates it into text and outputs it as speech. Through language selection, two-way spoken translation from Chinese to English or from English to Chinese is achieved.
The voice preprocessing module comprises a pre-emphasis unit, a frame-division unit, a windowing unit, and an endpoint detection unit connected in sequence. The pre-emphasis unit is connected to the voice collector, and the endpoint detection unit is connected to the voice feature extraction and modeling module.
The pre-emphasis unit is a high-frequency-boost pre-emphasis digital filter.
The frame-division unit divides the signal into overlapping frames.
The windowing unit applies a Hamming window function.
The endpoint detection unit performs a dual-threshold comparison using the short-time energy E and the short-time average zero-crossing rate Z as features, computing the zero-crossing-rate threshold Z_cT and the high and low energy thresholds from the leading silent segment and using them as gates to detect the endpoints.
The voice feature extraction and modeling module extracts MFCC voice features as the recognition features and adopts a hidden Markov model as the training and recognition model; the hidden Markov model is composed of a Markov chain and a general stochastic process.
The hidden Markov model uses the forward-backward probability algorithm to solve the evaluation problem, the Viterbi algorithm to solve the decoding problem, and the Baum-Welch iterative algorithm to solve the learning problem.
Specifically: the forward-backward probability algorithm solves, for a given hidden Markov model λ = (π, A, B) and an observation sequence O = O_1, O_2, ..., O_T produced by the system, the problem of computing the likelihood P(O|λ).
The Viterbi algorithm solves, for a given hidden Markov model λ = (π, A, B) and an observation sequence O = O_1, O_2, ..., O_T produced by the system, the problem of finding the state sequence S = q_1, q_2, ..., q_T that the system most probably passed through in producing this observation sequence.
For an unknown hidden Markov model system, the Baum-Welch iterative algorithm is used to estimate the model parameters.
The invention also includes a speech recognition method for the portable oral translation system based on the WinCE platform, characterized by comprising the steps of:
(1) training the hidden Markov model to obtain the model parameters;
(2) taking the voice features obtained by the feature extraction module as the observation sequence of the hidden Markov model; the voice units obtained by training form the state sequence, and the state transition sequence is solved by the Viterbi algorithm;
(3) applying decision judgment to obtain the state transition sequence of maximum probability;
(4) mapping the optimal state sequence to candidate phonemes or syllables, and finally forming words and sentences through a language model.
In step (1), the hidden Markov model parameters are first initialized and then estimated with the Baum-Welch iterative algorithm.
Step (1) runs the training algorithm through multiple iterations to obtain the result, and also defines stopping conditions: iteration ends when the relative change of the likelihood falls below ε; in addition, a maximum iteration count N is set, and iteration also stops when the count exceeds N. A scale factor is introduced into the Baum-Welch algorithm to correct its data-underflow problem.
The invention is a portable oral translation system based on the WinCE platform and its speech recognition method. Its hardware core is an embedded processor; the embedded system has the fine qualities of low cost, low power consumption, high performance, and strong portability. The voice preprocessing module comprises a pre-emphasis unit, a frame-division unit, a windowing unit, and an endpoint detection unit; by preprocessing the collected voice signal, the embedded system achieves higher efficiency and higher accuracy in the later recognition stage. A hidden Markov model is adopted: the model base is trained first and then used for recognition, making the recognition process more precise and efficient. Compared with the prior art, the invention has the advantages of two-way translation, low cost, low power consumption, high performance, and strong portability, and has a very large consumer market in the field of speech recognition systems.
Description of drawings
Fig. 1 is a schematic diagram of the composition of the hidden Markov model;
Fig. 2 is a schematic diagram of the forward-backward algorithm;
Fig. 3 is a flow chart of hidden Markov model parameter training;
Fig. 4 shows the left-to-right hidden Markov model structure without skips;
Fig. 5 shows the hidden Markov model recognition process;
Fig. 6 is the module block diagram of the present invention;
Fig. 7 shows the transition probability processing of the recognition module of the present invention;
Fig. 8 is the corpus structure diagram of the translation and speech synthesis module of the present invention.
Embodiment
The present invention is a portable oral translation system based on the WinCE platform; a speech recognition system based on WinCE has been designed and implemented. An embedded system has the fine qualities of low cost, low power consumption, and high performance, and its core is the embedded processor. At present, ARM microprocessors mainly include the ARM7, ARM9, ARM9E, ARM10E, and ARM11 series, with steadily increasing capability. The invention uses the embedded development platform UP-CPU 6410, which adopts Samsung's latest S3C6410X (ARM11) embedded microprocessor running at 633 MHz, a processor based on the ARM1176JZF-S core with the ARM v6 architecture.
The module block diagram of the invention is shown in Fig. 6. The voice signal is collected by the microphone of voice collector 1 and passed to voice preprocessing module 2, which performs pre-emphasis, frame division, windowing, and endpoint detection; these functions are implemented by pre-emphasis unit 21, frame-division unit 22, windowing unit 23, and endpoint detection unit 24. Voice feature extraction and modeling module 3 then extracts features from the voice information and trains the voice model; it is connected to model base 4 or recognition module 5. Translation and speech synthesis module 7 reads corpus 6 and translates the result into text and synthesized speech output.
Each modular unit involved is described below:
One. Pre-emphasis unit 21
The average power spectrum of a voice signal is affected by glottal excitation and mouth-nose radiation: above about 800 Hz the high end falls off at roughly 6 dB/oct (octave), so the higher the frequency, the smaller the corresponding component. The high-frequency part must therefore be boosted before the voice signal is analyzed. A 6 dB/oct high-frequency-boost pre-emphasis digital filter is usually applied first; it flattens the spectrum of the signal, keeping the same dynamic range across the whole band from low frequency to high frequency, so the spectrum can be computed with the same signal-to-noise ratio. The filter response function is:
H(z) = 1 - αz^(-1), 0.9 ≤ α ≤ 1.0
where α is the pre-emphasis factor, usually taken as 0.9375. The relation between the output of the pre-emphasis network and the input voice signal s(n) can then be expressed by the difference equation
y(n) = s(n) - αs(n - 1).
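As an illustrative sketch (not part of the patent text), the difference equation above with the given α = 0.9375 can be written directly in Python:

```python
def pre_emphasis(signal, alpha=0.9375):
    """Apply the first-order pre-emphasis filter H(z) = 1 - alpha * z^-1,
    i.e. y[n] = s[n] - alpha * s[n-1], boosting the high-frequency part."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out
```

On a constant (purely low-frequency) input the filter output collapses toward zero, which is exactly the intended high-pass behavior.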
Two. Frame-division unit 22
A voice signal is time-varying, but within a short range its characteristics remain essentially unchanged, i.e. relatively stable; this property of the voice signal is called the "short-time characteristic", and the short interval is generally 10–30 ms. Analysis and processing of voice signals is therefore generally built on this short-time characteristic, i.e. "short-time analysis", and the sample stream is divided into frames. The number of frames per second is chosen according to the actual conditions. Frames may be taken either contiguously or with overlap; since consecutive voice samples are correlated, the present invention adopts overlapping frames.
In this way the whole voice signal is analyzed into a time series of characteristic parameters composed of the parameters of each frame.
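A minimal sketch of the overlapping frame division described above (the frame length and shift values in the example are illustrative; the patent specifies only that frames overlap):

```python
def split_frames(signal, frame_len, frame_shift):
    """Split a sample stream into overlapping frames.
    frame_shift < frame_len gives the overlap between adjacent frames."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames
```

For a 16 kHz signal, a 25 ms frame with a 10 ms shift would correspond to `frame_len=400`, `frame_shift=160`.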
Three. Windowing unit 23
Because a voice signal is short-time stationary, the signal can be processed frame by frame. To emphasize the speech waveform around a sample n and attenuate the remainder of the waveform, each frame is also windowed. Processing each short segment of the voice signal amounts to applying some transform or operation to it; the general expression is
Q_n = Σ_m T[s(m)] ω(n - m)
where T[·] denotes some transform, which may be linear or nonlinear, s(n) is the input voice signal sequence, ω(n) is the window function, and Q_n is the time series obtained after all segments are processed.
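The Hamming windowing step can be sketched as follows (the patent does not give the window coefficients, so the common 0.54/0.46 form is assumed):

```python
import math

def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frame(frame):
    """Multiply a frame point-wise by a Hamming window of the same length."""
    w = hamming(len(frame))
    return [x * wn for x, wn in zip(frame, w)]
```

The window tapers the frame edges toward 0.08 of their value while leaving the center sample at full weight, which reduces spectral leakage in the later FFT step.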
Four. Endpoint detection unit 24
Endpoint detection in voice signal processing serves mainly to detect the start and end points of speech automatically. The present invention adopts the dual-threshold comparison method for endpoint detection. This method uses the short-time energy E and the short-time average zero-crossing rate Z as features; combining the advantages of Z and E makes detection more accurate, effectively reduces the processing time of the system, improves real-time performance, and excludes the noise of silent segments, thereby improving recognition performance.
In the dual-threshold method, the short-time energy E and the short-time average zero-crossing rate Z are computed as follows:
(1) Short-time energy E
The short-time energy of the voice signal s(n) is defined as:
E_n = Σ_m [s(m) ω(n - m)]²
where ω(n) is the Hamming window function. If we let h(n) = ω²(n), then:
E_n = Σ_m s²(m) h(n - m)
This shows that the windowed short-time energy is equivalent to passing the squared signal through a linear filter whose unit sample response is h(n); the realization block diagram follows this structure. E_n is the short-time average energy of the frame of the voice signal identified by n.
(2) Short-time average zero-crossing rate Z
The short-time average zero-crossing rate is defined as:
Z_n = (1/2) Σ_m |sgn[s(m)] - sgn[s(m - 1)]| ω(n - m)
where ω(n) is a window function. Its realization block diagram is analogous to that of the short-time energy.
The short initial interval of a voice signal is uniformly distributed background noise. When the dual-threshold method is used for endpoint detection, the zero-crossing-rate threshold Z_cT and the energy thresholds ETL (low energy threshold) and ETU (high energy threshold) must be computed from this initial "silent" segment and used as gates; the endpoints can then be detected accurately.
The zero-crossing-rate threshold is Z_cT = min(IF, Z̄_c + 2σ_Zc), where IF is an empirical value (the present invention takes IF = 25), and Z̄_c and σ_Zc are the mean and standard deviation of the zero-crossing rate of the initial "silent" segment.
For ETL and ETU, the short-time average energy of the "silent" segment is computed first; the maximum energy is denoted E_max and the minimum E_min. Let:
I1 = 0.03 × (E_max - E_min) + E_min
I2 = 4 × E_min
Then:
ETL = min(I1, I2)
ETU = 5 × ETL
When Z_cT, ETL, and ETU are used as gates, let the start frame be N1; then the energy E_N1 and zero-crossing rate Z_N1 at frame N1 simultaneously satisfy ETU > E_N1 > ETL, E_(N1+1) > ETU, and Z_N1 > Z_cT. The energy E_N2 and zero-crossing rate Z_N2 at the end frame N2 simultaneously satisfy the corresponding energy condition (with adjustment coefficient k = 4) and Z_N2 < Z_cT.
By also taking the surrounding frames into account, the dual-threshold method effectively avoids the influence of noise and improves detection accuracy, making feature extraction efficient and benefiting the recognition rate.
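The feature computations and the threshold derivation above (I1, I2, ETL, ETU, and Z_cT with IF = 25) can be sketched as:

```python
def short_time_energy(frame):
    """Sum of squared samples of one (already windowed) frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Count of sign changes between adjacent samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def thresholds(silence_energies, silence_zcrs, IF=25):
    """Derive ETL, ETU and Z_cT from the leading 'silent' segment,
    following the formulas given above."""
    e_max, e_min = max(silence_energies), min(silence_energies)
    i1 = 0.03 * (e_max - e_min) + e_min
    i2 = 4 * e_min
    etl = min(i1, i2)          # low energy threshold
    etu = 5 * etl              # high energy threshold
    mean_z = sum(silence_zcrs) / len(silence_zcrs)
    var = sum((z - mean_z) ** 2 for z in silence_zcrs) / len(silence_zcrs)
    zct = min(IF, mean_z + 2 * var ** 0.5)  # zero-crossing-rate threshold
    return etl, etu, zct
```

A frame-by-frame scan against ETL/ETU/Z_cT, as described above, would then mark the start and end frames.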
Five. Voice feature extraction and modeling module 3
The present invention extracts MFCC voice features, which are based on auditory characteristics, as the recognition features. Mel-frequency cepstral coefficients (MFCC) are derived from the characteristics of the human auditory system and imitate the human ear's perception of speech at different frequencies. The ear discriminates sound frequency roughly as a logarithmic operation: on the Mel scale, the perception of pitch is linear — if the Mel frequencies of two voice segments differ by a factor of two, the perceived pitch also differs by a factor of two.
The MFCC algorithm of feature extraction module 3 is:
1. Fast Fourier Transform (FFT): x[n] (n = 0, 1, 2, ..., N-1) is a frame of the sampled discrete voice sequence, N being the frame length. X[k] is the resulting N-point complex sequence; taking the modulus of X[k] gives the signal amplitude spectrum |X[k]|.
2. Conversion from the actual frequency scale to the Mel frequency scale:
Mel(f) = 2595 lg(1 + f / 700)
where Mel(f) is the Mel frequency and f is the actual frequency in Hz.
3. A triangular filter bank is configured and the output of each triangular filter applied to the signal amplitude spectrum |X[k]| is computed:
F(l) = Σ_k w_l(k) |X[k]|
where w_l(k) are the coefficients of the corresponding filter; o(l), c(l), and h(l) are the lower limit, center, and upper limit frequencies of the corresponding filter on the actual frequency axis; f_s is the sampling rate; L is the number of filters; and F(l) is the filter output.
4. The logarithm of all filter outputs is taken, followed by a discrete cosine transform (DCT), which yields the MFCC:
M(i) = Σ_(l=1..L) lg F(l) · cos(πi(l - 1/2) / L), i = 1, 2, ..., Q
where Q is the order of the MFCC parameters, generally taken as 12, and M(i) are the resulting MFCC parameters.
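Two steps of the MFCC chain above — the Mel mapping and the final DCT of the log filter-bank outputs — can be sketched as follows (the standard 2595·lg(1 + f/700) mapping is assumed):

```python
import math

def hz_to_mel(f):
    """Standard Mel mapping Mel(f) = 2595 * lg(1 + f / 700), f in Hz."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mfcc_dct(log_filter_outputs, q=12):
    """DCT of the log filter-bank outputs, yielding Q MFCC coefficients:
    M(i) = sum_l log_outputs[l] * cos(pi * i * (l - 1/2) / L)."""
    L = len(log_filter_outputs)
    return [sum(e * math.cos(math.pi * i * (l + 0.5) / L)
                for l, e in enumerate(log_filter_outputs))
            for i in range(1, q + 1)]
```

A useful property for checking the mapping: 1000 Hz corresponds to roughly 1000 mel; and the DCT of a flat (constant) log spectrum is zero in every coefficient, since MFCCs capture only the spectral shape.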
The speech model of the present invention is the hidden Markov model. A hidden Markov model (HMM, Hidden Markov Model) is a statistical signal processing model, represented by probability model parameters, that describes the statistical characteristics of random processes; it develops from the Markov chain. An HMM has two components: a Markov chain, which describes the transfer between states via transition probabilities, and a general stochastic process, which describes the relation between states and the observation sequence via observation probabilities. Its composition is shown in Fig. 1.
The HMM can be expressed as λ = (N, M, π, A, B), where:
N: the number of Markov chain states in the model. Denote the N states θ_1, ..., θ_N, and the state of the Markov chain at time t as q_t; obviously q_t ∈ (θ_1, ..., θ_N).
M: the number of possible observed values corresponding to each state. Denote the M observed values V_1, ..., V_M, and the observation vector observed at time t as O_t, where O_t ∈ (V_1, ..., V_M).
π: the initial state probability vector, π = (π_1, ..., π_N), where π_i = P(q_1 = θ_i), 1 ≤ i ≤ N.
A: the state transition probability matrix, A = (a_ij)_(N×N), a_ij = P(q_(t+1) = θ_j | q_t = θ_i), 1 ≤ i, j ≤ N, the probability of transferring from state i to state j.
B: the output probability matrix, B = (b_ik)_(N×M), b_ik = P(O_t = V_k | q_t = θ_i), 1 ≤ i ≤ N, 1 ≤ k ≤ M, the probability of producing output V_k when in state i.
Since a_ij, b_ik, and π_i are all probabilities, they must satisfy the normalization conditions: a_ij ≥ 0, b_ik ≥ 0, π_i ≥ 0, and
Σ_j a_ij = 1, Σ_k b_ik = 1, Σ_i π_i = 1.
An HMM involves three problems:
1. Evaluation problem
Given an HMM λ = (π, A, B) and an observation sequence O = O_1, O_2, ..., O_T produced by the system, compute the likelihood P(O|λ). The most basic theoretical calculation is to sum the probabilities over all possible state sequences S = q_1, q_2, ..., q_T:
P(O|λ) = Σ_S P(O|S, λ) P(S|λ)
But the complexity of this method is on the order of N^T · T, so the amount of computation is enormous; in recognition, the forward-backward algorithm solves the evaluation problem effectively, with computation on the order of N²T.
Define the forward variable α_t(i) = P(o_1 o_2 ... o_t, q_t = i | λ): under model λ, the probability of having observed o_1, ..., o_t up to time t with state i at time t. The forward variable at the next moment is computed as:
α_(t+1)(j) = [Σ_i α_t(i) a_ij] b_j(o_(t+1))
The schematic diagram of the forward-backward algorithm is shown in Fig. 2.
Define the backward variable β_t(i) = P(o_(t+1) o_(t+2) ... o_T | q_t = i, λ): the probability of the observation sequence (o_(t+1), o_(t+2), ..., o_T) from the final moment T back to time t+1, given that the state at time t is i. The backward variable at the previous moment is computed as:
β_t(i) = Σ_j a_ij b_j(o_(t+1)) β_(t+1)(j)
The schematic of the backward algorithm is similar to the forward method, only with the direction reversed.
Using the forward and backward probabilities to compute the evaluation problem, the concrete formula is:
P(O|λ) = Σ_i α_t(i) β_t(i)
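The forward recursion above can be sketched in plain Python with list-of-lists matrices (an illustration, not the patent's implementation):

```python
def forward(pi, A, B, obs):
    """Forward algorithm: alpha_t(i) = P(o_1..o_t, q_t = i | lambda).
    pi: initial state probabilities; A[i][j]: transition probabilities;
    B[i][k]: probability of emitting symbol k in state i;
    obs: list of observation symbol indices. Returns P(O | lambda)."""
    N = len(pi)
    # initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)
```

The cost is one N×N sweep per observation, i.e. the O(N²T) figure mentioned above, versus O(N^T·T) for the naive enumeration.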
2. Decoding problem
Given an HMM λ = (π, A, B) and an observation sequence O = O_1, O_2, ..., O_T produced by the system, find the state sequence S = q_1, q_2, ..., q_T that the system most probably passed through in producing this observation sequence, i.e. solve for the state sequence S that maximizes P(S|O, λ). Because
P(S|O, λ) = P(S, O|λ) / P(O|λ)
and P(O|λ) is the same for all S, the decoding problem is equivalent to solving for the state sequence S that maximizes P(S, O|λ). The decoding problem is solved by the Viterbi algorithm.
Let δ_t(i) denote the maximum probability over state sequences that are in state i at time t, taken over that state and the preceding t-1 states. The recursion formula of the algorithm is:
δ_(t+1)(j) = [max_i δ_t(i) a_ij] b_j(o_(t+1))
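A compact sketch of the Viterbi recursion and backtracking described above (illustrative only):

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding: most likely state sequence for obs under (pi, A, B).
    Returns the list of state indices q_1..q_T."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]  # delta_1(i)
    back = []                                         # backpointers psi_t(j)
    for o in obs[1:]:
        psi, new = [], []
        for j in range(N):
            # best predecessor i maximizing delta_t(i) * a_ij
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            psi.append(best)
            new.append(delta[best] * A[best][j] * B[j][o])
        back.append(psi)
        delta = new
    # backtrack from the best final state
    path = [max(range(N), key=lambda i: delta[i])]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    return list(reversed(path))
```

In the recognizer described here, the decoded state sequence corresponds to the trained voice units for the utterance.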
3. Learning problem
For an unknown HMM system, given the observation sequence O = O_1, O_2, ..., O_T produced by the system, determine the model λ = (π, A, B), i.e. solve for the model parameters π, A, B that maximize the joint probability P(O|λ). The learning problem corresponds to the parameter training process of the HMM: only the observed data are available and a description of the states is lacking, so maximum likelihood is usually chosen as the optimization target; the method is built on the expectation-maximization (EM) basis, and the Baum-Welch iterative algorithm is adopted to estimate the model parameters. Let ξ_t(i, j) denote the probability that the state is i at time t and j at time t+1:
ξ_t(i, j) = P(q_t = i, q_(t+1) = j | O, λ)
and let γ_t(i) = Σ_j ξ_t(i, j) be the probability of being in state i at time t. The re-estimation formula of the state transition matrix is:
a_ij = Σ_(t=1..T-1) ξ_t(i, j) / Σ_(t=1..T-1) γ_t(i)
The re-estimation formula of the output probability matrix is:
b_ik = Σ_(t: O_t = V_k) γ_t(i) / Σ_(t=1..T) γ_t(i)
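One Baum-Welch re-estimation step for the transition matrix, using ξ_t(i, j) as defined above, can be sketched as follows (this sketch omits the scale-factor underflow correction the patent mentions, so it is only suitable for short sequences):

```python
def forward_backward(pi, A, B, obs):
    """Compute the forward (alpha) and backward (beta) variable tables."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
    for i in range(N):
        beta[T-1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))
    return alpha, beta

def reestimate_A(pi, A, B, obs):
    """One Baum-Welch update: a_ij = sum_t xi_t(i,j) / sum_t gamma_t(i)."""
    alpha, beta = forward_backward(pi, A, B, obs)
    N, T = len(pi), len(obs)
    p_obs = sum(alpha[T-1][i] for i in range(N))  # P(O | lambda)
    newA = []
    for i in range(N):
        gamma_i = sum(alpha[t][i] * beta[t][i] for t in range(T - 1)) / p_obs
        row = []
        for j in range(N):
            xi_ij = sum(alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
                        for t in range(T - 1)) / p_obs
            row.append(xi_ij / gamma_i if gamma_i else 0.0)
        newA.append(row)
    return newA
```

Because Σ_j ξ_t(i, j) = γ_t(i), each re-estimated row remains a valid probability distribution, which is a convenient sanity check on the update.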
The HMM speech recognition process of the present invention is as follows:
In speech recognition, the MFCC voice features obtained by the feature extraction module form the observation sequence of the HMM, while the states are the voice units obtained by training. Therefore, to build the HMM and perform speech recognition, the model must be trained to obtain the HMM parameters; the training process of the invention, shown in Fig. 3, achieved a good training result.
In the training process, the HMM parameters are first initialized, and then the Baum-Welch iterative algorithm estimates the model parameters. In practice, the training algorithm must iterate many times before the result is obtained, and a stopping condition must also be given: iteration ends when the relative change of the likelihood falls below ε; in addition, a maximum iteration count N is set, and iteration also stops when the count exceeds N. A scale factor is introduced into the Baum-Welch algorithm to correct its data-underflow problem. As shown in Fig. 4, the invention adopts a left-to-right HMM structure without skips.
As shown in Fig. 5, after the HMM is trained, the MFCC features are used with the Viterbi algorithm to solve the state transition sequences P(O|λ_n) (n = 1...M); finally, decision judgment selects the state transition sequence of maximum probability. The candidate syllables or initials are then given according to the λ of the optimal state sequence, and words and sentences are finally formed by the language model.
The concrete realization of each module is described as follows:
Six. Recognition module 5:
As shown in Fig. 7, the recognition module adopts the HMM: the trained voice models in the model base are called up and matched against the input voice. The HMM templates output transition probability values P_i (i = 0, 1, ..., where i is the template number); the P_i are compared, the maximum transition probability P is obtained, and the corresponding text information is output as the recognition result.
In a large-vocabulary speech recognition system there are many near-homophones and homophones, which lower the system recognition rate. To overcome their influence, the system processes the transition probabilities produced by matching, as shown in Fig. 7. A transition probability threshold P_T is set: when P_i > P_T, the corresponding text is output; otherwise the result is discarded.
This transition probability thresholding effectively improves the recognition rate of the system.
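The thresholded selection can be sketched as follows (function and variable names are illustrative, not from the patent):

```python
def select_result(scores, texts, p_t):
    """Pick the text of the template with the highest match score;
    reject the utterance when even the best score is below threshold p_t
    (the near-homophone rejection step described above)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return texts[best] if scores[best] > p_t else None
```

Returning `None` models "discard the result": a low best score suggests the input matched no template well enough to trust.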
Seven. Translation and speech synthesis module:
The translation and speech synthesis module mainly performs a matching query between the hidden states output by the recognition module and the corpus, translates the result into text, and, using TTS technology, outputs it as speech.
Fig. 8 is the structure diagram of the corpus. The corpus is built with composite feature vectors.
Define the phoneme feature vector V_phoneme as
V_phoneme = (No., Phoneme)
where No. is the phoneme number and Phoneme is the phoneme content.
Define the syllable feature vector V_syllable as
V_syllable = (No., Syllable, No._word, G_P)
where No. is the syllable number, Syllable is the syllable content, No._word is the word number, and G_P is the phoneme sequence set.
Define the word feature vector V_word as
V_word = (No., Word, Vector_W, Num_phrase, No._phrase)
where No. is the word number, Word is the word content, and Vector_W is the part-of-speech feature vector, with Vector_W = (n, v, num, pron, adj, adv); Num_phrase is the number of phrases based on this word, and No._phrase is the phrase number.
Define the translation vector V_tran as
V_tran = (No., Tran_n, Tran_v, Tran_num, Tran_pron, Tran_adj, Tran_adv)
where No. is the translation number and Tran_n, Tran_v, Tran_num, Tran_pron, Tran_adj, and Tran_adv are the translations for the parts of speech n, v, num, pron, adj, and adv respectively.
In the corpus, certain features of the vectors are related to one another, so queries can cross levels through the linked features, improving search efficiency.
In the translation process, the information associated with the syllable feature vector V_syllable is first obtained from the phoneme feature vector V_phoneme; the word feature vector V_word is then queried; finally the translation vector V_tran gives the result.
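The cross-level query through linked numbers can be sketched with hypothetical in-memory tables (all entries and names are illustrative; the patent does not specify a storage format):

```python
# Hypothetical corpus tables keyed by the feature-vector numbers (No. fields).
phonemes = {1: "b", 2: "a"}                                        # V_phoneme
syllables = {10: {"syllable": "ba", "word_no": 100,
                  "phoneme_nos": [1, 2]}}                          # V_syllable
words = {100: {"word": "爸", "phrase_nos": []}}                     # V_word
translations = {100: {"n": "father"}}                              # V_tran

def translate(syllable_no, pos="n"):
    """Follow the linked numbers across levels:
    syllable -> word number -> translation for the given part of speech."""
    word_no = syllables[syllable_no]["word_no"]
    return translations[word_no][pos]
```

Indexing each level by its number field is what makes the cross-level lookup a constant-time chain of dictionary accesses rather than a scan.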
The main purpose of speech synthesis is to output the translated text as speech. It has three main components: a text analysis module, a prosody generation module, and an acoustic module. The synthesis process is as follows:
Text analysis → prosody generation → acoustic module
In summary, compared with the prior art, the present invention has the advantages of two-way translation, low cost, low power consumption, high performance, and strong portability, and has a very large consumer market in the field of speech recognition systems.
Claims (9)
1. A portable oral translation system based on the WinCE platform, characterized by comprising a voice collector, a voice preprocessing module, a voice feature extraction and modeling module, a model base, a recognition module, a corpus, and a translation and speech synthesis module, all built on an embedded platform; the voice collector is connected to the voice preprocessing module; the voice preprocessing module is connected to the voice feature extraction and modeling module; the voice feature extraction and modeling module is connected to either the model base or the recognition module: when the training state is selected it is connected to the model base, and when the recognition state is selected it is connected to the recognition module; the recognition module is connected to the translation and speech synthesis module; the translation and speech synthesis module is connected to the corpus; after the recognition module obtains the optimal result through decision judgment, the translation and speech synthesis module translates it into text and outputs it as speech; through language selection, two-way spoken translation from Chinese to English or from English to Chinese is achieved.
2. The portable oral translation system based on the WinCE platform according to claim 1, characterized in that the voice preprocessing module comprises a pre-emphasis unit, a frame-division unit, a windowing unit, and an endpoint detection unit connected in sequence; the pre-emphasis unit is connected to the voice collector, and the endpoint detection unit is connected to the voice feature extraction and modeling module;
the pre-emphasis unit is a high-frequency-boost pre-emphasis digital filter;
the frame-division unit divides the signal into overlapping frames;
the windowing unit applies a Hamming window function;
the endpoint detection unit performs a dual-threshold comparison using the short-time energy E and the short-time average zero-crossing rate Z as features, computes the zero-crossing-rate threshold Z_cT and the high and low energy thresholds from the silent segment, and uses them as gates to detect the endpoints.
3. The portable spoken-language translation system based on the WinCE platform according to claim 2, characterized in that the voice feature extraction and modeling module extracts MFCC speech features as the recognition features and builds a hidden Markov model as the training and recognition model, the hidden Markov model consisting of a Markov chain and a general stochastic process;
the hidden Markov model uses the forward-backward probability algorithm to solve the evaluation problem, the Viterbi algorithm to solve the decoding problem, and the Baum-Welch iterative algorithm to solve the learning problem.
4. The portable spoken-language translation system based on the WinCE platform according to claim 3, characterized in that the forward-backward probability algorithm solves the evaluation problem: given a hidden Markov model λ = (π, A, B) and an observation sequence O = O1, O2, ..., OT produced by the system, compute the likelihood P(O|λ).
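The evaluation problem of claim 4 is solved by the forward recursion; a minimal sketch with an integer-indexed toy HMM (the model below is an illustrative assumption, not from the patent):

```python
def forward_likelihood(pi, A, B, obs):
    """Forward pass of the forward-backward algorithm: computes the
    likelihood P(O | lambda) of observation sequence `obs` under the
    HMM lambda = (pi, A, B), with states and symbols integer-indexed."""
    n_states = len(pi)
    # alpha[i] = P(O_1..O_t, q_t = i | lambda), initialized at t = 1.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        # Induction: sum over predecessor states, then emit the next symbol.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
                 for j in range(n_states)]
    # Termination: P(O | lambda) = sum_i alpha_T(i).
    return sum(alpha)
```

A sanity check on any valid model: the likelihoods of all possible observation sequences of a fixed length sum to 1.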
5. The portable spoken-language translation system based on the WinCE platform according to claim 3, characterized in that the Viterbi algorithm solves the decoding problem: given a hidden Markov model λ = (π, A, B) and an observation sequence O = O1, O2, ..., OT produced by the system, search for the state sequence S = q1, q2, ..., qT that the system most probably traversed in producing this observation sequence.
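The decoding problem of claim 5 admits the classic dynamic-programming sketch below; states and observations are integer-indexed and the example model in the test is an illustrative assumption:

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding: returns the state sequence S = q_1..q_T that
    most probably produced `obs` under the HMM lambda = (pi, A, B)."""
    n = len(pi)
    # delta[i] = max probability of any path ending in state i at time t.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # backpointers: psi[t][j] = best predecessor of state j
    for o in obs[1:]:
        step = [max((delta[i] * A[i][j], i) for i in range(n))
                for j in range(n)]
        psi.append([s[1] for s in step])
        delta = [step[j][0] * B[j][o] for j in range(n)]
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for back in reversed(psi):
        state = back[state]
        path.append(state)
    return list(reversed(path))
```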
6. The portable spoken-language translation system based on the WinCE platform according to claim 3, characterized in that, for a hidden Markov model with unknown parameters, the Baum-Welch iterative algorithm is used to estimate the model parameters.
7. A speech recognition method for the portable spoken-language translation system based on the WinCE platform according to claim 3, characterized by comprising the steps of:
(1) training the hidden Markov model to obtain the model parameters;
(2) taking the speech features produced by the feature extraction module as the observation sequence of the hidden Markov model, with the speech units obtained in training as the state sequence, and solving for the state transition sequence with the Viterbi algorithm;
(3) applying decision judgment to obtain the state transition sequence of maximum probability;
(4) mapping the optimal state sequence to candidate syllables or initials and finals, and finally forming words and sentences through a language model.
8. The speech recognition method of the portable spoken-language translation system based on the WinCE platform according to claim 7, characterized in that step (1) first initializes the hidden Markov model parameters and then uses the Baum-Welch iterative algorithm to estimate the model parameters.
9. The speech recognition method of the portable spoken-language translation system based on the WinCE platform according to claim 8, characterized in that step (1) iterates the training algorithm repeatedly to obtain the result and also specifies the conditions for terminating the iteration: when the relative change of the likelihood is less than ε, the iteration terminates; in addition, a maximum iteration count N is set, and when the iteration count exceeds N the iteration also stops; furthermore, the Baum-Welch algorithm is augmented with scaling factors to correct the data underflow problem of the algorithm.
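Claim 9's two safeguards can be sketched as follows: a forward pass with per-frame scaling factors (the standard fix for the underflow the claim mentions), and a training loop that stops when the relative likelihood change drops below ε or after N iterations. The toy model and thresholds in the test are illustrative assumptions:

```python
import math

def scaled_forward_loglik(pi, A, B, obs):
    """Forward pass with per-frame scaling factors c_t: each alpha_t is
    normalized to sum to 1, and log P(O | lambda) = -sum(log c_t) is
    accumulated instead of a raw product that would underflow on long
    utterances."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        c = 1.0 / sum(alpha)              # scaling factor for frame t
        alpha = [a * c for a in alpha]
        log_lik -= math.log(c)
    return log_lik

def train_until_converged(loglik_of_iteration, eps=1e-4, max_iter=50):
    """Stopping rule of claim 9: iterate until the relative change in
    (log-)likelihood falls below eps, or until max_iter is reached.
    `loglik_of_iteration` is a stand-in for one Baum-Welch pass."""
    prev = loglik_of_iteration()
    for n in range(1, max_iter):
        cur = loglik_of_iteration()
        if abs(cur - prev) / max(abs(prev), 1e-12) < eps:
            return cur, n
        prev = cur
    return prev, max_iter
```

The scaled log-likelihood is mathematically identical to the log of the unscaled forward probability, so correctness is easy to verify on a short sequence.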
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101605215A CN102237083A (en) | 2010-04-23 | 2010-04-23 | Portable interpretation system based on WinCE platform and language recognition method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102237083A true CN102237083A (en) | 2011-11-09 |
Family
ID=44887672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010101605215A Pending CN102237083A (en) | 2010-04-23 | 2010-04-23 | Portable interpretation system based on WinCE platform and language recognition method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102237083A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050131709A1 (en) * | 2003-12-15 | 2005-06-16 | International Business Machines Corporation | Providing translations encoded within embedded digital information |
CN101008942A (en) * | 2006-01-25 | 2007-08-01 | 北京金远见电脑技术有限公司 | Machine translation device and method thereof |
CN101329667A (en) * | 2008-08-04 | 2008-12-24 | 深圳市大正汉语软件有限公司 | Intelligent translation apparatus of multi-language voice mutual translation and control method thereof |
Non-Patent Citations (2)
Title |
---|
Su Mu et al.: "A Telephone-Based Chinese-English Bidirectional Translation System", Proceedings of the 7th National Conference on Man-Machine Speech Communication (NCMMSC7) *
Wei Li: "Research on Embedded Speech Recognition Systems", China Master's Theses Full-text Database *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663143A (en) * | 2012-05-18 | 2012-09-12 | 徐信 | System and method for audio and video speech processing and retrieval |
CN102789779A (en) * | 2012-07-12 | 2012-11-21 | 广东外语外贸大学 | Speech recognition system and recognition method thereof |
CN103811008A (en) * | 2012-11-08 | 2014-05-21 | 中国移动通信集团上海有限公司 | Audio frequency content identification method and device |
CN104123934A (en) * | 2014-07-23 | 2014-10-29 | 泰亿格电子(上海)有限公司 | Speech composition recognition method and system |
CN104834393A (en) * | 2015-06-04 | 2015-08-12 | 携程计算机技术(上海)有限公司 | Automatic testing device and system |
CN107170453A (en) * | 2017-05-18 | 2017-09-15 | 百度在线网络技术(北京)有限公司 | Across languages phonetic transcription methods, equipment and computer-readable recording medium based on artificial intelligence |
US10796700B2 (en) | 2017-05-18 | 2020-10-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Artificial intelligence-based cross-language speech transcription method and apparatus, device and readable medium using Fbank40 acoustic feature format |
CN108460027A (en) * | 2018-02-14 | 2018-08-28 | 广东外语外贸大学 | A kind of spoken language instant translation method and system |
CN110765868A (en) * | 2019-09-18 | 2020-02-07 | 平安科技(深圳)有限公司 | Lip reading model generation method, device, equipment and storage medium |
CN112329484A (en) * | 2020-11-06 | 2021-02-05 | 中国联合网络通信集团有限公司 | Translation method and device for natural language |
CN114398468A (en) * | 2021-12-09 | 2022-04-26 | 广东外语外贸大学 | Multi-language identification method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102237083A (en) | Portable interpretation system based on WinCE platform and language recognition method thereof | |
CN101944359B (en) | Voice recognition method facing specific crowd | |
CN106228977B (en) | Multi-mode fusion song emotion recognition method based on deep learning | |
CN103928023B (en) | A kind of speech assessment method and system | |
CN103345923B (en) | A kind of phrase sound method for distinguishing speek person based on rarefaction representation | |
CN104200804B (en) | Various-information coupling emotion recognition method for human-computer interaction | |
CN102800316B (en) | Optimal codebook design method for voiceprint recognition system based on nerve network | |
CN101930735B (en) | Speech emotion recognition equipment and speech emotion recognition method | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization based | |
Kumar et al. | Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm | |
Dua et al. | GFCC based discriminatively trained noise robust continuous ASR system for Hindi language | |
CN103065629A (en) | Speech recognition system of humanoid robot | |
CN109192200B (en) | Speech recognition method | |
CN104123933A (en) | Self-adaptive non-parallel training based voice conversion method | |
Shanthi et al. | Review of feature extraction techniques in automatic speech recognition | |
CN114566189B (en) | Speech emotion recognition method and system based on three-dimensional depth feature fusion | |
Shanthi Therese et al. | Review of feature extraction techniques in automatic speech recognition | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
CN114842878A (en) | Speech emotion recognition method based on neural network | |
Mistry et al. | Overview: Speech recognition technology, mel-frequency cepstral coefficients (mfcc), artificial neural network (ann) | |
Barman et al. | State of the art review of speech recognition using genetic algorithm | |
CN104240699A (en) | Simple and effective phrase speech recognition method | |
Hu et al. | Speaker Recognition Based on 3DCNN-LSTM. | |
Thalengala et al. | Effect of time-domain windowing on isolated speech recognition system performance | |
CN113611285B (en) | Language identification method based on stacked bidirectional time sequence pooling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20111109 |