CN104078039A - Voice recognition system of domestic service robot on basis of hidden Markov model - Google Patents

Info

Publication number
CN104078039A
Authority
CN
China
Prior art keywords
parameter
signal
voice signal
hidden markov
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310102175.9A
Other languages
Chinese (zh)
Inventor
刘治
苏敏发
谢杰腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN201310102175.9A
Publication of CN104078039A
Legal status: Pending

Abstract

The invention discloses a speech recognition system for a domestic service robot based on the hidden Markov model (HMM), and belongs to the field of speech recognition. The processing chain consists of filtering, sampling, quantization, windowing, endpoint detection, feature extraction, model training, and threshold comparison. Filtering removes low-frequency interference. Because a speech signal is a continuous, time-varying analog signal, it must be sampled and quantized to obtain a discrete digital signal. Framing splits the original signal into segments, which is equivalent to applying a rectangular window to the signal in the time domain; multiplication by a rectangular window in the time domain corresponds to convolving the signal spectrum with the Fourier transform of the rectangular window in the frequency domain. Endpoint detection uses a double-threshold algorithm. Mel-frequency cepstral coefficients (MFCC) serve as the feature parameters; the features are trained with a hidden Markov model and matched against an established template library, and the matching result is compared with a threshold to obtain the recognition result.

Description

Household service robot speech recognition system based on the hidden Markov model (HMM)
Technical field
The invention belongs to the field of speech recognition systems, and specifically relates to a method for model training and recognition of speech signals based on the hidden Markov model (HMM).
Background
Speech recognition is the process by which a machine converts human speech signals into the corresponding text or commands. Its ultimate goal is free human-machine dialogue, comparable to conversation between people: giving the machine a sense of hearing so that it can understand human language, distinguish both the content of speech and the speaker, and then act according to the person's wishes, freeing people from heavy or dangerous work.
Research on speech recognition draws on many disciplines, including acoustics, linguistics, phonetics, physiology, digital signal processing, communication theory, electronics, computer science, pattern recognition, and artificial intelligence. A speech recognition system with good recognition performance therefore has to account for many factors, including the speaker's psychological state, the input device, and the speaking environment.
In recent years, the most active topics in speech recognition have been robust speech recognition, speaker adaptation, large-vocabulary keyword recognition, confidence-scoring algorithms for speech recognition, class-based and adaptive language models, and deep natural language understanding. Research is also increasingly focused on spoken dialogue systems. Speaker adaptation has made considerable progress, and several relatively mature techniques have appeared, such as vocal tract normalization, maximum likelihood linear regression (MLLR), and Bayesian adaptive estimation. Speaker-independent, large-vocabulary, continuous speech recognition remains the focus and the main difficulty of current research.
A speech recognition system mainly comprises modules for speech signal preprocessing, feature parameter extraction, template library construction, recognition decision, and threshold comparison. The speech signal is input from a microphone and preprocessed; preprocessing comprises pre-filtering, sampling and quantization, pre-emphasis, windowing, and endpoint detection. After preprocessing, feature parameters are extracted, and the extracted parameter sequences are used to build and store a template library of speech parameters. During recognition, speech is input from the microphone, preprocessed, and converted to feature parameters; the probability of the extracted features is computed against the template library, and the matching result is compared with a threshold to obtain the final recognition result.
Summary of the invention
The present invention is a speech recognition system based on hidden Markov model training; the system is simulated mainly in MATLAB. The speech signal is first filtered, then sampled and quantized to obtain a discrete digital signal, and then pre-emphasized; the purpose of pre-emphasis is to remove low-frequency interference. A speech signal is a typical non-stationary, time-varying signal, so it is divided into frames. Framing splits the original signal into segments, which is equivalent to applying a rectangular window to the original signal in the time domain. Multiplication by a rectangular window in the time domain corresponds to convolving the signal spectrum with the Fourier transform of the rectangular window in the frequency domain. For this reason each frame is windowed after framing; this patent uses a Hamming window. The purpose of endpoint detection is to determine the start and end of speech within a segment of signal that contains speech; only when the start and end of the speech segment are found accurately can the collected data be the speech that is actually to be analyzed. This patent uses a double-threshold endpoint detection algorithm.
Speech recognition is a matching process: the input speech signal is analyzed, the required features are extracted, and matching templates are built from the extracted feature parameters. Feature parameters must therefore be extracted from the speech signal; this patent uses Mel-frequency cepstral coefficients (MFCC), a feature that reflects the mechanism of human hearing well. Model training is the core of the speech recognition system. A hidden Markov model (HMM) is a doubly stochastic process: one process describes the statistical characteristics of the short-time stationary segments of a non-stationary signal (the instantaneous characteristics of the signal, which can be observed directly); the other describes how each short-time stationary segment transitions to the next, that is, the dynamics of the short-time statistical characteristics (which are implicit in the observation sequence). Human speech production is also such a doubly stochastic process, so an HMM describes the production of the speech signal very accurately.
Brief description of the drawings
Fig. 1 is an overall block diagram of the recognition process of the speech recognition system.
Fig. 2 is a block diagram of speech signal endpoint detection.
Fig. 3 is a block diagram of hidden Markov model training for the speech signal.
Detailed description of the embodiments
Before a speech signal can be processed it must be digitized, which is done by analog-to-digital (A/D) conversion. A/D conversion involves two steps, sampling and quantization, and yields a signal that is discrete in both time and amplitude. According to the Nyquist sampling theorem, the sampling frequency must be more than twice the highest frequency in the original signal; only then is no information lost during sampling, and the waveform of the original signal can be reconstructed accurately from the samples.
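As a rough illustration of sampling and quantization, the following Python sketch samples a sine wave and maps each sample to one of 2^bits uniform levels. The function names, the sine test signal, and the 8 kHz, 8-bit settings are assumptions chosen for the example; the patent does not specify them.

```python
import math

def min_sampling_rate(max_signal_hz):
    """Lower bound on the sampling rate implied by the Nyquist criterion."""
    return 2.0 * max_signal_hz

def sample_and_quantize(signal_hz, sample_rate, n_samples, bits):
    """Sample sin(2*pi*f*t) at sample_rate and quantize to 2**bits uniform levels.

    Returns integer codes in the range 0 .. 2**bits - 1.
    """
    levels = 2 ** bits
    step = 2.0 / levels                      # amplitude range is [-1, 1]
    codes = []
    for n in range(n_samples):
        x = math.sin(2.0 * math.pi * signal_hz * n / sample_rate)
        code = int((x + 1.0) / step)         # uniform quantizer index
        codes.append(min(code, levels - 1))  # x == 1.0 would otherwise overflow
    return codes

# Hypothetical settings: a 1 kHz tone sampled at 8 kHz with 8-bit resolution.
codes = sample_and_quantize(signal_hz=1000.0, sample_rate=8000.0, n_samples=16, bits=8)
```

The result is discrete in time (one code per sampling instant) and in amplitude (256 possible codes), which is the form every later processing step operates on.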
1) Speech signal preprocessing
Before a speech signal is analyzed, it is usually pre-emphasized. The purpose is to remove low-frequency interference, especially 50 Hz or 60 Hz power-line interference, and to boost the high-frequency part that is useful for speech recognition. This flattens the spectrum of the signal and makes spectral analysis or vocal tract parameter analysis easier. Pre-emphasis passes the speech signal through a first-order high-pass filter, $1 - 0.9375z^{-1}$, usually called the pre-emphasis filter. Its transfer function is:
$$H(z) = 1 - 0.9375\,z^{-1}$$
If $s(n)$ is the speech signal before pre-emphasis, the signal $\bar{s}(n)$ obtained after the pre-emphasis filter is:
$$\bar{s}(n) = s(n) - 0.9375\,s(n-1)$$
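The difference equation above maps directly to a few lines of code. The sketch below is a minimal Python version; the function name and the choice to treat the sample before the first one as zero are assumptions made for the example.

```python
def pre_emphasis(samples, alpha=0.9375):
    """Apply y(n) = s(n) - alpha * s(n-1), taking s(-1) as 0 (an assumption)."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - alpha * prev)
        prev = s
    return out

# A constant (zero-frequency) input is strongly attenuated after the first sample.
dc = pre_emphasis([1.0, 1.0, 1.0, 1.0])
# An input that alternates in sign (the highest representable frequency) is boosted.
hf = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```

Comparing the two outputs shows the high-pass behavior: the steady component shrinks to 0.0625 per sample while the alternating component grows to a magnitude of 1.9375.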
A speech signal is non-stationary and time-varying, but within a short interval (generally taken to be 10 to 30 ms) its characteristics remain essentially unchanged, so it can be treated as quasi-stationary over that interval. The speech signal can therefore be divided into frames. There are typically about 33 to 100 frames per second, depending on the situation. Framing can use contiguous segmentation, but overlapping segmentation is generally used so that the transition between frames is smooth and continuity is preserved. The overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. Framing splits the original signal into segments, which is equivalent to applying a rectangular window to the original signal in the time domain. Multiplication by a rectangular window in the time domain corresponds to convolving the signal spectrum with the Fourier transform of the rectangular window in the frequency domain, which alters the spectrum of the original signal. For this reason each frame is windowed after framing, giving the windowed speech signal $s(w)$:
$$s(w) = \bar{s}(n) \cdot w(n)$$
Window functions commonly used in digital speech processing include the Hanning window and the Hamming window. This patent uses the Hamming window, which in its standard form for a window of length $N$ is:
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
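Framing with overlap followed by Hamming windowing can be sketched as follows. This is a minimal Python illustration, not the patent's MATLAB implementation: the function names, the `hop` parameter (the step between frame starts, so the overlap is `frame_len - hop`), and the tiny frame sizes are assumptions, and the window uses the standard Hamming definition.

```python
import math

def hamming(length):
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def split_frames(signal, frame_len, hop):
    """Cut the signal into frames of frame_len samples, advancing by hop samples."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames

def window_frames(frames):
    """Multiply every frame sample by sample with a Hamming window."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]

# Hypothetical sizes: 8-sample frames that overlap by 4 samples.
signal = [1.0] * 20
frames = split_frames(signal, frame_len=8, hop=4)
windowed = window_frames(frames)
```

With these settings the overlap is half the frame length, the upper end of the 0 to 0.5 range given in the text. The window tapers each frame toward its edges, which reduces the spectral leakage that plain rectangular framing would introduce.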
2) Speech signal endpoint detection
The purpose of endpoint detection is to determine the start and end of speech within a segment of signal. Correct endpoint detection is a prerequisite for a high recognition rate, because only when the start and end of the speech segment are found correctly can the collected data be the speech that is actually to be analyzed. This patent uses double-threshold endpoint detection, which distinguishes unvoiced sound from noise according to two feature parameters of the speech signal (energy and zero-crossing rate) and thereby locates the endpoints. Short-time average energy provides a basis for distinguishing unvoiced segments from voiced segments, because the short-time average energy of an unvoiced segment is markedly lower than that of a voiced segment; it can therefore be used to mark the boundary between unvoiced and voiced sound. After the speech signal is divided into frames, computing the short-time average energy of each frame and setting a threshold is, in principle, enough to implement a simple endpoint detection algorithm.
The short-time energy of the speech signal is defined as:
$$E_n = \sum_{m=-\infty}^{\infty} [x(m)\,w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\,h(n-m) = x^2(n) * h(n)$$
where $h(n) = w^2(n)$, $w(n)$ is the window function, and $N$ is the window length, so only $N$ terms of the sum are nonzero.
The short-time zero-crossing rate is a simple way to estimate the frequency of a sinusoid. A zero crossing occurs whenever two adjacent samples of a discrete signal have different signs. The number of times the waveform crosses the zero level within one frame is called the zero-crossing rate. The short-time zero-crossing rate of the speech signal is defined as:
$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \cdot w(n-m)$$
where $\operatorname{sgn}[\cdot]$ is the sign function:
$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ 0, & x(n) < 0 \end{cases}$$
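For a single frame, the two quantities defined above reduce to simple sums. The following Python sketch assumes a rectangular window (every weight equal to 1) and a sign function that returns 1 for non-negative samples and 0 otherwise; the function names and sample values are illustrative.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame (rectangular window assumed)."""
    return sum(x * x for x in frame)

def sgn(x):
    """Sign function with values 1 (x >= 0) and 0 (x < 0)."""
    return 1 if x >= 0 else 0

def zero_crossings(frame):
    """Count sign changes between adjacent samples of one frame."""
    return sum(abs(sgn(frame[m]) - sgn(frame[m - 1]))
               for m in range(1, len(frame)))

loud = [0.5, -0.5, 0.5, -0.5]   # strong, rapidly alternating frame
quiet = [0.1, 0.1, 0.1, 0.1]    # weak frame without sign changes

loud_energy, loud_crossings = short_time_energy(loud), zero_crossings(loud)
quiet_energy, quiet_crossings = short_time_energy(quiet), zero_crossings(quiet)
```

The first frame has both higher energy and more zero crossings than the second, which is the kind of contrast the thresholds in endpoint detection rely on.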
The double-threshold endpoint detection algorithm proceeds as follows. Before detection starts, two thresholds are set for each of the short-time average energy and the zero-crossing rate. The lower threshold has a small value, is sensitive to changes in the signal, and is easily exceeded. The higher threshold has a larger value and is exceeded only when the signal reaches a certain strength. Exceeding the low threshold does not necessarily mark the start of speech, since short bursts of noise can also cause it; exceeding the high threshold is taken to be caused by speech.
Endpoint detection distinguishes four stages: silence, transition, speech, and end. In the program, one variable records the current state. In the silence state, if the energy or the zero-crossing rate exceeds its low threshold, a tentative start point is marked and the state changes to transition. In the transition state the parameter values are still small, so it is not certain that speech has really begun: if both parameters fall back below their low thresholds, the state returns to silence, whereas if either parameter exceeds its high threshold, the speech state is entered. Bursts of noise can also produce high short-time energy or zero-crossing rate, but they usually do not last long, so a minimum-duration threshold is introduced as well. In the speech state, if both parameters fall below their low thresholds and the total duration so far is shorter than the minimum-duration threshold, the segment is treated as noise and scanning continues with the following signal; otherwise the end point is marked and endpoint detection returns.
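The state machine described above can be sketched in Python as follows. This is one possible reading of the description, not the patent's code: the function name, the threshold parameter names, the decision to jump straight from silence to speech when a high threshold is exceeded, and the example values are all assumptions.

```python
def detect_endpoints(energy, zcr, e_low, e_high, z_low, z_high, min_len):
    """Return (start, end) frame indices of the first speech segment, or None.

    energy and zcr hold one value per frame; min_len is the minimum duration
    in frames below which a candidate segment is discarded as noise.
    """
    state = "silence"
    start = 0
    count = 0                                # frames since the tentative start
    for i, (e, z) in enumerate(zip(energy, zcr)):
        if state in ("silence", "transition"):
            if e > e_high or z > z_high:     # strong enough to be speech
                if state == "silence":
                    start, count = i, 0
                state = "speech"
                count += 1
            elif e > e_low or z > z_low:     # possibly speech: tentative start
                if state == "silence":
                    start, count = i, 0
                state = "transition"
                count += 1
            else:                            # both below the low thresholds
                state = "silence"
                count = 0
        else:                                # state == "speech"
            if e > e_low or z > z_low:
                count += 1
            elif count < min_len:            # too short: treat as a noise burst
                state = "silence"
                count = 0
            else:
                return (start, start + count - 1)
    if state == "speech" and count >= min_len:
        return (start, start + count - 1)
    return None

# Hypothetical per-frame energies: one short burst (frame 2), then real speech.
frame_energy = [0.1, 0.1, 5.0, 0.1, 0.1, 2.0, 8.0, 9.0, 7.0, 2.0, 0.1, 0.1]
frame_zcr = [0] * len(frame_energy)
segment = detect_endpoints(frame_energy, frame_zcr,
                           e_low=1.0, e_high=4.0, z_low=10, z_high=20, min_len=3)
```

In this example the single loud frame at index 2 is rejected because it lasts less than `min_len` frames, while frames 5 to 9 are reported as the speech segment.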
3) Feature parameter extraction
There are several ways to extract feature parameters from a speech signal. Linear prediction coefficients (LPC) are based on the mechanism of speech production and describe the characteristics of the vocal tract; linear prediction cepstral coefficients (LPCC) are parameters derived from LPC. Neither makes full use of the auditory characteristics of the human ear. The human auditory system is in fact a nonlinear system whose sensitivity differs across frequencies, following a roughly logarithmic relationship. This patent uses Mel-frequency cepstral coefficients (MFCC) as the feature parameters of the speech signal.
Mel-frequency cepstral coefficients (MFCC) are obtained by first mapping the frequency axis of the signal spectrum onto the Mel frequency scale and then transforming to the cepstral domain. The Mel is a unit of pitch, that is, of the frequency of a sound as perceived by the human auditory system. The relationship between the Mel scale and frequency is:
$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
where $f$ is the actual linear frequency in Hz and $f_{mel}$ is the Mel frequency.
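The Mel mapping and its inverse are easy to check numerically. In the Python sketch below, the function names are illustrative, and the inverse is obtained by solving the formula above for $f$.

```python
import math

def hz_to_mel(f_hz):
    """Mel frequency for a linear frequency in Hz: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse mapping, obtained by solving the formula above for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

mel_at_1k = hz_to_mel(1000.0)            # close to 1000 Mel by construction of the scale
round_trip = mel_to_hz(hz_to_mel(440.0))
```

Equal steps on the Mel axis correspond to increasingly wide steps in Hz, which is why filters spaced evenly in Mel become broader at high frequencies.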
The MFCC feature parameters are computed as follows:
1. Preprocess the speech signal; windowing and framing turn it into short-time signals.
2. Use the FFT to convert each short-time time-domain signal $s(w)$ into a frequency-domain signal $p(f)$, and from it compute the short-time energy spectrum $p(w)$:
$$p(w) = |p(f)|^2 = |X(e^{jw})|^2$$
3. Convert $p(w)$ from a spectrum on the frequency (Hz) axis to $p(Mel)$ on the Mel axis, where Mel denotes the Mel frequency, using the relationship:
$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
4. Place triangular band-pass filters along the Mel axis to obtain the filter bank $H_m(k)$, then compute the output of the energy spectrum $p(Mel)$ through this filter bank:
$$\theta(m) = \ln\left[\sum_{k} |X(k)|^2 H_m(k)\right], \quad m = 1, 2, \ldots, K$$
where $m$ indexes the filters, $k$ runs over the frequency samples of the spectrum, and $K$ is the number of filters. $H_m(k)$ is the $m$-th Mel filter; the center frequencies $f(m)$, $m = 1, 2, \ldots, K$, are distributed along the Mel frequency axis starting from 0, and the filter is defined as:
$$H_m(k) = \begin{cases} 0, & k < f(m-1) \text{ or } k > f(m+1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \end{cases}$$
$\theta(m)$ is the output energy of the $m$-th filter. The Mel-frequency cepstrum $C_{Mel}(n)$ can then be obtained from the Mel-scale spectrum with a modified inverse discrete cosine transform (IDCT):
$$C_{Mel}(n) = \sum_{m=1}^{K} \theta(m) \cos\left(\frac{n(m-0.5)\pi}{K}\right), \quad 1 \le n \le P = K/2$$
5. The standard cepstral parameters reflect only the static characteristics of speech and treat the speech in different frames as unrelated. In fact, because of the physical constraints of articulation, speech changes continuously from frame to frame and the frames are correlated. The recognition parameters therefore also include first-order difference Mel cepstral parameters, defined as:
$$d_{Mel}(n) = \frac{1}{\sum_{i=-k}^{k} i^2} \sum_{i=-k}^{k} i \cdot c(n+i)$$
where $k$ is a constant, usually 2, and $c$ and $d$ each denote the parameters of one frame of speech. In this patent the MFCC parameters and the first-order difference Mel cepstral parameters are merged into one vector, which serves as the parameter vector of one frame of speech.
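Steps 4 and 5 can be sketched in Python as follows, starting from a power spectrum that is assumed to be already computed. Several details are assumptions made for the example: the filter bank spans 0 Hz to half the sampling rate, a tiny floor is added before the logarithm, edge frames are repeated when the difference window runs past the ends of the sequence, the difference is normalized by the plain sum of $i^2$, and all function names and sizes are illustrative.

```python
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def triangular_filter(k, f_prev, f_mid, f_next):
    """H_m(k) with lower edge f(m-1), center f(m), and upper edge f(m+1)."""
    if k < f_prev or k > f_next:
        return 0.0
    if k <= f_mid:
        return (k - f_prev) / (f_mid - f_prev)
    return (f_next - k) / (f_next - f_mid)

def log_filterbank_energies(power_spectrum, sample_rate, num_filters):
    """theta(m) = ln(sum_k |X(k)|^2 * H_m(k)) for m = 1 .. num_filters."""
    n_bins = len(power_spectrum)             # bins span 0 .. sample_rate / 2
    top_mel = hz_to_mel(sample_rate / 2.0)   # assumed upper limit of the bank
    edges = [mel_to_hz(top_mel * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    theta = []
    for m in range(1, num_filters + 1):
        energy = sum(p * triangular_filter(f, edges[m - 1], edges[m], edges[m + 1])
                     for p, f in zip(power_spectrum, bin_hz))
        theta.append(math.log(energy + 1e-12))   # small floor avoids log(0)
    return theta

def dct_cepstrum(theta, num_ceps):
    """C(n) = sum_m theta(m) * cos(n * (m - 0.5) * pi / K) for n = 1 .. num_ceps."""
    K = len(theta)
    return [sum(theta[m - 1] * math.cos(n * (m - 0.5) * math.pi / K)
                for m in range(1, K + 1))
            for n in range(1, num_ceps + 1)]

def delta(frames, k=2):
    """First-order difference d(n) = sum_i i * c(n + i) / sum_i i^2, i = -k .. k."""
    denom = sum(i * i for i in range(-k, k + 1))
    last = len(frames) - 1
    out = []
    for n in range(len(frames)):
        d = [0.0] * len(frames[n])
        for i in range(-k, k + 1):
            src = frames[min(max(n + i, 0), last)]   # repeat edge frames (assumption)
            for j, value in enumerate(src):
                d[j] += i * value
        out.append([v / denom for v in d])
    return out

# Hypothetical input: a flat 129-bin power spectrum at an 8 kHz sampling rate.
theta = log_filterbank_energies([1.0] * 129, sample_rate=8000.0, num_filters=12)
ceps = dct_cepstrum(theta, num_ceps=6)
# A coefficient that rises by 3 per frame has a first-order difference of 3.
ramp = [[3.0 * n] for n in range(6)]
ramp_delta = delta(ramp)
features = [c + d for c, d in zip(ramp, ramp_delta)]   # static + difference vector
```

The last line mirrors the merging step in the text: each frame's static coefficients and difference coefficients are concatenated into one feature vector.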
4) Model training on the speech feature parameters
A hidden Markov model (HMM) is a doubly stochastic process: one process describes the statistical characteristics of the short-time stationary segments of a non-stationary signal; the other describes how each short-time stationary segment transitions to the next, that is, the dynamics of the short-time statistical characteristics. Human speech production is also such a doubly stochastic process. The speech signal itself is an observable sequence, but it is a stream of parameters for phonemes (words, sentences) produced by the brain, which cannot be observed, according to what is to be said and to knowledge of the grammar (state selection). The whole process agrees closely with the hidden Markov model, so an HMM can describe the production of the speech signal accurately.
An HMM can be described by the following parameters:
1. N: the number of states in the model. The states are interconnected, and a state can be reached from other states; other connection schemes between states are also possible. The set of states is $S = \{S_1, S_2, \ldots, S_N\}$, and the state at time $t$ is denoted $q_t$.
2. M: the number of observation symbols, that is, the number of distinct symbols each state may output. The set of observation symbols is $V = \{v_1, v_2, \ldots, v_M\}$.
3. T: the length of the observation sequence. The observation sequence produced by the HMM is $O = \{o_1, o_2, \ldots, o_T\}$.
4. A: the state transition probability distribution. This is the matrix of state transition probabilities, whose element $a_{ij}$ is the probability of moving to state $S_j$ at time $t+1$ given state $S_i$ at time $t$: $A = \{a_{ij}\}$, $a_{ij} = p[q_{t+1} = S_j \mid q_t = S_i]$, $1 \le i, j \le N$.
5. B: the observation symbol probability distribution. This is the matrix of observation symbol probabilities, whose element $b_j(k)$ is the probability that the model outputs observation symbol $v_k$ when it is in state $S_j$ at time $t$: $B = \{b_j(k)\}$.
6. π: the initial state distribution, that is, the probability of being in each state at the initial time $t = 1$.
In practice the observation density is usually continuous, so this patent uses an HMM with continuous observation densities, with a Gaussian mixture as the observation density function. With a Gaussian mixture, the probability density function of the observations takes the form:
$$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, N_0(o_t, u_{jm}, s_{jm}), \quad 1 \le j \le N$$
where $o_t$ is the observation vector to be modeled, here the MFCC cepstral vector; $c_{jm}$ is the $m$-th mixture coefficient of state $j$, also called the mixture gain factor; $N_0$ is the Gaussian density function; $u_{jm}$ is the mean vector of the $m$-th mixture component of state $j$; and $s_{jm}$ is the covariance matrix of the $m$-th mixture component of state $j$. In practice the components of $o_t$ are essentially uncorrelated, so $s_{jm}$ becomes a diagonal covariance matrix and $b_j(o_t)$ can be written as:
$$b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \frac{\prod_{d=1}^{D} \left\{ \exp\left[ -\left(o_t(d) - u_{jmd}\right)^2 / (2 s_{jmd}) \right] / \sqrt{2\pi} \right\}}{\left( \prod_{d=1}^{D} s_{jmd} \right)^{1/2}}$$
The above must satisfy the following statistical constraints:
$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N$$
$$c_{jm} \ge 0, \quad 1 \le j \le N,\ 1 \le m \le M$$
$$\int_{-\infty}^{\infty} b_j(x)\,dx = 1, \quad 1 \le j \le N$$
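The diagonal-covariance mixture density can be evaluated directly from its definition. The Python sketch below handles one state; the function names and the two-component, one-dimensional example values are assumptions made for illustration.

```python
import math

def diagonal_gaussian(o, mean, var):
    """Product over dimensions d of exp(-(o_d - u_d)^2 / (2 s_d)) / sqrt(2 pi s_d)."""
    density = 1.0
    for x, u, s in zip(o, mean, var):
        density *= math.exp(-(x - u) ** 2 / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
    return density

def mixture_density(o, weights, means, variances):
    """b_j(o) = sum_m c_jm * N(o; u_jm, s_jm) for a single state j."""
    return sum(c * diagonal_gaussian(o, u, s)
               for c, u, s in zip(weights, means, variances))

# Hypothetical one-dimensional state with two mixture components.
weights = [0.3, 0.7]            # non-negative and summing to 1
means = [[-1.0], [2.0]]
variances = [[1.0], [0.25]]

# Numerical check of the normalization constraint: the density integrates to 1.
step = 0.01
area = sum(mixture_density([-10.0 + i * step], weights, means, variances) * step
           for i in range(2001))
# Value of a standard normal density at its mean, 1 / sqrt(2 pi).
peak = mixture_density([0.0], [1.0], [[0.0]], [[1.0]])
```

Because the mixture weights sum to 1 and each component integrates to 1, the mixture as a whole satisfies the third constraint, which the numerical integral confirms.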
The complete definition of a continuous mixture density HMM therefore requires choosing the following parameters:
N: the number of states in the model
M: the number of Gaussian mixture components per state
D: the dimension of each observation vector
π: the initial state distribution
A: the state transition probabilities
C: the mixture gain matrix
μ: the mean matrix of the mixture components
U: the covariance matrix of the mixture components
The parameter set of the continuous mixture density HMM is denoted λ, so the model is written λ = (π, A, C, μ, U). Applying an HMM to speech recognition requires solving three problems:
(1) The evaluation problem. Given the observation sequence $O = \{o_1, o_2, \ldots, o_T\}$ and the model λ = (π, A, C, μ, U), how to compute the probability $P(O \mid \lambda)$ of producing the observation sequence $O$ under the model λ. Solving the evaluation problem makes it possible to select the model that best matches a given observation sequence. This patent uses the forward-backward algorithm.
(2) The optimal state sequence problem. Given the observation sequence $O = \{o_1, o_2, \ldots, o_T\}$ and the model λ = (π, A, C, μ, U), how to choose the corresponding state sequence that best explains the observation sequence. This patent uses the Viterbi algorithm.
(3) The model parameter optimization problem. How to adjust the model parameters λ = (π, A, C, μ, U) so that $P(O \mid \lambda)$ is maximized, that is, so that the model describes a given observation sequence as well as possible; the observation sequence is then best explained as having been generated by the optimized model. This patent uses the Baum-Welch algorithm.
Forward-backward algorithm
The forward probability is defined as $\alpha_t(i) = p(o_1 o_2 \cdots o_t, q_t = i \mid \lambda)$: given the HMM parameters λ, the probability of the partial observation sequence $\{o_1 o_2 \cdots o_t\}$ together with being in state $i$ at time $t$.
The forward probability $\alpha_t(i)$ can be computed with the following recursion:
(1) Initialization
$$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$
(2) Recursion
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \quad 1 \le t \le T-1,\ 1 \le j \le N$$
(3) Termination
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Corresponding to the forward probability there is a backward probability, defined as
$\beta_t(i) = p(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$: given the HMM parameters λ and that the model is in state $i$ at time $t$, the probability that the system outputs the remaining observation sequence $\{o_{t+1}, o_{t+2}, \ldots, o_T\}$. The backward probability $\beta_t(i)$ is computed with a similar recursion:
(1) Initialization
$$\beta_T(i) = 1, \quad 1 \le i \le N$$
(2) Recursion
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N$$
Computing the output probability from the forward and backward probabilities
The forward and backward probabilities split the output probability of the whole observation sequence under the HMM into the product of the output probabilities of two partial observation sequences, each with its own recursion. The output probability is:
$$\begin{aligned} P(O \mid \lambda) &= \sum_{i=1}^{N} \alpha_T(i) \\ &= \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i), \quad 1 \le t \le T-1 \\ &= \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le t \le T-1 \end{aligned}$$
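The forward and backward recursions can be sketched in Python as follows. To keep the example short it uses a discrete observation matrix `B[j][k]` instead of the Gaussian mixture densities used in the patent; that simplification, the function names, and the two-state model values are assumptions made for illustration. Indices are zero-based in the code.

```python
def forward(pi, A, B, obs):
    """alpha[t][i]: probability of obs[0..t] together with state i at time t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    """beta[t][i]: probability of obs[t+1..] given state i at time t."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1]

alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)
prob = sum(alpha[-1])   # P(O | lambda) from the termination step
# The same probability is obtained from alpha and beta at every time step.
prob_per_t = [sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))]
```

For long observation sequences the products of probabilities underflow, so practical implementations scale `alpha` and `beta` at each step or work with logarithms; that refinement is omitted here.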
Determining the optimal state sequence. To avoid a large number of multiplications, the Viterbi algorithm is used in logarithmic form.
Viterbi algorithm:
(1) Preprocessing
$$\bar{\pi}_i = \log(\pi_i), \quad \bar{b}_i(o_t) = \log[b_i(o_t)], \quad \bar{a}_{ij} = \log(a_{ij})$$
(2) Initialization
$$\bar{\delta}_1(i) = \log[\delta_1(i)] = \bar{\pi}_i + \bar{b}_i(o_1)$$
(3) Recursion
$$\bar{\delta}_t(j) = \log[\delta_t(j)] = \max_{1 \le i \le N} \left[ \bar{\delta}_{t-1}(i) + \bar{a}_{ij} \right] + \bar{b}_j(o_t)$$
(4) Termination
$$\bar{p}^* = \max_{1 \le i \le N} \left[ \bar{\delta}_T(i) \right]$$
$$q_T^* = \arg\max_{1 \le i \le N} \left[ \bar{\delta}_T(i) \right]$$
(5) Backtracking of the optimal path
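A compact Python sketch of the logarithmic Viterbi algorithm, including the backtracking step, is given below. As an assumption for the example it uses a discrete observation matrix rather than mixture densities, stores a back-pointer table (here called `back`) to recover the path, and uses a made-up two-state model; none of these names come from the patent.

```python
import math

def viterbi_log(pi, A, B, obs):
    """Return (best state path, log probability of that path)."""
    N = len(pi)

    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    # Initialization: log(pi_i) + log(b_i(o_1))
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    back = []                                # back[t-1][j]: best predecessor of j
    for t in range(1, len(obs)):
        new_delta, pointers = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log(A[i][j]))
            pointers.append(best_i)
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[j][obs[t]]))
        delta = new_delta
        back.append(pointers)
    # Termination: best final state, then follow the back-pointers.
    last = max(range(N), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]

best_path, best_log_prob = viterbi_log(pi, A, B, [0, 1, 1])
best_prob = math.exp(best_log_prob)
```

Working with sums of logarithms instead of products of probabilities is what removes the multiplications mentioned in the text, and it also avoids numerical underflow on long sequences.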
Baum-Welch algorithm:
The basic idea of the Baum-Welch algorithm is to estimate a new model λ from the existing model λ' using a parameter re-estimation formula such that $P(O \mid \lambda') \le P(O \mid \lambda)$, and then to replace λ' with λ. This is repeated until the model parameters converge, which yields the maximum likelihood model. The question is how to construct a re-estimation formula that guarantees $P(O \mid \lambda') \le P(O \mid \lambda)$. Baum proved that this problem can be converted into finding the model λ that maximizes the auxiliary function $Q(\lambda', \lambda)$, because
$$Q(\lambda', \lambda) \ge Q(\lambda', \lambda') \Rightarrow P(O \mid \lambda) \ge P(O \mid \lambda')$$
where
$$Q(\lambda', \lambda) = \sum_{q} p(O, q \mid \lambda') \log p(O, q \mid \lambda)$$
$$p(O, q \mid \lambda) = \pi_{q_0} \prod_{t=1}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$$
Substituting $p(O, q \mid \lambda)$ into $Q(\lambda', \lambda)$ gives
$$Q(\lambda', \lambda) = Q_{\pi}(\lambda', \pi) + \sum_{i=1}^{N} Q_{a_i}(\lambda', a_i) + \sum_{j=1}^{N} Q_{b_j}(\lambda', b_j)$$
where
$$Q_{\pi}(\lambda', \pi) = \sum_{i=1}^{N} p(O, q_0 = i \mid \lambda') \log \pi_i$$
$$Q_{a_i}(\lambda', a_i) = \sum_{j=1}^{N} \sum_{t=1}^{T} p(O, q_{t-1} = i, q_t = j \mid \lambda') \log a_{ij}$$
$$\begin{aligned} Q_{b_i}(\lambda', b_i) &= \sum_{t=1}^{T} p(O, q_t = i \mid \lambda') \log b_i(o_t) \\ &= \sum_{k=1}^{K} \sum_{t=1}^{T} p(O, q_t = i \mid \lambda') \log b_i(k)\, \delta(o_t, v_k) \end{aligned}$$
where
$$\delta(o_t, v_k) = \begin{cases} 1, & \text{if } o_t = v_k \\ 0, & \text{otherwise} \end{cases}$$
The parameters must satisfy the following three constraints:
$$\sum_{j=1}^{N} \pi_j = 1$$
$$\sum_{j=1}^{N} a_{ij} = 1, \quad \forall i$$
$$\sum_{k=1}^{K} b_j(k) = 1, \quad \forall j$$
Each term of the auxiliary function has the following form:
$$\sum_{j=1}^{N} w_j \log y_j, \quad \text{where the variables } \{y_j\}_{j=1}^{N} \text{ satisfy } \sum_{j=1}^{N} y_j = 1$$
It can be shown that, subject to this constraint, each term is maximized when
$$y_j = \frac{w_j}{\sum_{i=1}^{N} w_i}$$
Maximizing each term of the auxiliary function $Q(\lambda', \lambda)$ in this way yields the model that maximizes $Q(\lambda', \lambda)$.
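One re-estimation step can be sketched in Python for a discrete-observation HMM. Each update below is a ratio of expected counts, that is, a set of weights normalized to sum to 1, which is the form that maximizes a sum of $w_j \log y_j$ under a sum-to-one constraint. The discrete observation matrix, the function names, the intermediate tables `gamma` and `xi`, and the example model are assumptions made for illustration; the patent itself uses continuous mixture densities.

```python
def forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

def backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

def likelihood(pi, A, B, obs):
    return sum(forward(pi, A, B, obs)[-1])

def baum_welch_step(pi, A, B, obs):
    """Return re-estimated (pi, A, B) after one pass over a single sequence."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = sum(alpha[-1])
    # gamma[t][i]: probability of state i at time t given the observations
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    # xi[t][i][j]: probability of state i at t and state j at t+1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B

# Hypothetical starting model and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1, 1, 1, 0]

before = likelihood(pi, A, B, obs)
new_pi, new_A, new_B = baum_welch_step(pi, A, B, obs)
after = likelihood(new_pi, new_A, new_B, obs)
```

Repeating the step until `after` stops increasing corresponds to the convergence condition described in the text; the likelihood never decreases from one step to the next.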

Claims (4)

1. the household service robot speech recognition system based on Hidden Markov Model (HMM), is characterized in that comprising the steps:
Step (1): input speech signal is carried out to filtering, be intended to filtering low-frequency disturbance;
Step (2): because voice signal is the simulating signal that consecutive hours becomes, the voice signal after filtering low-frequency disturbance carries out sampling and Quantifying and obtains discrete digital signal;
Step (3): it is sectional that a minute frame becomes original signal, be the equal of in original signal time domain, to have added a rectangular window, and multiply each other with rectangular window in time domain, with regard to being equivalent to the Fourier transform of signal spectrum and rectangular window in frequency domain, carry out convolution, so will carry out windowing process to voice signal;
Step (4): the voice signal after complete to windowing process carries out end-point detection, because the end points of correct detection voice signal is the prerequisite of carrying out speech recognition.
Step (5): the characteristic parameter to voice signal extracts, for the model training of lower step characteristic parameter is done basis;
Step (6): extracted phonic signal character parameter is carried out to model training by Hidden Markov Model (HMM) (HMM);
Step (7): set up the template base of voice signal, the characteristic parameter through Hidden Markov training is mated with template base, passing threshold comparison, finally obtains recognition result.
2. the household service robot speech recognition system based on Hidden Markov Model (HMM) according to claim 1, is characterized in that described step 4) the method that adopts of end-point detection be double threshold end-point detection algorithm.
3. the household service robot speech recognition system based on Hidden Markov Model (HMM) according to claim 1, it is characterized in that described step 5) the Mel frequency cepstrum parameter of standard only reflects the static characteristics of speech parameter, in fact the physical condition being pronounced limits, between different frame, voice variation is continuous, be correlated with, so also use first order difference Mel cepstrum parameter in identification parameter, it is defined as:
$$d_{Mel}(n) = \frac{1}{\sum_{i=-k}^{k} i^{2}} \sum_{i=-k}^{k} i \cdot c(n+i)$$
where k is a constant, usually taken as 2, and c and d each denote the parameters of one speech frame. In use, the MFCC parameters and the difference parameters are merged into a single vector that serves as the parameter of one frame of the voice signal.
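The first-order difference defined in this claim is a weighted sum of the cepstral vectors of the k frames on either side of frame n, normalized by the sum of the squared weights. A minimal Python sketch, assuming the static coefficients are already available as a matrix with one frame per row, might look like this. The function names are hypothetical, and repeating the first and last frames at the boundaries is an assumption, since the claim does not say how edge frames are handled.

```python
import numpy as np

def delta_features(c, k=2):
    """First-order difference of cepstral coefficients.

    c : array of shape (n_frames, n_coeffs), one static vector per frame.
    Returns an array of the same shape holding d(n) for every frame.
    Boundary frames are handled by repeating the edge frame (assumed).
    """
    n_frames = c.shape[0]
    padded = np.pad(c, ((k, k), (0, 0)), mode="edge")
    denom = sum(i * i for i in range(-k, k + 1))  # equals 10 when k = 2
    d = np.zeros(c.shape, dtype=np.float64)
    for i in range(-k, k + 1):
        # padded[k + i + n] holds c(n + i) for n = 0 .. n_frames - 1
        d += i * padded[k + i : k + i + n_frames]
    return d / denom

def add_deltas(c, k=2):
    """Merge static and difference parameters into one vector per frame."""
    return np.hstack([c, delta_features(c, k)])

# Hypothetical input: 10 frames of 12 coefficients that rise by 1 per frame.
static = np.tile(np.arange(10, dtype=np.float64)[:, None], (1, 12))
merged = add_deltas(static)  # shape (10, 24)
```

For a coefficient that grows linearly by one unit per frame, the interior frames have a difference of exactly 1, which is a quick sanity check on the normalization.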
4. The household service robot speech recognition system based on a hidden Markov model (HMM) according to claim 1, characterized in that, in step (6), training the extracted characteristic parameters with the hidden Markov model requires solving three problems: the evaluation problem, the determination of the optimal state sequence, and the optimization of the model parameters; these three problems are solved by the forward-backward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, respectively.
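Two of the three problems named in this claim, together with the threshold comparison of step (7), can be sketched for a discrete-observation HMM as follows. This is an illustrative sketch under stated assumptions: the observations are assumed to be discrete symbols (for example, vector-quantized feature vectors), only the forward pass of the forward-backward algorithm is shown, Baum-Welch training is not implemented, and the function names, the per-frame normalization, and the threshold are hypothetical rather than taken from the patent.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Evaluation problem: log P(obs | model) using the scaled forward pass.

    pi : (N,) initial state probabilities
    A  : (N, N) transition probabilities, A[i, j] = P(j | i)
    B  : (N, M) emission probabilities, B[j, o] = P(o | j)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf  # the model cannot produce this sequence
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_lik

def viterbi(pi, A, B, obs):
    """Optimal state sequence problem, solved in the log domain."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: from state i to state j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def recognize(models, obs, threshold):
    """Score obs against every template model and apply a threshold.

    models maps a word label to its (pi, A, B) tuple. The best label is
    returned only if its per-frame log-likelihood reaches the threshold.
    """
    scores = {name: forward_log_likelihood(*m, obs) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] / len(obs) >= threshold else None
```

Rejecting the best match when its score falls below the threshold is what lets a recognizer of this kind ignore utterances that are not in its template library instead of forcing them onto the nearest command.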
CN201310102175.9A 2013-03-27 2013-03-27 Voice recognition system of domestic service robot on basis of hidden Markov model Pending CN104078039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310102175.9A CN104078039A (en) 2013-03-27 2013-03-27 Voice recognition system of domestic service robot on basis of hidden Markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310102175.9A CN104078039A (en) 2013-03-27 2013-03-27 Voice recognition system of domestic service robot on basis of hidden Markov model

Publications (1)

Publication Number Publication Date
CN104078039A true CN104078039A (en) 2014-10-01

Family

ID=51599262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310102175.9A Pending CN104078039A (en) 2013-03-27 2013-03-27 Voice recognition system of domestic service robot on basis of hidden Markov model

Country Status (1)

Country Link
CN (1) CN104078039A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373562A (en) * 2016-08-31 2017-02-01 黄钰 Robot voice recognition method based on natural language processing
CN106601234A (en) * 2016-11-16 2017-04-26 华南理工大学 Implementation method of placename speech modeling system for goods sorting
WO2017084360A1 (en) * 2015-11-17 2017-05-26 乐视控股(北京)有限公司 Method and system for speech recognition
CN106898350A (en) * 2017-01-16 2017-06-27 华南理工大学 A kind of interaction of intelligent industrial robot voice and control method based on deep learning
CN107680583A (en) * 2017-09-27 2018-02-09 安徽硕威智能科技有限公司 A kind of speech recognition system and method
CN108056865A (en) * 2017-12-01 2018-05-22 西安科技大学 A kind of multi-modal wheelchair brain control system and method based on cloud platform
CN108081266A (en) * 2017-11-21 2018-05-29 山东科技大学 A kind of method of the mechanical arm hand crawl object based on deep learning
CN109036387A (en) * 2018-07-16 2018-12-18 中央民族大学 Video speech recognition methods and system
CN109192200A (en) * 2018-05-25 2019-01-11 华侨大学 A kind of audio recognition method
CN109961775A (en) * 2017-12-15 2019-07-02 中国移动通信集团安徽有限公司 Accent recognition method, apparatus, equipment and medium based on HMM model
CN110008834A (en) * 2019-02-28 2019-07-12 中电海康集团有限公司 A kind of the steering wheel intervention detection and statistical method of view-based access control model
CN110058689A (en) * 2019-04-08 2019-07-26 深圳大学 A kind of smart machine input method based on face's vibration
CN111968620A (en) * 2019-05-20 2020-11-20 北京声智科技有限公司 Algorithm testing method and device, electronic equipment and storage medium
CN112004468A (en) * 2018-02-23 2020-11-27 波士顿科学国际有限公司 Method for evaluating vessels using continuous physiological measurements
CN112071307A (en) * 2020-09-15 2020-12-11 江苏慧明智能科技有限公司 Intelligent incomplete voice recognition method for elderly people
WO2021042537A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Voice recognition authentication method and system
CN112863546A (en) * 2021-01-21 2021-05-28 安徽理工大学 Belt conveyor health analysis method based on audio characteristic decision
CN113643692A (en) * 2021-03-25 2021-11-12 河南省机械设计研究院有限公司 PLC voice recognition method based on machine learning
CN113689633A (en) * 2021-08-26 2021-11-23 浙江力石科技股份有限公司 Scenic spot human-computer interaction method, device and system

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084360A1 (en) * 2015-11-17 2017-05-26 乐视控股(北京)有限公司 Method and system for speech recognition
CN106373562A (en) * 2016-08-31 2017-02-01 黄钰 Robot voice recognition method based on natural language processing
CN106601234A (en) * 2016-11-16 2017-04-26 华南理工大学 Implementation method of placename speech modeling system for goods sorting
CN106898350A (en) * 2017-01-16 2017-06-27 华南理工大学 A kind of interaction of intelligent industrial robot voice and control method based on deep learning
CN107680583A (en) * 2017-09-27 2018-02-09 安徽硕威智能科技有限公司 A kind of speech recognition system and method
CN108081266A (en) * 2017-11-21 2018-05-29 山东科技大学 A kind of method of the mechanical arm hand crawl object based on deep learning
CN108056865A (en) * 2017-12-01 2018-05-22 西安科技大学 A kind of multi-modal wheelchair brain control system and method based on cloud platform
CN109961775A (en) * 2017-12-15 2019-07-02 中国移动通信集团安徽有限公司 Accent recognition method, apparatus, equipment and medium based on HMM model
CN112004468A (en) * 2018-02-23 2020-11-27 波士顿科学国际有限公司 Method for evaluating vessels using continuous physiological measurements
CN112004468B (en) * 2018-02-23 2023-11-14 波士顿科学国际有限公司 Method for evaluating vessels using continuous physiological measurements
CN109192200A (en) * 2018-05-25 2019-01-11 华侨大学 A kind of audio recognition method
CN109192200B (en) * 2018-05-25 2023-06-13 华侨大学 Speech recognition method
CN109036387A (en) * 2018-07-16 2018-12-18 中央民族大学 Video speech recognition methods and system
CN110008834A (en) * 2019-02-28 2019-07-12 中电海康集团有限公司 A kind of the steering wheel intervention detection and statistical method of view-based access control model
CN110058689A (en) * 2019-04-08 2019-07-26 深圳大学 A kind of smart machine input method based on face's vibration
CN111968620A (en) * 2019-05-20 2020-11-20 北京声智科技有限公司 Algorithm testing method and device, electronic equipment and storage medium
WO2021042537A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Voice recognition authentication method and system
CN112071307A (en) * 2020-09-15 2020-12-11 江苏慧明智能科技有限公司 Intelligent incomplete voice recognition method for elderly people
CN112863546A (en) * 2021-01-21 2021-05-28 安徽理工大学 Belt conveyor health analysis method based on audio characteristic decision
CN113643692A (en) * 2021-03-25 2021-11-12 河南省机械设计研究院有限公司 PLC voice recognition method based on machine learning
CN113643692B (en) * 2021-03-25 2024-03-26 河南省机械设计研究院有限公司 PLC voice recognition method based on machine learning
CN113689633A (en) * 2021-08-26 2021-11-23 浙江力石科技股份有限公司 Scenic spot human-computer interaction method, device and system

Similar Documents

Publication Publication Date Title
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
CN109034046B (en) Method for automatically identifying foreign matters in electric energy meter based on acoustic detection
CN103065629A (en) Speech recognition system of humanoid robot
CN104050965A (en) English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
Cheng et al. Speech emotion recognition using gaussian mixture model
CN103646649A (en) High-efficiency voice detecting method
CN101887725A (en) Phoneme confusion network-based phoneme posterior probability calculation method
CN101226743A (en) Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
Chenchah et al. Speech emotion recognition in noisy environment
Archana et al. Gender identification and performance analysis of speech signals
CN108682432B (en) Speech emotion recognition device
Eringis et al. Improving speech recognition rate through analysis parameters
Chee et al. Automatic detection of prolongations and repetitions using LPCC
Maganti et al. Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms
Chen et al. InQSS: a speech intelligibility assessment model using a multi-task learning network
CN116597853A (en) Audio denoising method
MY An improved feature extraction method for Malay vowel recognition based on spectrum delta
Ardiana et al. Gender Classification Based Speaker’s Voice using YIN Algorithm and MFCC
Razak et al. Towards automatic recognition of emotion in speech
Chit et al. Myanmar continuous speech recognition system using fuzzy logic classification in speech segmentation
Komlen et al. Text independent speaker recognition using LBG vector quantization
Shome et al. Non-negative frequency-weighted energy-based speech quality estimation for different modes and quality of speech
Shahrul Azmi et al. Noise robustness of Spectrum Delta (SpD) features in Malay vowel recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: Guangdong University of Technology

Document name: Notification of Publication of the Application for Invention

DD01 Delivery of document by public notice

Addressee: Guangdong University of Technology

Document name: Notification of before Expiration of Request of Examination as to Substance

DD01 Delivery of document by public notice

Addressee: Guangdong University of Technology

Document name: Notification that Application Deemed to be Withdrawn

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141001