CA1308195C - Continuous speech recognition apparatus - Google Patents
- Publication number
- CA1308195C (application CA000529863A)
- Authority
- CA
- Canada
- Prior art keywords
- pattern
- dissimilarity measure
- input pattern
- time length
- vicinity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
A start point vicinity of an uttered input speech pattern is set. DP-matching is performed between the input speech pattern and a plurality of reference patterns, obtained by connecting previously prepared reference patterns, in the start point vicinity or a portion for the head word of the input speech pattern. A point within the start point vicinity is determined as a temporary start point. A dissimilarity measure proportional to the time length of the reference pattern is calculated and then converted into a value proportional to the time length of the input pattern from the temporary start point. The dissimilarity measures between the input speech pattern and the reference patterns on the second and following digits are determined as values proportional to the time length of the input speech pattern. The end time point of the input speech pattern is determined on the basis of the minimum value of the dissimilarity measure normalized by the time length of the input speech pattern.
Description
CONTINUOUS SPEECH RECOGNITION APPARATUS
BACKGROUND OF THE INVENTION:
The present invention relates to a continuous speech recognition apparatus, and particularly to improving the recognition accuracy, which is affected by the detection of the start and end time points of continuously uttered speech.
In order to recognize continuously uttered speech, a method has conventionally been used in which a connected reference pattern, obtained by connecting a plurality of word reference patterns, is matched with an input pattern (continuous speech) by use of dynamic programming. The order number of each sequentially connected reference pattern is referred to as a "digit" hereinafter. Words, syllables, semiwords, or clauses may be used as the reference patterns.
This method is based upon the assumption that the start and end points of the input pattern are determined in advance by utilizing the power or spectrum of the input speech, but these points are mistakenly detected in many cases due to a change in the signal-to-noise (SN) ratio or a noise effect. When this mistaken detection occurs, a silent portion may be added to the start or end portion of a word pattern in the input pattern, or the start or end portion of a word pattern in the input pattern may be cut off, making mistaken recognition likely.
To reduce the effect of this kind of speech detection error, a method is described on pages 1318 to 1325 of "Connected Spoken Digit Recognition by O(n) DP Matching" in The Transactions of The Institute of Electronics and Communication Engineers of Japan, Vol. J66-D, No. 11 (1983), in which the start and end points of the input speech are not predetermined; input patterns and reference patterns are matched from the vicinity of the start point ($i_{s1}$ to $i_{s2}$) to the vicinity of the end point ($i_{e1}$ to $i_{e2}$).
A conventional method will first be explained before this method is summarized.
A speech pattern A produced by continuous utterance is expressed as

$$A = a_1, a_2, \ldots, a_i, \ldots, a_I \tag{1}$$

which is called an input pattern. On the other hand, a reference pattern

$$B^n = b_1^n, b_2^n, \ldots, b_j^n, \ldots, b_{J_n}^n \tag{2}$$

is prepared for each word n, which is called a word reference pattern. DP-matching is performed between a connected speech reference pattern $C = B^{n_1}, B^{n_2}, \ldots, B^{n_X}$, obtained by connecting X word reference patterns, and the input pattern A to calculate a measure of difference (distance) between the two patterns. This difference is called the "dissimilarity measure". A word sequence giving the minimum dissimilarity measure D(A, C) shown by the following equation is considered the recognition result.

$$D(A, C) = \min_{j = j(i)} \left[ \sum_{i=1}^{I} d(i, j) \right] \tag{3}$$

wherein the minimum dissimilarity measure is determined by the dynamic programming algorithm described below.
This algorithm is the so-called VLB (Clockwise DP) algorithm.
Initial condition:

$$T(0,0) = 0, \qquad T(i,p) = \infty \ (i \neq 0,\ p \neq 0), \qquad G(p,n,j) = \infty \tag{4}$$

The recurrence formula (6) is successively calculated for each digit from i = 1 to I on the basis of the boundary conditions shown in equation (5), wherein T(i,p) denotes the cumulative dissimilarity up to the p-th digit when calculated to the i-th frame of the input pattern, which is called the digit dissimilarity measure. G(p,n,j) denotes the cumulative dissimilarity measure up to the j-th frame of word n on the (p+1)th digit, which is called the temporary dissimilarity measure. For the n-th word of the reference pattern on the (p+1)th digit, under the boundary conditions

$$G(p,n,0) = T(i-1,p), \qquad H(p,n,0) = i-1 \tag{5}$$

the following recurrence equations are calculated from j = 1 to $J_n$ (from the start point to the end point of the n-th reference pattern):

$$g(j) = d(j) + \min \begin{cases} G(p,n,j) \\ G(p,n,j-1) \\ G(p,n,j-2) \end{cases} \tag{6}$$

$$H(p,n,j) = H'(p,n,\hat{j}) \tag{7}$$

where $\hat{j}$ is the j' giving the minimum G(p,n,j') on the right side of equation (6), H(p,n,j) indicates the start point of word n on the (p+1)th digit and is called the temporary start point indicator, and H'(p,n,j) is H(p,n,j) at the preceding frame of the input pattern. The g(j) and H(p,n,j) thus obtained are stored as G(p,n,j) and H(p,n,j), respectively. Here d(j) is the distance between the feature vector $a_i$ at input pattern time (frame) i and the feature vector $b_j^n$ at the n-th reference pattern time (frame) j, which can be determined, for example, as the Chebyshev distance:

$$d(j) = \mathrm{Dis}(a_i, b_j^n) = \sum_k \left| a_{ik} - b_{jk}^n \right| \tag{8}$$

For minimization at the word boundary, equation (9) is calculated:

$$\text{if } T(i,p+1) > G(p,n,J_n) \text{ then } \begin{cases} T(i,p+1) = G(p,n,J_n) \\ N(i,p+1) = n \\ L(i,p+1) = H(p,n,J_n) \end{cases} \tag{9}$$

wherein L(i,p+1) is the p-th digit start point when calculated to the i-th frame of the input pattern.
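As an illustrative aid only, the conventional recurrence of equations (4) to (9) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the apparatus of the cited method: the names `city_block` and `clockwise_dp` are hypothetical, feature vectors are plain lists of numbers, the boundary value H(p,n,0) is taken as i-1, and the comparison in equation (9) is read as "greater than".

```python
import math

def city_block(a, b):
    # Equation (8): sum of absolute component differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def clockwise_dp(A, refs, X):
    """A: input frames a_1..a_I; refs: {word: reference frames}; X: digit count.
    Returns tables T, N, L indexed as [i][p] (hypothetical layout)."""
    I, INF = len(A), math.inf
    T = [[INF] * (X + 1) for _ in range(I + 1)]
    N = [[None] * (X + 1) for _ in range(I + 1)]
    L = [[None] * (X + 1) for _ in range(I + 1)]
    T[0][0] = 0.0                                        # equation (4)
    G = {(p, n): [INF] * (len(B) + 1) for p in range(X) for n, B in refs.items()}
    H = {(p, n): [None] * (len(B) + 1) for p in range(X) for n, B in refs.items()}
    for i in range(1, I + 1):
        for p in range(X):
            for n, B in refs.items():
                old_G, old_H = G[p, n], H[p, n]
                old_G[0], old_H[0] = T[i - 1][p], i - 1  # equation (5)
                new_G = [INF] * len(old_G)
                new_H = [None] * len(old_H)
                for j in range(1, len(B) + 1):
                    # Predecessors j, j-1, j-2 taken at the previous input frame.
                    best = min(range(max(j - 2, 0), j + 1), key=lambda k: old_G[k])
                    if old_G[best] < INF:
                        new_G[j] = city_block(A[i - 1], B[j - 1]) + old_G[best]  # (6)
                        new_H[j] = old_H[best]                                   # (7)
                G[p, n], H[p, n] = new_G, new_H
                if new_G[-1] < T[i][p + 1]:              # equation (9)
                    T[i][p + 1] = new_G[-1]
                    N[i][p + 1] = n
                    L[i][p + 1] = new_H[-1]
    return T, N, L
```

The sketch keeps one G and H array per (digit, word) pair and replaces it once per input frame, which mirrors the frame-by-frame order of calculation described in the text.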
That is to say, the recurrence equation (6) is calculated for each pair (p,n) on one digit along the reference pattern time axis and this calculation on one digit is performed to the end point i=I along the input pattern time axis.
Recognition results of the input pattern are obtained according to the following procedure:

Initial condition:

$$p = X, \qquad i = I \tag{10}$$

Recognition word and word start point:

$$n = N(i,p), \qquad \ell = L(i,p) \tag{11}$$

If p ≠ 0, the processing of equation (11) is repeated under the conditions p = p-1 and i = ℓ. If p = 0, the processing is completed.
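The backtracking procedure of equations (10) and (11) can be sketched as below. This is a hedged illustration: the function name `backtrace` is hypothetical, and the tables N and L are assumed to be dictionaries keyed by the pair (i, p).

```python
def backtrace(N, L, I, X):
    """Follow equations (10) and (11) from (i, p) = (I, X) back to p = 0.
    N[(i, p)] is the recognized word, L[(i, p)] its start point."""
    words = []
    i, p = I, X                   # equation (10)
    while p != 0:
        words.append(N[(i, p)])   # equation (11): recognition word
        i = L[(i, p)]             # continue from the word start point
        p -= 1
    words.reverse()               # words were collected last digit first
    return words
```

For example, with a second-digit word "b" ending at frame 4 and starting after frame 2, and a first-digit word "a" ending at frame 2, the call returns the words in spoken order.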
As shown in equation (3), the dissimilarity measure between the input pattern A and the reference pattern C is increased by the distance d(i,j) between them for each one-frame step of the calculation along the input pattern axis. This dissimilarity measure calculation is started, as shown in equation (4), by adding the dissimilarity measure obtained at the first frame of the input pattern under the condition that the digit dissimilarity measure at the 0-th frame, T(0,0), is zero. The accumulation is thus performed up to the I-th frame at the end of the input pattern.
In this case, as shown in Fig. 1, the calculation of DP-matching is started after the start (i=l) and end (i=I) points have been previously set.
On the other hand, in the method described in the reference, the initial value of the digit dissimilarity measure at the temporary starting point i = 1 is assumed to be T(0,0), and if the start point i ≠ 1, the initial value is $T(i,0) = d_s \times i$ in the vicinity of the temporary starting point, $i_{s1} \le i \le i_{s2}$, as a penalty for deviation from the start point i = 1. This allows an unfixed start point in the matching, such that the start point may be anywhere in the range $i_{s1} \le i \le i_{s2}$.
However, if $d_s = 0$, the initial value becomes zero throughout the vicinity of the temporary starting point, $i_{s1} \le i \le i_{s2}$, and the number of additions of d(i,j) decreases as the start point moves nearer to $i_{s2}$. Conversely, if $d_s = \infty$, the starting point can only be i = 1 and is no longer a free starting point. Thus, $d_s$ must be set to a value close to the average value of d(i,j), but that average depends upon the speaker and the words used, which makes it difficult to determine $d_s$ appropriately.
SUMMARY OF THE INVENTION:
It is therefore an object of the present invention to provide a continuous speech recognition apparatus which enables highly precise DP-matching with an unfixed starting point.
It is another object of the present invention to provide a continuous speech recognition apparatus which exhibits an extremely reduced rate of recognition error due to speech detection error.
It is a further object of the present invention to provide a continuous speech recognition apparatus which enables DP-matching with a free end.
According to the present invention there is provided a continuous speech recognition apparatus in which a start point vicinity of an uttered input speech pattern is set, DP-matching is performed between the input speech pattern and a plurality of reference patterns, obtained by connecting previously prepared reference patterns, in the start point vicinity or a portion for the head word of said input speech pattern, and a dissimilarity measure proportional to the time length of the reference pattern is calculated and then converted into a value proportional to the time length of the input pattern from a temporary start point. A point within the start point vicinity is determined as the temporary start point. The dissimilarity measures between the input speech pattern and the reference patterns on the second and following digits are determined as values proportional to the time length of the input speech pattern. The end point of the input speech pattern is decided on the basis of the minimum value of the dissimilarity measure normalized by the time length of the input speech pattern.
In summary, the present invention provides, according to a first broad aspect, a continuous speech recognition apparatus comprising: an input pattern output means for outputting a signal of a continuously uttered speech as an input pattern; a reference pattern output means for outputting a plurality of connected reference patterns obtained by connecting previously prepared reference patterns; a start vicinity setting means for setting a start vicinity of said input pattern; a first dissimilarity measure calculating means for calculating a first dissimilarity measure proportional to the time length of said reference pattern between said input pattern and said reference pattern in a portion for the head word of said input pattern; and a second dissimilarity measure calculating means for calculating a second dissimilarity measure proportional to the time length of said input pattern from a temporary start point on the basis of said first dissimilarity measure, said temporary start point ranging within said start vicinity.
According to a second broad aspect, the invention provides a continuous speech recognition apparatus comprising: an input pattern output means for outputting a signal of a continuously uttered speech as an input pattern; a reference pattern output means for outputting a connected pattern of the reference patterns specified by a definite-state automaton; a start point matching means for determining a dissimilarity measure proportional to the time length of the reference pattern between said reference pattern in the initial state of the definite-state automaton and a portion for a head word of said input pattern; a dissimilarity measure converting means for converting the dissimilarity measure obtained by said start point matching means into a dissimilarity measure proportional to the time length of said input pattern from a temporary start point, said temporary start point ranging within a vicinity of said input pattern; a main matching means for determining a dissimilarity measure proportional to the time length of said input pattern between said input pattern and said reference pattern in each of the states other than said initial state of said definite-state automaton, under the condition that the converted dissimilarity measure is used as the boundary condition; an end point decision means for determining, as an end point, a point having a minimum value of the dissimilarity measure normalized by the time length of said input pattern in an end vicinity; and a recognition means for obtaining recognition results by determining the sequence of said reference patterns along the DP-matching path through which said minimum normalized dissimilarity measure is obtained.
Other objects and the features of the present invention will become clear from the following description with reference to the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS:
Fig. 1 is a drawing for explaining the conventional method of the speech recognition;
Fig. 2 is a drawing for explaining a continuous speech recognition method according to the present invention;
Fig. 3 is a block diagram which shows an embodiment of the present invention;
Figs. 4 and 5 are partially detailed block diagrams of the block diagram shown in Fig. 3;
Fig. 6 is an operating time chart of the embodiment shown in Fig. 3;
Figs. 7A to 7H are flow charts which show the operation flow of the embodiment shown in Fig. 3;
Fig. 8 is a block diagram which shows another embodiment of the present invention;
Fig. 9 is an operating time chart of the embodiment shown in Fig. 8; and
Figs. 10A to 10F are flow charts which show the operation of the embodiment shown in Fig. 8.
PREFERRED EMBODIMENTS OF THE INVENTION:
The basic principle of the present invention will first be described.
As shown in Fig. 2, DP-matching is performed in the start vicinity, $i = i_{s1}, \ldots, i_{s2}$, of an input pattern by using recurrence equations. These equations develop a dissimilarity measure proportional to the time length of the reference pattern. The start vicinity is determined on the basis of the time point at which the input speech signal level exceeds a certain threshold. A temporary start point ranges within the start vicinity. More specifically, under the initial condition:

$$T(i,p) = \begin{cases} 0 & \text{if } p = 0 \text{ and } i_{s1} \le i \le i_{s2} \\ \infty & \text{otherwise} \end{cases} \tag{12}$$

DP-matching is performed for the first (head) word of the sentence in the following manner:

Initial condition:

$$g(i-1, 0) = T(i-1, 0), \qquad h(i-1, 0) = i-1 \tag{13}$$

Recurrence equation:

$$g(i,j) = d(i,j) + \min \begin{cases} g(i, j-1) \\ g(i-1, j-1) \\ g(i-2, j-1) \end{cases} \tag{14}$$

$$h(i,j) = h(\hat{i}, j-1) \tag{15}$$

wherein $\hat{i}$ is the i' giving the minimum g(i', j-1) on the right side of equation (14).

The recurrence equation is calculated from i = 1 to I and from j = 1 to $J_n$.
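The head-word matching of equations (12) to (15) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the embodiment itself: the names `city_block` and `head_word_dp` are hypothetical, equation (12) is read as assigning zero cost inside the start vicinity and infinity elsewhere, and the boundary row g(i,0) is indexed directly by i. Because every step advances the reference index j by exactly one, each surviving path accumulates exactly $J_n$ distances.

```python
import math

def city_block(a, b):
    # Sum of absolute component differences between two feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def head_word_dp(A, B, is1, is2, dist=city_block):
    """Return (g, h), where g[i] = g(i, Jn) and h[i] = h(i, Jn) for i = 0..I."""
    I, Jn, INF = len(A), len(B), math.inf
    # Equations (12) and (13): free start anywhere inside [is1, is2].
    g_prev = [0.0 if is1 <= i <= is2 else INF for i in range(I + 1)]
    h_prev = list(range(I + 1))
    for j in range(1, Jn + 1):
        g_cur = [INF] * (I + 1)
        h_cur = [None] * (I + 1)
        for i in range(1, I + 1):
            # Equation (14): predecessors (i, j-1), (i-1, j-1), (i-2, j-1).
            cands = [k for k in (i, i - 1, i - 2) if k >= 0]
            best = min(cands, key=lambda k: g_prev[k])
            if g_prev[best] < INF:
                g_cur[i] = dist(A[i - 1], B[j - 1]) + g_prev[best]
                h_cur[i] = h_prev[best]          # equation (15)
        g_prev, h_prev = g_cur, h_cur
    return g_prev, h_prev
```

With an input whose first two frames are noise, the lowest g(i, Jn) is found at the frame where the word actually ends, and h reports a start point inside the vicinity.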
The resulting $g(i, J_n)$ of DP-matching calculated for the first word is a value proportional to the time length of the reference pattern. This $g(i, J_n)$ is converted into a value G(i) proportional to the time length of the input pattern from the temporary start point. Namely, assuming that the temporary start point of the input pattern is i = 1 even if the true starting point is not i = 1, this value G(i) is obtained by equation (16).

$$G(i) = \frac{i}{J_n} \cdot g(i, J_n) \tag{16}$$

Then, for minimization at the word boundary,

$$\text{if } T(i,1) > G(i) \text{ then } \begin{cases} T(i,1) = G(i) \\ N(i,1) = n \\ L(i,1) = h(i, J_n) \end{cases} \tag{17}$$

is calculated from i = 1 to I.
Here, $g(i, J_n)$ is obtained by adding d(i,j) $J_n$ times. Equation (16) represents conversion into a value obtained by equivalently adding d(i,j) i times. G(i) is the value obtained when the start point is i = 1. Therefore, it becomes possible to connect the algorithm of the present invention to a conventional start-point-fixed algorithm. In other words, DP-matching from the second digit onward is performed in the same way as in the conventional method.
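The conversion and word-boundary minimization of equations (16) and (17) can be illustrated with a short sketch. It assumes that equation (16) scales $g(i, J_n)$ by the factor $i / J_n$, which is what the surrounding text implies ($J_n$ additions converted into the equivalent of i additions); the function name `convert_head_word` and the list-based tables are hypothetical.

```python
def convert_head_word(g_end, h_end, Jn, word, T1, N1, L1):
    """Update the first-digit tables T1, N1, L1 in place for one head word.
    g_end[i] = g(i, Jn) and h_end[i] = h(i, Jn) for i = 0..I."""
    for i in range(1, len(g_end)):
        G_i = g_end[i] * i / Jn          # equation (16), assumed scaling i / Jn
        if G_i < T1[i]:                  # equation (17)
            T1[i], N1[i], L1[i] = G_i, word, h_end[i]
```

Calling the function once per candidate head word leaves in T1 the best converted dissimilarity for each input frame, which then serves as the boundary condition for the second digit.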
Then, DP-matching with the reference patterns for the second and subsequent words is performed in the same way as conventional matching. Namely, for matching of the (p+1)th word,

Initial condition:

$$g(i-1, 0) = T(i-1, p), \qquad h(i-1, 0) = i-1 \tag{18}$$

Recurrence equation:

$$g(i,j) = d(i,j) + \min \begin{cases} g(i-1, j) \\ g(i-1, j-1) \\ g(i-1, j-2) \end{cases} \tag{19}$$

$$h(i,j) = h(i-1, \hat{j}) \tag{20}$$

wherein $\hat{j}$ is the j' giving the minimum g(i-1, j') on the right side of equation (19), is calculated from i = 1 to I and j = 1 to $J_n$. Then, for minimization at the word boundary, for each word n,

$$\text{if } T(i,p+1) > g(i, J_n) \text{ then } \begin{cases} T(i,p+1) = g(i, J_n) \\ N(i,p+1) = n \\ L(i,p+1) = h(i, J_n) \end{cases} \tag{21}$$

is calculated from i = 1 to I.
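The matching of the second and following digits, equations (18) to (21), can be sketched in the same style. The names `city_block` and `extend_digit` are hypothetical, and the sketch assumes that T(i,p) for the previous digit is supplied as a list indexed by i = 0..I. Here every step advances the input index i by exactly one, so the accumulated value is proportional to the time length of the input pattern.

```python
import math

def city_block(a, b):
    # Sum of absolute component differences between two feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def extend_digit(A, refs, T_p, dist=city_block):
    """From T(i, p), compute T(i, p+1), N(i, p+1), L(i, p+1) for i = 0..I."""
    I, INF = len(A), math.inf
    T_next, N_next, L_next = [INF] * (I + 1), [None] * (I + 1), [None] * (I + 1)
    for n, B in refs.items():
        Jn = len(B)
        g_prev, h_prev = [INF] * (Jn + 1), [None] * (Jn + 1)
        for i in range(1, I + 1):
            g_prev[0], h_prev[0] = T_p[i - 1], i - 1      # equation (18)
            g_cur, h_cur = [INF] * (Jn + 1), [None] * (Jn + 1)
            for j in range(1, Jn + 1):
                # Equation (19): predecessors (i-1, j), (i-1, j-1), (i-1, j-2).
                cands = [k for k in (j, j - 1, j - 2) if k >= 0]
                best = min(cands, key=lambda k: g_prev[k])
                if g_prev[best] < INF:
                    g_cur[j] = dist(A[i - 1], B[j - 1]) + g_prev[best]
                    h_cur[j] = h_prev[best]               # equation (20)
            if g_cur[Jn] < T_next[i]:                     # equation (21)
                T_next[i], N_next[i], L_next[i] = g_cur[Jn], n, h_cur[Jn]
            g_prev, h_prev = g_cur, h_cur
    return T_next, N_next, L_next
```

Feeding the returned T list back in as `T_p` extends the match by one more digit.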
For determining the end point, as shown in Fig. 1, the minimum normalized dissimilarity measure in the vicinity of the end point, $i = i_{e1}$ to $i_{e2}$, on the final digit X is determined:

$$\text{if } V > \frac{1}{i}\, T(i,X) \text{ then } \begin{cases} V = \frac{1}{i}\, T(i,X) \\ I_e = i \end{cases} \tag{22}$$

is calculated for the vicinity of the end point, $i = i_{e1}$ to $i_{e2}$. The vicinity of the end point is set in a similar manner to the vicinity of the start point. The resulting $I_e$ is the end (terminating) time and V is the minimum normalized dissimilarity measure. Therefore, the end point may be anywhere in the range $i_{e1} \le i \le i_{e2}$ and becomes an unfixed end point.
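The end point decision of equation (22) amounts to a search for the smallest length-normalized dissimilarity inside the end vicinity. A minimal sketch, with the hypothetical name `decide_end_point` and assuming the final-digit values T(i,X) are given as a list indexed by i with $i_{e1} \ge 1$:

```python
import math

def decide_end_point(T_X, ie1, ie2):
    """Return (Ie, V): the frame in [ie1, ie2] minimizing T(i, X) / i
    and the corresponding normalized dissimilarity, per equation (22)."""
    V, Ie = math.inf, None
    for i in range(ie1, ie2 + 1):
        v = T_X[i] / i        # normalize by the input time length i
        if v < V:
            V, Ie = v, i
    return Ie, V
```

Normalizing by i keeps a longer candidate end point from being penalized merely for having accumulated more frame distances.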
Finally, the sequence of the reference patterns is determined along the DP-matching path through which the minimum normalized dissimilarity measure is obtained, thereby obtaining the recognition result. This processing is performed by the following procedure:

$$\text{Initial condition: } p = X,\ i = I_e; \qquad \text{recognition word: } n = N(i,p); \qquad \text{word starting point: } \ell = L(i,p) \tag{23}$$

If p ≠ 0, equation (23) is repeated with p = p-1 and i = ℓ. If p = 0, the processing is completed.
An embodiment of the present invention will now be described with reference to the drawings. Fig. 3 is a block diagram showing an embodiment of the present invention; Figs. 4 and 5 are block diagrams showing the partially detailed arrangement of the embodiment shown in Fig. 3; Fig. 6 is a time chart showing the time relationship of the operations in Fig. 3; and Figs. 7A to 7H are flow charts showing the flow of the operations in Fig. 3.
There has been proposed a method known as the Level Building Method for calculating DP-matching between an input pattern and connected patterns of reference patterns. This method is described on pages 284 to 297 of IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 2, April 1981.
The present invention may be applied to this method (the Level Building Method) and realizes DP-matching with unfixed start and end points, that is, start-point-free and end-point-free DP-matching. Automaton control can be used for connecting the reference patterns, and any word connections can be expressed by suitably setting the state transition rules described by automatons.
In the embodiment, an automaton is defined as

$$\text{Automaton } \alpha = (K, \Sigma, \Delta, p_0, F)$$

where
K: a set of states $\{p \mid p = 1, 2, \ldots, \pi\}$
Σ: a set of input words $\{n \mid n = 1, 2, \ldots, N\}$
Δ: a set of state transition rules $\{r(p,q,n)\}$, where r(p,q,n) means a transition $p \xrightarrow{n} q$
$p_0$: an initial state, expressed as p = 0 hereinafter
F: a set of final states, $F \subset K$.
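The automaton definition above maps naturally onto a small data structure. The following is an illustrative sketch only: the class name `Automaton`, its field names, and the helper methods are hypothetical and are not part of the embodiment, and each transition rule is assumed to be stored as a triple (p, q, n) meaning "word n moves state p to state q".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Automaton:
    states: frozenset    # K: set of states
    words: frozenset     # set of input words
    rules: frozenset     # state transition rules as (p, q, n) triples
    initial: int         # initial state, p = 0 in the text
    finals: frozenset    # F: set of final states

    def transitions_from(self, p):
        # All (next state, word) pairs allowed when leaving state p.
        return sorted((q, n) for (src, q, n) in self.rules if src == p)

    def accepts(self, sequence):
        # True if the word sequence leads from the initial to a final state.
        current = {self.initial}
        for n in sequence:
            current = {q for (p, q, m) in self.rules if p in current and m == n}
        return bool(current & self.finals)
```

For instance, a grammar that allows exactly two words in a row can be written with rules from state 0 to state 1 and from state 1 to state 2, with state 2 as the only final state.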
The reference pattern $B^n$ of a word n contained in the set of words Σ is stored in a reference pattern memory 130. Designation information of the state transition rules r(p,q,n) and the final states F is stored in an automaton memory 230.
When an unknown input speech is input from a microphone 100, the input speech is subjected to frequency analysis by an input portion 110 to develop feature vectors $a_i$. The feature vectors $a_i$ thus developed are successively sent to an input pattern memory 120.
The input portion 110 also determines the speech section by detecting the speech level, and generates a speech section signal SP which is "1" in the speech section and "0" in other sections. A control portion 240 generates an initialization pulse SET1 at the time (a temporary start point) at which this speech section signal SP rises (Fig. 6). The initialization corresponding to equation (12) and block 10 in Fig. 7A is thereby performed for a T-memory 200.
After the above-described initialization has been completed, the dissimilarity measure calculation is started in a DP-matching portion 310 for (p,n,q) at p=0 (initial state) among the state transition rules r(p,q,n). Firstly, the setting of the boundary conditions corresponding to equation (13) and block 11 in Fig. 7 is conducted for a g-memory 330 and an h-memory 340, which are work memories for DP-matching, by a signal SET2 from
CONTINUOUS SPEECH RECOGNITION APPARATUS
BACKGROUND OF THE INVENTION:
The present invention relates to a continuous speech recognition apparatus, and particularly to an improvement in speech recognition accuracy affected by start and end time points detection of the continuously uttered speech.
In order to recognize continuously uttered speech a method has been conventionally used in which a connected reference pattern obtained by connecting a plurality of word reference patterns is matched with an input pattern (continuous speech) by use of dynamic programming. An order number of the sequentially connected reference pattern is expressed as a "digit" hereinafter. As the reference patterns words, syllables, semiwords, or clauses may be used.
Thls method is based upon an assumption that the start and end polnts of the lnput pattern are prevlously determined by utilizing the power or spectrum of the input speech, but these are mistakenly detected in many cases due to a change in SN or a nolse effect. When thls mlstake detection occurs, a silent portion may be added to the skart or end portion of a word pattern in the input pattern, or the start or end portion of a word pattern in the input pattern may be cut, resulting in the likelihood of mistaken recognition.
~, .
08~9~i In view of reducing the effect of this kind of speech detection error, a method is described in pages 1318 to 1325 of "Connected Spoken Digit Recognition by O(n)DP Matching" in The Transactions of The Institute of Electronics and Communication Engineers of Japan Vol. J66-D, No. 11 (1983), in which the start and end points of the input speech are not predetermined, input patterns and reference patterns being matched from the vicinity of the start point (iSl to is2) to the vicinity of the end point (iel to ie2).
A conventional method will first be explained before this method is summarized.
A speech pattern A produced by continuous utterance is expressed as A = àl, a2, ...... ai, .... aI (1) whlch ls called an lnput pattern. On the other hand, a reference pattern Bn = bln, b2n, ... bjn, ... bJn (2) ls prepared for each word n, whlch is called a word reference pattern. DP-matching is performed between a connected speech reference pattern C = Bnl, Bn2, ..., BnX
obtained by connectlng X word reference patterns and the lnput pattern A to calculate a measure of dlfference (dlstance) between the two patterns. This difference is called "dissimilarity measure". A word sequence giving 130~3195 a minimum dissimilarity measure D(A, C) shown by the following equation is considered as a recognition result.
D(A, C) = min ~ ~ d(i,j)] (3) j = j (i) wherein the minimum dissimilarity measure is determined by a dynamic programming algorithm described below.
This algorithm is so-called VLB (Clockwise DP) algorithm.
Initial condition:
T(0,0) = 0 T~i,p) = oo, i ~ 0, p ~ 0 G(p,n,j) = oo (4) A recurrence formula (6) is successively calculated for each digit from i=l to I on the basis of the boundary conditions shown in an equation (5), wherein T(i,p) denotes a cumulative dissimilarity to the p-th digit when calculated to the i-th frame of the input pattern, which is called a dlgit dlsslmllarity measure. G(p,n,j) denotes a cumulative dlsslmllarlty measure to the j-th frame of word n on the (p~l)th digit, which is called temporary dissimilarity measure. For the n-th word of the reference pattern on the (p+l)th digit, under the boundary conditions:
G(p,n,0) = T~i-l,p) }
H(p,n,0) = i=l ~he following recurrence equations are calculated from j=l 1308~9S
., ~
to Jn (from the start point to the end point of the n-th reference pattern, ~ G(p,n,j) g(j) = d(j) +min ~ G(p,n,j-l) (6) l G(p,n,j-2) H(p,n,j) = H'(p,n,j) t7) where j is j' giving a minimum G~p,n,j') on the right side of the equation (6), H(p,n,j) indicates the start point of word n on the (p+l)th digit and is called temporary start point indicator, and H'(p,n,j) is H(p,n,j) at the frame prior to one frame of the input pattern. Thus obtained g(j) and H(p,n,j) are stored as G~p,n,j) and H(p,n,j) respectively; wherein d(j) is the distance between feature vector ai at input pattern time (frame) i and feature vector bjn at the n-th reference-pattern time (frame) j, whlch can be determined, for example, as Chebyshev dlstance:
d~j) = Dis(am, bjn) = k~ amk bj k¦ (8) For the sake of mlnimization at the boundary of words, equation (9) is calculated:
if T(i,p+l) G(p,n,Jn) then T(i,p+l) = G(p,n,Jn) ~ (9) N(i,p+l) = n L(i,p+l) = H(p,n,Jn) wherein L(i,p+l) is the p-th digit start point when calculated to the i-th frame of the input pattern.
.~ ~
That is to say, the recurrence equation (6) is calculated for each pair (p,n) on one digit along the reference pattern time axis and this calculation on one digit is performed to the end point i=I along the input pattern time axis.
Recognition results of the input pattern are obtained according to the following procedure:
Initial condition: p=X, i=I (10) Recognition word: n =N(i,p) (11) Word start point ~ =L(i,p).
If p~0, the processing of equation (11) is repeated under the conditions p =p-l and i=~. If p=0, the processing is completed.
As shown in the equation (3), the dissimilarity measure between the input pattern A and the reference pattern C is increased by distance d(i,j) therebetween for one frame calculation movement along the input pattern axis. This dissimilarity measure calculation is started, as shown in the equation (4), by adding the dissimilarity measure obtained at the first frame of input pattern under the condition that the digit dissimilarity measure at the 0-th frame T(0,0) is zero. Thus the cumulation to the I-th frame at the end of the lnput pattern is performed.
In this case, as shown in Fig. 1, the calculation of DP-matching is started after the start (i=l) and end (i=I) points have been previously set.
308i95 On the other hand, in the method described in the reference it is assumed that an initial value of digit dissimilarity measure at the temporary starting point i=l is T(0,0) and, if the start point i~l, the initial value is T(i,0) =d~xi in the vicinity of the temporary starting polnt iSl C i ~ iS2 as a penalty due to derivation from the start point i=l. This makes it possible to allow an unfixed start point in the matching such that the start point may be in the range of iSl C i ~ iS2.
However, if d~=0, the initial value becomes zero in the vicinity of the temporary starting point iSl ' i ~ iS2.
The times of addition of d(i,j) are reduced when the start point is nearer iS2. Conversely, if dg=co, the starting point is allowed to be just i=l and does not become a free ~tarting point. Thus, it is necessary to set d8 value at a value close to the average value of d(i,j), but the average value of d(i,j) depends upon the speaker and the words used, causlng a problem with respect to the difficulty in appropriate determination of d~.
SUMMARY OF THE INVENTION:
It is therefore an object of the present invention to provide a continuous speech recognition apparatus which enables hlghly precise DP-matching wlth unfixed starting point.
It is another object of the present invention to ~308195 provide a continuous speech recognition apparatus which exhibits an extremely reduced rate of recognition error due to speech detection error.
It is a further object of the present invention to pro~ide a continuous speech recognition apparatus which enables DP-matching with free end.
According to the present invention there is provided a continuous speech recognition apparatus in which a start point vicinity of an uttered input speech pattern is set, DP-matching is performed between the input speech pattern and a plurality of reference patterns obtained by connecting previously prepared reference patterns in the a portion for the head word of said input speech pattern, and a dissimilarity measure proportional to the time length of the reference pattern is calculated and then converted into,a value proportlonal to the time length of the input pattern from a temporary start point. A point within the ~tart positlon vicinlty is determined as the temporary start point. The dissimilarity measure between the input speech pattern and the reference pattern on the second and the followlng digits are determined as a value proportional to the time length of the input speech pattern. The end point of the input speech pattern is decided on the basis of the minimum value of a normalized dissimilarity measure by the time length of the input speech pattern.
`` ~308~9S
In summary, the present invention provldes, according to a first broad aspect, a continuous speech recognition apparatus comprising: an input pattern output means for outputting a signal of a continuously uttered speech as an input pattern; a reference pattern output means for outputting a plurality of connected reference patterns obtained by connecting previously prepared reference patterns; a start vicinity setting means for setting a start vicinity of said input pattern; a first dissimilarity measure calculating means for calculating a first dissimilarity measure proportional to the time length of said reference pattern between sald input pattern and sald reference pattern in a portion for the head word of said input pattern; and a second dissimllarity measure calculating means for calculating a second dlssl~llarity measure proportional to the time length of said lnput pattern from a ternporary start polnt on the basis of said flrst di~slmllarlty measure, said temporary start point ranging withln said start vicinity.
According to a second broad aspect, the invention provides a continuous speech recognltlon apparatus comprlsingt an lnput pattern output means for outputtlng a slgnal of a continuously uttered speech as an input pattern; a reference pattern output means for outputtlng connected pattern of the reference patterns speclfled by a deflnlte-state automaton; a ~tart polnt matchlng means for determlnlng a dl~slmilarlty measure proportlonal to the tlme length of the reference pattern between sald reference pattern ln the lnltlal state of the deflnite-~tate automaton and a portlon for a head word of sald lnput pattern; a 7a ~:~08~9S
dissimilarity measure converting means for converting the dissimilarity measure obtained by said start point matching means into a dissimilarity measure proportional to the time length of said input pattern from a temporary start point, said temporary start point ranging within a vicinity of said input pattern; a main matching means for determining a dissimilarity measure proportional to the time length of said input pattern between said input pattern and said reference pattern in each of states other than said initial state of said definite-state automaton under that the converted dissimilarity measure is used as the boundary condition; an end polnt declsion means for determlning a point having a minimum value of normalized dissimilarlty measure normallzed by the time length of sald lnput pattern in an end vlclnlty as an end point; and a recognition means for obtaining recognition results by determinlng the sequence of said reference pattern along the DP-matchlng pa~s through whlch sald minimum normalized disslmllarlty measure ls obtalned, 7b i ~308~9S
Other objects and the features of the present invention will become clear from the following description with reference to the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS:
Fig. 1 is a drawing for explaining the conventional method of the speech recognition;
Fig. 2 is a drawing for explaining a continuous speech recognition method according to the present invention;
Fig. 3 is a block diagram which shows an embodiment of the present invention;
Figs. 4 and 5 are partially detailed block diagrams of the block diagram shown in Fig. 3;
Fig. 6 is an operating time chart of the embodiment shown in Fig. 3;
Figs. 7A to 7H are flow charts which show the operation flow of the embodiment shown in Fig. 3;
Fig. 8 is a block diagram which shows another embodiment of the present invention;
Fig. 9 is an operating time chart of the embodiment shown in Fig. 8; and
Figs. 10A to 10F are flow charts which show the operation of the embodiment shown in Fig. 8.
PREFERRED EMBODIMENTS OF THE INVENTION:
The basic principle of the present invention will first be described.
As shown in Fig. 2, DP-matching is performed in the start vicinity, i = is1, ..., is2, of an input pattern by using recurrence equations. These equations develop a dissimilarity measure proportional to the time length of the reference pattern. The start vicinity is determined on the basis of a time point when the input speech signal level exceeds a certain threshold. A temporary start point ranges within the start vicinity. More specifically, under the initial condition:
$$T(i,p) = \begin{cases} 0 & \text{if } p = 0 \text{ and } i_{s1} \le i \le i_{s2} \\ \infty & \text{otherwise} \end{cases} \tag{12}$$
DP-matching is performed for the first (head) word of the sentence in the following manner:
Initial condition:
$$g(i-1,0) = T(i-1,0), \qquad h(i-1,0) = i-1 \tag{13}$$
Recurrence equation:
$$g(i,j) = d(i,j) + \min\left\{ g(i,j-1),\; g(i-1,j-1),\; g(i-2,j-1) \right\} \tag{14}$$
$$h(i,j) = h(\hat{i},j-1) \tag{15}$$
wherein $\hat{i}$ is the i' giving the minimum g(i', j-1) on the right side of the equation (14).
The recurrence equation is calculated from i=1 to I and from j=1 to Jn.
The resulting g(i, Jn) of DP-matching which has been calculated for the first word is a value proportional to the time length of the reference pattern. This g(i, Jn) is converted into a value G(i) proportional to the time length of the input pattern from the temporary start point.
Namely, assuming that the temporary start point of the input pattern is i=1 even if the true starting point is not i=1, this value G(i) is obtained by an equation (16).
$$G(i) = \frac{i}{J_n}\, g(i,J_n) \tag{16}$$
Then, as regards minimization of the word boundary,
$$\text{if } T(i,1) > G(i) \text{ then } \begin{cases} T(i,1) = G(i) \\ N(i,1) = n \\ L(i,1) = h(i,J_n) \end{cases} \tag{17}$$
is calculated from i=1 to I.
Here, g(i,Jn) is obtained by adding d(i,j) Jn times.
The equation (16) represents conversion into a value obtained by equivalently adding d(i,j) i times. G(i) is a value obtained under the assumption that the start point is i=1. Therefore, it becomes possible to connect the algorithm of the present invention to a conventional start point-fixed algorithm. In other words, DP-matching for the second and following digits is performed in the same way as in the conventional method.
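The head-word matching of equations (12) to (16) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented circuit: the function name `head_word_match`, the list-of-lists layout of the distances `d`, and the padding at index 0 are hypothetical, and equation (16) is read as G(i) = (i/Jn)·g(i,Jn).

```python
import math

def head_word_match(d, start_lo, start_hi):
    """Match one head-word reference pattern against the input pattern.

    d[i][j] is the frame distance d(i,j) for i = 1..I and j = 1..Jn
    (row 0 and column 0 are unused padding, an assumption of this sketch).
    start_lo..start_hi is the start vicinity is1..is2.
    Returns (G, start): G[i] is the converted measure of equation (16),
    start[i] is the back pointer h(i,Jn).
    """
    I = len(d) - 1
    Jn = len(d[0]) - 1
    inf = math.inf

    # Equation (12): T(i,0) is 0 inside the start vicinity, infinite elsewhere.
    T0 = [0.0 if start_lo <= i <= start_hi else inf for i in range(I + 1)]

    # Equation (13): boundary column j = 0, defined for frames 0..I-1.
    g = [[inf] * (Jn + 1) for _ in range(I + 1)]
    h = [[None] * (Jn + 1) for _ in range(I + 1)]
    for i in range(I):
        g[i][0] = T0[i]
        h[i][0] = i

    # Equations (14) and (15): every step advances j by one, so g(i,Jn)
    # is a sum of exactly Jn frame distances.
    for j in range(1, Jn + 1):
        for i in range(1, I + 1):
            preds = [k for k in (i, i - 1, i - 2) if k >= 0]
            best = min(preds, key=lambda k: g[k][j - 1])
            if g[best][j - 1] < inf:
                g[i][j] = d[i][j] + g[best][j - 1]
                h[i][j] = h[best][j - 1]

    # Equation (16): rescale from Jn summed distances to i summed distances.
    G = [inf] + [i * g[i][Jn] / Jn for i in range(1, I + 1)]
    start = [h[i][Jn] for i in range(I + 1)]
    return G, start


# With unit distances every path costs Jn = 3, so G(i) grows linearly with i.
d = [[1.0] * 4 for _ in range(7)]
G, start = head_word_match(d, start_lo=1, start_hi=2)
```

In this toy case every g(i,Jn) is 3 regardless of i, while the converted G(i) equals i, which is the scale the subsequent input-proportional matching expects.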
Then, DP-matching with the reference pattern with respect to the second and following words is performed in the same way as conventional matching. Namely, as matching of the (p+1)th word,
Initial condition:
$$g(i-1,0) = T(i-1,p), \qquad h(i-1,0) = i-1 \tag{18}$$
Recurrence equation:
$$g(i,j) = d(i,j) + \min\left\{ g(i-1,j),\; g(i-1,j-1),\; g(i-1,j-2) \right\} \tag{19}$$
$$h(i,j) = h(i-1,\hat{j}) \tag{20}$$
wherein $\hat{j}$ is the j' giving the minimum g(i-1,j') on the right side of the equation (19), is calculated from i=1 to I and j=1 to Jn. Then, as regards the minimization of the word boundary, for each word n,
$$\text{if } T(i,p+1) > g(i,J_n) \text{ then } \begin{cases} T(i,p+1) = g(i,J_n) \\ N(i,p+1) = n \\ L(i,p+1) = h(i,J_n) \end{cases} \tag{21}$$
is calculated from i=1 to I.
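The matching of the second and following words, equations (18) to (21), might be sketched as below. The function names, the padded 1-based tables, and the use of Python lists for T, N and L are assumptions made for illustration only.

```python
import math

def match_following_word(d, T_prev):
    """Input-length-proportional DP for one word after the head word.

    d[i][j] is d(i,j) for i = 1..I, j = 1..Jn (index 0 is padding).
    T_prev[i] is T(i,p) for i = 0..I.
    Returns (g_end, h_end) holding g(i,Jn) and h(i,Jn) for i = 0..I.
    """
    I = len(d) - 1
    Jn = len(d[0]) - 1
    inf = math.inf
    g = [[inf] * (Jn + 1) for _ in range(I + 1)]
    h = [[None] * (Jn + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        # Equation (18): the word may start right after frame i-1.
        g[i - 1][0] = T_prev[i - 1]
        h[i - 1][0] = i - 1
        for j in range(1, Jn + 1):
            # Equations (19), (20): every step advances i by one frame.
            preds = [k for k in (j, j - 1, j - 2) if k >= 0]
            best = min(preds, key=lambda k: g[i - 1][k])
            if g[i - 1][best] < inf:
                g[i][j] = d[i][j] + g[i - 1][best]
                h[i][j] = h[i - 1][best]
    return [row[Jn] for row in g], [row[Jn] for row in h]


def minimize_word_boundary(T_next, N_next, L_next, g_end, h_end, n):
    """Equation (21): keep the best word n ending at each frame i."""
    for i in range(1, len(g_end)):
        if T_next[i] > g_end[i]:
            T_next[i] = g_end[i]
            N_next[i] = n
            L_next[i] = h_end[i]


# Unit distances, previous digit ending at frame 0: the cost grows with i.
d = [[1.0] * 3 for _ in range(6)]
T_prev = [0.0] + [math.inf] * 5
g_end, h_end = match_following_word(d, T_prev)
T_next, N_next, L_next = [math.inf] * 6, [None] * 6, [None] * 6
minimize_word_boundary(T_next, N_next, L_next, g_end, h_end, n=4)
```

Because each step of recurrence (19) consumes exactly one input frame, the accumulated value is a sum of i distances, in contrast to the head-word recurrence (14), which sums Jn distances.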
For determining the end point, as shown in Fig. 1, the minimum normalized dissimilarity measure in the vicinity of the end point, i = ie1 to ie2, on the final digit X is determined.
$$\text{if } V > \frac{1}{i}\, T(i,X) \text{ then } \begin{cases} V = \dfrac{1}{i}\, T(i,X) \\ I_e = i \end{cases} \tag{22}$$
is calculated for the vicinity of the end point i = ie1 to ie2. The vicinity of the end point is set in a similar manner to the setting of the vicinity of the start point. The resulting Ie is the end (terminating) time and V is the minimum normalized dissimilarity measure.
Therefore, the end point may lie anywhere in the range ie1 ≤ i ≤ ie2 and thus becomes an unfixed end point.
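A small sketch of the end-point decision of equation (22) follows. The function name and the sample values are hypothetical; the list `T_final` stands for T(i,X) indexed by frame i, and the end vicinity is assumed to start at a frame i ≥ 1 so that the division is defined.

```python
import math

def decide_end_point(T_final, end_lo, end_hi):
    """Equation (22): pick the frame in ie1..ie2 with the smallest T(i,X)/i."""
    V, Ie = math.inf, None
    for i in range(end_lo, end_hi + 1):
        normalized = T_final[i] / i
        if V > normalized:
            V, Ie = normalized, i
    return Ie, V


# Candidate end frames 2, 3 and 4 give normalized values 3.0, 2.0 and 2.5.
T_final = [math.inf, math.inf, 6.0, 6.0, 10.0]
Ie, V = decide_end_point(T_final, end_lo=2, end_hi=4)
```

Normalizing by i is meaningful here because T(i,X) has been kept proportional to the input time length throughout, so candidates ending at different frames are compared on a per-frame basis.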
Finally, the sequence of the reference pattern is determined along the DP-matching path through which the minimum normalized dissimilarity measure is obtained, thereby obtaining the recognition result. This processing is performed by the following procedures:
$$\left.\begin{array}{ll} \text{Initial condition:} & p = X,\; i = I_e \\ \text{Recognition word:} & n = N(i,p) \\ \text{Word starting point:} & \ell = L(i,p) \end{array}\right\} \tag{23}$$
If p≠0, the equation (23) is repeated with p = p-1 and i = ℓ. If p=0, the processing is completed.
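The back-tracing of equation (23) can be illustrated as follows. The tables are assumed, for this sketch only, to be nested lists indexed as `N[p][i]` and `L[p][i]` with digits numbered 1 to X, and the function name is hypothetical.

```python
def trace_back(N, L, X, Ie):
    """Follow equation (23) from the final digit X and end frame Ie."""
    words = []
    p, i = X, Ie
    while p != 0:
        words.append(N[p][i])   # recognition word n = N(i,p)
        i = L[p][i]             # word starting point l = L(i,p)
        p -= 1
    words.reverse()             # words were collected last digit first
    return words


# Two digits over 5 frames: word 7 ends at frame 2, word 3 ends at frame 5.
N = [None,
     [None, None, 7, None, None, None],
     [None, None, None, None, None, 3]]
L = [None,
     [None, None, 0, None, None, None],
     [None, None, None, None, None, 2]]
result = trace_back(N, L, X=2, Ie=5)
```

The loop reads one entry of N and L per digit, so the cost of producing the recognition result is proportional to the number of digits rather than to the size of the DP tables.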
An embodiment of the present invention will now be described with reference to the drawings. Fig. 3 is a block diagram showing an embodiment of the present invention; Figs. 4 and 5 are block diagrams showing the partially detailed arrangement of the embodiment shown in Fig. 3; Fig. 6 is a time chart showing the time relationship of the operations in Fig. 3; and Figs. 7A to 7H are flow charts showing the flow of the operations in Fig. 3.
There has been proposed a method known as the Level Building Method for calculating DP-matching between input pattern and connected patterns of reference patterns.
This method is described on pages 284 to 297 in IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOL. ASSP-29, No. 2, APRIL 1981.
The present invention may be applied to this method (Level Building Method) and realizes DP-matching with unfixed start and end points, that is, start and end points-free DP-matching. An automaton control can be used for connecting the reference patterns, and any word connections can be expressed by suitably setting the state transition rules described by automatons.
In the embodiment, an automaton is defined as
Automaton α = (K, Σ, Δ, p0, F),
where
K: a set of states {p | p = 1, 2, ..., π};
Σ: a set of input words {n | n = 1, 2, ..., N};
Δ: a set of state transition rules {r(p,q,n)}, where r(p,q,n) means a transition from state p to state q caused by word n;
p0: an initial state, expressed as p=0 hereinafter;
F: a set of final states, F ⊂ K.
The reference pattern Bn of a word n contained in the set of words Σ is stored in a reference pattern memory 130. Designation information of the state transition rules r(p,q,n) and the final states F is stored in an automaton memory 230.
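As an illustration of what the automaton memory holds, the state transition rules r(p,q,n) and the final states F could be represented as below. The class names, the helper methods, and the three-rule grammar are hypothetical and serve only to show the split between rules leaving the initial state (p=0) and all other rules.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    p: int  # source state
    q: int  # destination state
    n: int  # word whose reference pattern Bn drives the transition

class Automaton(NamedTuple):
    rules: list
    finals: frozenset
    initial: int = 0

    def head_rules(self):
        """Rules with p = 0, handled by start point matching."""
        return [r for r in self.rules if r.p == self.initial]

    def following_rules(self):
        """Rules with p != 0, handled by main matching."""
        return [r for r in self.rules if r.p != self.initial]


# Hypothetical grammar: word 1 or word 2, followed by word 3.
automaton = Automaton(
    rules=[Rule(0, 1, 1), Rule(0, 1, 2), Rule(1, 2, 3)],
    finals=frozenset({2}),
)
```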
When an unknown input speech is input from a microphone 100, the input speech is subjected to frequency analysis by an input portion 110 to develop feature vectors ai. The feature vectors ai thus developed are successively sent into an input pattern memory 120.
The input portion 110 has a function to determine a speech section by detecting a speech level, and generates a speech section signal SP which is "1" in the speech section and "0" in other sections. A control portion 240 generates an initialization pulse SET1 at the time (a temporary start point) at which this speech section signal SP rises (Fig. 6). Thereby, the initialization corresponding to the equation (12) and block 10 in Fig. 7A is performed for a T-memory 200.
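One simple way to derive a temporary start point and a start vicinity from the speech level is sketched below. It assumes, hypothetically, a per-frame level list, a fixed threshold, and a symmetric margin of frames before and after the first frame exceeding the threshold; none of these values is specified in the description.

```python
def start_vicinity(levels, threshold, margin):
    """Return (temporary start, is1, is2) using 1-based frame numbers.

    levels[k] is the speech level of frame k+1. The temporary start point is
    the first frame whose level exceeds the threshold; the vicinity extends
    `margin` frames on both sides, clipped to the valid frame range.
    """
    for i, level in enumerate(levels, start=1):
        if level > threshold:
            return i, max(1, i - margin), min(len(levels), i + margin)
    return None  # no speech section detected


levels = [0.1, 0.2, 0.9, 1.0, 0.8, 0.1]
vicinity = start_vicinity(levels, threshold=0.5, margin=2)
```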
After the above-described initialization has been completed, the dissimilarity measure calculation is started in a start point DP-matching portion 310 for r(p,q,n) at p=0 (initial state) among the state transition rules r(p,q,n). Firstly, the setting of the boundary conditions corresponding to the equation (13) and block 11 in Fig. 7 is conducted for a g-memory 330 and an h-memory 340, which are work memories for DP-matching, by a signal SET2 from the control portion 240. Then, a reference pattern time signal j and an input pattern time signal i1 output from the control portion 240 change from 1 to Jn and 1 to I, respectively. At each (i,j), the calculation of a recurrence equation corresponding to block 12 in Fig. 7B is conducted in the start point DP-matching portion 310.
That is to say, the input pattern at a frame i and the n-th reference pattern at a frame j are read out from the input pattern memory 120 and the reference pattern memory 130, and the distance d(i,j) between the feature vectors shown in the equation (8) is determined in a distance calculation portion 300. Then, the recurrence equations shown by the equations (14) and (15) are calculated in the start point DP-matching portion 310.
When the above-described calculations are completed, a dissimilarity measure g(i,Jn) at an end time point Jn of the reference pattern is obtained. Then, as a processing corresponding to block 13 in Fig. 7C, the dissimilarity measure g(i,Jn) is converted into a value proportional to an input pattern time length in a dissimilarity measure converting portion 350. In other words, the division shown in the equation (16) is performed to obtain G(i).
Then, the comparison shown in block 14 in Fig. 7C is performed in accordance with the signal i2 generated from the control portion 240. Namely, the minimization of the word boundary shown in the equation (17) is conducted. T(i,q) is read out from a table memory 200 in accordance with the signals i2 and q3 and compared, by a comparison circuit 170, with G(i) outputted from the dissimilarity measure converting portion 350.
If T(i,q) > G(i), a signal Wp is generated and p, n, G(i) and h(i,Jn) are written into table memories 180, 190, 200 and 210 as P(i,q), N(i,q), T(i,q) and L(i,q), respectively.
After the above-described calculations are completed for r(p,q,n) at p=0 (initial state) among the state transition rules r(p,q,n), the dissimilarity measures for r(p,q,n) at p≠0 are calculated in a main DP-matching portion 320. The setting of boundary conditions corresponding to block 15 in Fig. 7D is performed for the g-memory 330 and h-memory 340, which are the DP-matching work memories, by the signal SET2 from the control portion 240.
In succession, a reference pattern time signal j and an input pattern time signal i1 outputted from the control portion 240 change from 1 to Jn and 1 to I, respectively.
At each (i,j), the calculation of a recurrence equation corresponding to block 16 in Fig. 7E is conducted in the main DP-matching portion 320. That is to say, the frame i of the input pattern and the frame j of the n-th reference pattern are read out and the distance d(i,j) between the feature vectors shown in the equation (8) is determined in the distance calculation portion 300. Then, the recurrence equations shown by the equations (19) and (20) are calculated in the main DP-matching portion 320.
When the above-described calculations are completed, a dissimilarity measure g(i,Jn) at an end time point Jn of the reference pattern is obtained.
Then, the comparison shown in block 17 in Fig. 7F is performed in accordance with the signal i2 generated from the control portion 240. Namely, the minimization of the word boundary shown in the equation (21) is conducted.
T(i,q) is read out from the table memory 200 in accordance with the signals i2 and p and compared with G(i) outputted from the dissimilarity measure converting portion 350 by a comparison circuit 170. If T(i,q) > G(i), a signal Wp is generated and p, n, G(i) and h(i,Jn) are written in the table memories 180, 190, 200 and 210 as P(i,q), N(i,q), T(i,q) and L(i,q), respectively.
By the above-described processing, the calculations are completed for r(p,q,n) at p≠0 (other than the initial state) among the state transition rules r(p,q,n) in response to a signal R from the control portion 240.
After the calculations of the dissimilarity measure have been completed for all the state transition rules r(p,q,n), an end time point is determined by an end decision portion 400. The end decision portion 400 is constructed as shown in Fig. 4 and performs the calculation corresponding to the equation (22) and block 18 in Fig. 7G.
T(i,q) is read out from the T-table memory 200 by a signal q3 indicating a final state and a signal i3 indicating a time in the vicinity of the end point, which are obtained from the control portion 240. The normalized dissimilarity measure (1/i)·T(i,q) is determined in a divider 401. The normalized dissimilarity measure (1/i)·T(i,q) is compared with the minimum normalized dissimilarity measure V, which has been previously determined, by a comparison circuit 403. If V > (1/i)·T(i,q), a signal WV is generated and (1/i)·T(i,q), i, and q are written in a V-register 402, an Ie-register 404, and a Qe-register 405, respectively.
The processing for all q satisfying q ∈ F is performed in the vicinity of the end point from i = ie1 to i = ie2 to obtain an end point Ie and a final state Qe at which the minimum normalized dissimilarity measure is obtained.
Finally, the recognition results are obtained in a result decision portion 220 on the basis of the DP-matching path through which the minimum normalized dissimilarity measure is developed. The result decision portion 220 is constructed as shown in Fig. 5 and conducts the calculation corresponding to the equation (23) and block 19 in Fig. 7H.
In the initial condition in which the state is Qe and the end time point is Ie, a decision control portion 227 generates address signals i4 and q4 to the table memories 190, 210 and 180, assuming that i=Ie and q=Qe, and reads out N(i,q), L(i,q), and P(i,q) therefrom. This N(i,q) is output as a recognition result, and L(i,q) and P(i,q) are used as the next i and q. This processing is repeated until q becomes zero, so that recognition results are successively obtained.
Fig. 8 is a block diagram showing another embodiment of the present invention, Fig. 9 is a time chart showing the relationship between the time and the operation in Fig. 8, and Figs. 10A to 10F are flow charts showing the flow of operations in Fig. 8.
This embodiment utilizes the so-called CWDP (Clockwise DP) method described in U.S. Pat. No. 4,555,796.
A difference from the first embodiment shown in Fig. 3 will be described below. The DP-matching work memories involve six memories comprising a gi-memory 331, a gi-1-memory 332, a gi-2-memory 333, an hi-memory 341, an hi-1-memory 342, and an hi-2-memory 343. A dissimilarity measure is calculated along the input pattern time axis i.
Namely, initialization corresponding to block 10 in Fig. 10A is performed for the T-memory 200 by the signal SET1 from the control portion 240. In succession, for each input pattern time point i, the state transition rules r(p,q,n) are read out from the automaton memory 230.
If p=0, the dissimilarity measure is calculated in the start point DP-matching portion 310, and if p≠0, the dissimilarity measure is calculated in the main DP-matching portion 320. As the calculation of a dissimilarity measure at the start point, the setting of the boundary conditions corresponding to block 11 in Fig. 10A is first performed by the signal SET2 from the control portion 240. Then, the reference pattern time signal j changes from 1 to Jn and the calculation of a recurrence equation corresponding to block 12 in Fig. 10B is performed in the start point DP-matching portion 310. When the reference pattern time becomes Jn, the conversion of the dissimilarity measure corresponding to block 13 in Fig. 10B is performed in the dissimilarity measure converting portion 350. Then, the minimization of the word boundary corresponding to block 14 in Fig. 10B is performed by using the comparison circuit 170. Finally, the replacement of the work memories corresponding to block 20 in Fig. 10D is performed by a signal SET3 from the control portion 240.
On the other hand, for the calculation of the dissimilarity measure at the time points other than the start point, the setting of the boundary conditions corresponding to block 15 in Fig. 10C is first performed by the signal SET2 from the control portion 240. Then, the reference pattern time signal j changes from 1 to Jn and the calculation of a recurrence equation corresponding to block 16 in Fig. 10C is performed in the main DP-matching portion 320.
When the reference pattern time point becomes Jn, the minimization of the word boundary corresponding to block 17 in Fig. 10C is performed by using the comparison circuit 170. Finally, the replacement of the work memories corresponding to block 21 in Fig. 10D is performed by the signal SET3 from the control portion 240.
The above-described processing is performed for all the state transition rules r while the input pattern time i is advanced from 1 to I. The end decision and the decision of recognition results are made in a similar manner to that in the first embodiment (Figs. 10E to 10F).
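The role of the three g work memories can be illustrated with a frame-synchronous version of the head-word recurrence (14). This is only one plausible reading of the "replacement of the work memories": the buffer names, the rotation order, and the omission of the corresponding h memories are assumptions of this sketch, which handles a single reference pattern.

```python
import math

def head_word_frame_sync(d, T0):
    """Compute g(i,Jn) frame by frame with three rolling column buffers.

    d[i][j] is d(i,j) for i = 1..I, j = 1..Jn (index 0 is padding).
    T0[i] is T(i,0) for i = 0..I.
    """
    I = len(d) - 1
    Jn = len(d[0]) - 1
    inf = math.inf
    g_i1 = [inf] * (Jn + 1)   # plays the role of the g(i-1) memory
    g_i2 = [inf] * (Jn + 1)   # plays the role of the g(i-2) memory
    g_i1[0] = T0[0]
    g_end = [inf]
    for i in range(1, I + 1):
        g_i = [inf] * (Jn + 1)            # g(i) memory for the current frame
        g_i[0] = T0[i]
        for j in range(1, Jn + 1):
            # Equation (14) needs only frames i, i-1 and i-2.
            g_i[j] = d[i][j] + min(g_i[j - 1], g_i1[j - 1], g_i2[j - 1])
        g_end.append(g_i[Jn])
        g_i2, g_i1 = g_i1, g_i            # replace the work memories
    return g_end


# Unit distances with start vicinity frames 1..2: every g(i,Jn) equals Jn = 3.
d = [[1.0] * 4 for _ in range(7)]
T0 = [math.inf, 0.0, 0.0, math.inf, math.inf, math.inf, math.inf]
g_end = head_word_frame_sync(d, T0)
```

Keeping only three columns is sufficient because recurrence (14) never looks further back than frame i-2, which is what allows the processing to follow the input pattern time axis.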
The present invention is described above with reference to the embodiments, but the above description does not limit the scope of the present invention. The present invention can be applied to all speech recognition systems based on DP-matching.
Claims (8)
1. A continuous speech recognition apparatus comprising:
an input pattern output means for outputting a signal of a continuously uttered speech as an input pattern;
a reference pattern output means for outputting a plurality of connected reference patterns obtained by connecting previously prepared reference patterns;
a start vicinity setting means for setting a start vicinity of said input pattern;
a first dissimilarity measure calculating means for calculating a first dissimilarity measure proportional to the time length of said reference pattern between said input pattern and said reference pattern in a portion for the head word of said input pattern; and
a second dissimilarity measure calculating means for calculating a second dissimilarity measure proportional to the time length of said input pattern from a temporary start point on the basis of said first dissimilarity measure, said temporary start point ranging within said start vicinity.
2. A continuous speech recognition apparatus according to Claim 1, further comprising a third dissimilarity measure calculating means for calculating a third dissimilarity measure between said input pattern and said reference pattern on the second and following digits, which is proportional to the time length of said input pattern based upon said second dissimilarity measure as a boundary condition, said digit being the connection order number of said connected reference pattern.
3. A continuous speech recognition apparatus according to Claim 2, further comprising an end vicinity setting means for determining the end vicinity of said input pattern and an end decision means for deciding an end point by using a minimum normalized dissimilarity measure normalized by the time length of said input pattern in said obtained vicinity of the end.
4. A continuous speech recognition apparatus according to Claim 3, further comprising a recognition means for outputting said connected reference pattern giving minimum normalized dissimilarity measure as a recognition result.
5. A continuous speech recognition apparatus according to Claim 1, wherein said start vicinity setting means is a means for setting as said start vicinity a portion having a given time length before and after the temporary start point at which the level of said uttered speech becomes greater than a given value.
6. A continuous speech recognition apparatus according to Claim 3, wherein said end vicinity setting means is a means for setting a portion having a given time length before and after the time point at which the level of said uttered speech becomes smaller than a given value as said vicinity of the end point.
7. A continuous speech recognition apparatus according to Claim 1, wherein said conversion in said second dissimilarity measure calculating means is performed in accordance with the following equation:
$$G(i) = \frac{i}{J_n}\, g(i,J_n)$$
where G(i) denotes the second dissimilarity measure; Jn, the time length of the reference pattern; i, the time length of the input pattern; and g(i,Jn), the first dissimilarity measure obtained at the end time of a reference pattern n.
8. A continuous speech recognition apparatus comprising:
an input pattern output means for outputting a signal of a continuously uttered speech as an input pattern;
a reference pattern output means for outputting a connected pattern of the reference patterns specified by a definite-state automaton;
a start point matching means for determining a dissimilarity measure proportional to the time length of the reference pattern between said reference pattern in the initial state of the definite-state automaton and a portion for a head word of said input pattern;
a dissimilarity measure converting means for converting the dissimilarity measure obtained by said start point matching means into a dissimilarity measure proportional to the time length of said input pattern from a temporary start point, said temporary start point ranging within a vicinity of said input pattern;
a main matching means for determining a dissimilarity measure proportional to the time length of said input pattern between said input pattern and said reference pattern in each of the states other than said initial state of said definite-state automaton, under the condition that the converted dissimilarity measure is used as the boundary condition;
an end point decision means for determining, as an end point, a point having a minimum value of the normalized dissimilarity measure normalized by the time length of said input pattern in an end vicinity; and
a recognition means for obtaining recognition results by determining the sequence of said reference pattern along the DP-matching path through which said minimum normalized dissimilarity measure is obtained.
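The conversion and end point decision steps recited in claim 8 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the linear rescaling rule (divide by the reference length, multiply by the input length), and the sample numbers are all assumptions made for illustration; the claim only states that the measure is converted from one proportional to the reference pattern length into one proportional to the input pattern length, and that the end point minimizes the measure normalized by the input length.

```python
from math import inf
from typing import Dict, Tuple


def convert_to_input_scale(g_ref: float, ref_len: int, input_len: int) -> float:
    """Rescale a dissimilarity accumulated over ref_len reference frames
    so that it is proportional to input_len input frames.

    ASSUMPTION: a simple linear rescaling; the claim does not fix the formula.
    """
    if ref_len <= 0 or input_len <= 0:
        raise ValueError("lengths must be positive")
    return g_ref * input_len / ref_len


def decide_end_point(accumulated: Dict[int, float], start: int) -> Tuple[int, float]:
    """Pick the end frame whose length-normalized dissimilarity is smallest.

    accumulated maps each candidate end frame i (the end vicinity) to a
    dissimilarity proportional to the input length measured from `start`.
    """
    best_frame, best_norm = -1, inf
    for frame, g in sorted(accumulated.items()):
        length = frame - start
        if length <= 0:
            continue  # a candidate at or before the start point is not usable
        norm = g / length
        if norm < best_norm:
            best_frame, best_norm = frame, norm
    if best_frame < 0:
        raise ValueError("no candidate end frame lies after the start point")
    return best_frame, best_norm


if __name__ == "__main__":
    # Hypothetical numbers: head-word match over a 6-frame reference,
    # ending at input frame 9 counted from the temporary start point.
    print(convert_to_input_scale(12.0, ref_len=6, input_len=9))
    # Hypothetical end vicinity with three candidate end frames.
    print(decide_end_point({10: 30.0, 12: 33.0, 14: 42.0}, start=0))
```

Normalizing by the input length lets candidates at different end frames be compared fairly, since a later end frame would otherwise accumulate a larger total simply by covering more frames.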
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA000529863A CA1308195C (en) | 1987-02-17 | 1987-02-17 | Continuous speech recognition apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA000529863A CA1308195C (en) | 1987-02-17 | 1987-02-17 | Continuous speech recognition apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CA1308195C (en) | 1992-09-29 |
Family
ID=4134985
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA000529863A CA1308195C (en), Expired - Fee Related | 1987-02-17 | 1987-02-17 | Continuous speech recognition apparatus |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA1308195C (en) |
- 1987-02-17: application CA000529863A filed (patent CA1308195C); status: Expired - Fee Related
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MKLA | Lapsed |