EP0551374A1 - Boundary relaxation for speech pattern recognition - Google Patents
- Publication number
- EP0551374A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- point
- pattern
- path
- score
- feasible
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000003909 pattern recognition Methods 0.000 title claims description 19
- 238000000034 method Methods 0.000 claims abstract description 52
- 239000013598 vector Substances 0.000 claims description 45
- 230000001755 vocal effect Effects 0.000 claims description 4
- 238000004519 manufacturing process Methods 0.000 claims description 2
- 238000012545 processing Methods 0.000 abstract description 11
- 230000006870 function Effects 0.000 abstract description 9
- 238000010606 normalization Methods 0.000 abstract description 3
- 238000004590 computer program Methods 0.000 abstract 1
- 238000012360 testing method Methods 0.000 description 24
- 238000001514 detection method Methods 0.000 description 7
- 238000000605 extraction Methods 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012549 training Methods 0.000 description 5
- 238000013459 approach Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000005055 memory storage Effects 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 238000005070 sampling Methods 0.000 description 3
- 230000015556 catabolic process Effects 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000006866 deterioration Effects 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000002040 relaxant effect Effects 0.000 description 1
- 230000003595 spectral effect Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/12—Speech classification or search using dynamic programming techniques, e.g. dynamic time warping [DTW]
Definitions
- the present invention relates to pattern recognition processing generally and, more particularly, to speech recognition using a dynamic programming algorithm, typically a modification of the standard Dynamic Time Warping (DTW) algorithm or a similar algorithm (for example, a Hidden Markov Model based on Viterbi's algorithm).
- DTW: Dynamic Time Warping
- Hidden Markov Model based on Viterbi's algorithm
- the degradation in recognition accuracy due to mismatch in boundary determination can be reduced by various approaches.
- the method of Wilpon et al. takes the approach of improving the accuracy of boundary determination, up to a certain degree of uncertainty.
- it is desirable that a procedure be developed that is immune to small endpoint errors.
- Rabiner et al. attempt to improve speech recognition by relaxing the boundary constraints and modifying the standard dynamic time warping algorithm, allowing the warping path to begin and end within a specified range around the estimated boundaries.
- the accumulated distance of the final path is normalized by its length.
- the method of Rabiner et al. is enhanced by the algorithm described in "Dynamic Time Warping with Boundaries Constraint Relaxation", by I.D. Shallom, R. Haimi Cohen and T. Golan, published in Proc. Conf. IEEE Israel, 1989, paper 3.1.3.
- the algorithm of Shallom et al. also relaxes the boundary constraints. Their method uses a dynamic time warping algorithm in which a path length normalization factor is applied in the dynamic programming equation at each grid point. This improves the path optimization process.
- the present invention provides an improved pattern recognition method, which may be used for speech recognition, that relaxes the boundary constraints so as to account for boundary detection errors.
- the dynamic programming algorithm is modified so that the known and predicted path lengths are taken into account when determining the optimal path to each grid point. Additionally, the present invention provides a method for improving the accuracy of the estimated boundaries of a tested pattern.
- a method for determining the predicted path length and for utilizing it in a dynamic programming algorithm is outlined below.
- apparatus for pattern recognition including apparatus for providing a digital pattern to be inspected which contains a plurality of feature vectors, apparatus for providing at least one digital reference pattern containing a different plurality of parameter vectors and apparatus for comparing the digital pattern to be inspected with the at least one digital reference pattern.
- the apparatus for comparing includes apparatus for providing a search area including a grid with the feature vectors on a first axis and the parameter vectors on a second axis and apparatus for calculating a final normalized score which is the estimated minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in the search area.
- the apparatus for calculating includes, for each point in the search area, apparatus for computing an accumulated score for a plurality of feasible paths which contain the point, apparatus for computing an overall weight for each of the plurality of feasible paths which contain the point, apparatus for computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and apparatus for selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
- the search area includes a plurality of path beginning points and a plurality of path ending points.
- the apparatus for pattern recognition also includes an apparatus for determining beginning and ending points of that feasible path which is associated with the final normalized score thereby to determine beginning and ending points of the digital pattern.
- the overall weight includes an accumulated weight and a predicted weight.
- the pattern to be inspected is a speech utterance and the reference pattern is based on a Hidden Markov Model.
- the pattern to be inspected is a speech utterance
- the reference pattern is a reference template
- the feasible paths are calculated according to a Dynamic Time Warping algorithm.
- the beginning and ending points of the feasible path which is associated with the final normalized score are used to estimate beginning and ending points of the pattern to be inspected. Additionally, in accordance with a preferred embodiment of the present invention, the digital pattern is derived from a speech signal.
- a method for producing a final normalized score which is the minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in a search area and wherein the search area includes a set of points characterized by a plurality of path beginning points and a plurality of path ending points.
- for each point in the search area, the method includes the steps of computing an accumulated score for a plurality of feasible paths which contain the point, computing an overall weight for each of the plurality of feasible paths which contain the point, computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
- the method also includes the step of determining beginning and ending points of that feasible path which is associated with the final normalized score.
- the overall weight includes an accumulated weight and a predicted weight.
- the normalized score indicates the similarity between a reference form and a pattern to be inspected.
- the pattern to be inspected is a speech utterance and the reference form is based on a Hidden Markov Model.
- the pattern to be inspected is a speech utterance
- the reference form is a reference template
- the feasible paths are calculated according to a Dynamic Time Warping algorithm.
- the beginning and ending points of the feasible path which is associated with the final normalized score are used to estimate beginning and ending points of the pattern to be inspected.
- a method for pattern recognition including the steps of providing a digital pattern to be inspected which contains a plurality of feature vectors, providing at least one digital reference pattern containing a different plurality of parameter vectors, and comparing the digital pattern to be inspected with the at least one digital reference pattern.
- the step of comparing includes the steps of providing a search area including a grid with the feature vectors on a first axis and the parameter vectors on a second axis, and calculating a final normalized score which is the minimum of a plurality of optimal normalized scores each associated with a corresponding feasible path, wherein each of the feasible paths is located in the search area.
- the step of calculating includes, for each point in the search area, the steps of computing an accumulated score for a plurality of feasible paths which contain the point, computing an overall weight for each of the plurality of feasible paths which contain the point, computing a normalized score, whereby the normalized score is the accumulated score for the point divided by the overall weight for the point, for each of the plurality of feasible paths which contain the point, and selecting the normalized score which is least, from the plurality of normalized scores, as an optimal normalized score for the point.
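The per-point selection rule summarized above (normalized score = accumulated score divided by overall weight, keep the least) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation; the function name `optimal_normalized_score` and the `(D, W)` tuple layout are assumptions made for this example.

```python
from typing import Sequence, Tuple


def optimal_normalized_score(candidates: Sequence[Tuple[float, float]]) -> Tuple[float, int]:
    """Pick the optimal normalized score at one grid point (illustrative sketch).

    Each candidate is (accumulated_score, overall_weight) for one feasible
    path through the point. Returns (least normalized score, candidate index).
    """
    if not candidates:
        raise ValueError("at least one feasible path is required")
    best_score, best_index = float("inf"), -1
    for index, (accumulated, overall_weight) in enumerate(candidates):
        if overall_weight <= 0:
            raise ValueError("overall weight must be positive")
        normalized = accumulated / overall_weight  # normalized score = D / W
        if normalized < best_score:  # keep the least normalized score
            best_score, best_index = normalized, index
    return best_score, best_index


# Three hypothetical paths through the same point, given as (D, W).
print(optimal_normalized_score([(12.0, 6.0), (15.0, 10.0), (9.0, 4.0)]))  # (1.5, 1)
```

The path with the smallest raw accumulated score (9.0) is not selected, because it is also the lightest path; normalization by the overall weight is what prevents short paths from being favored.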
- Fig. 1 is a schematic block diagram illustration of the architecture of speech recognition apparatus constructed and operated in accordance with a preferred embodiment of the present invention.
- Fig. 2 is a schematic block diagram illustration of a speech recognition system constructed and operated in accordance with the principles of a preferred embodiment of the present invention.
- Fig. 3 is a graphical illustration of an optimization procedure of a preferred embodiment of the invention.
- Fig. 4 is a pseudo-code illustration of a scoring algorithm for pattern recognition in the speech recognition system of Fig. 2 in accordance with a dynamic programming technique of the invention.
- Fig. 1 shows a schematic block diagram of the architecture of a microprocessor-based speech recognition system operated in accordance with the principles of the present invention.
- a user codec 2, such as an Intel 2913 from Intel Corporation, interfaces with digital signal processing circuitry 4, typically a TMS320C25 from Texas Instruments Corporation.
- a memory 6, which comprises a static random-access memory such as a 32K by 8 bit device with an access time of 100 nsec, is connected to the digital signal processing circuitry by means of a standard address, data and read-write control bus.
- Fig. 2 shows a schematic block diagram of a microprocessor-based speech recognition system operated in accordance with the principles of the present invention.
- The algorithms of Fig. 2 are typically carried out by software run on digital signal processing circuitry 4, such as the digital signal processing circuitry of Fig. 1.
- An analog signal 12, which may be obtained from a microphone or similar device, is typically provided to a standard sampling device 14.
- the output of the sampling device, the digital signal 16, is then supplied to a voice activated detection device 18 which may be a device as described in U.S. Patent Application 07/151,740 to the same assignee, which is incorporated herein by reference.
- the output of the voice activated detection device 18 is a digital speech signal 20.
- the voice activated detection device may be implemented by the digital signal processing circuitry 4 (Fig. 1).
- the digital speech signal 20, after it has been extracted from the input signal, is provided to a boundary detector 22, which typically determines the beginning and end points of an utterance found in the digital speech signal. The determination may be carried out by a standard boundary detection algorithm, such as the type described by Wilpon et al.
- the utterance is then conveyed to a feature extraction device 26, where spectral or other features are extracted, typically through LPC analysis.
- the feature extraction procedure transforms the utterance into a sequence of test feature vectors 28.
- each test vector contains the features of a speech frame of approximately 30 msec.
- An overlap of typically 50% may be applied between adjacent speech frames.
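With frames of about 30 msec and 50% overlap, consecutive feature vectors start about 15 msec apart. The sketch below computes the frame start positions in samples; the 8 kHz sample rate and the function name `frame_starts` are assumptions for illustration, not values given in the text.

```python
from typing import List


def frame_starts(num_samples: int, sample_rate: int = 8000,
                 frame_ms: float = 30.0, overlap: float = 0.5) -> List[int]:
    """Start indices of analysis frames (illustrative sketch).

    frame_ms and overlap follow the typical values in the text (about 30 msec,
    50% overlap); the 8 kHz default sample rate is an assumption.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # 240 samples at 8 kHz
    hop = int(round(frame_len * (1.0 - overlap)))  # 120 samples for 50% overlap
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame length and hop must be positive")
    if num_samples < frame_len:
        return []  # too short for a single full frame
    return list(range(0, num_samples - frame_len + 1, hop))


starts = frame_starts(8000)  # one second of audio at the assumed 8 kHz rate
print(len(starts), starts[:3])  # 65 [0, 120, 240]
```

Each start index would then feed one LPC (or other) analysis window, producing one test feature vector per frame.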
- the sequence of test feature vectors 28 supplied by the feature extraction device 26 is provided to a pattern recognition algorithm 30.
- the pattern recognition algorithm consists of two primary parts: a scoring algorithm 31 and a decision procedure 36.
- a set of reference templates 32 from a memory 34 is passed to the scoring algorithm 31 to serve as a reference.
- the memory storage area 34 is typically of the type depicted in Fig. 1.
- reference templates, consisting of sequences of parameter vectors, are stored in the memory 34 during a process called training (not shown).
- Training typically consists of inputting signals of a certain class to the system according to the steps of voice detection through feature extraction described above. Following these steps, the input signals are processed, and reference templates 32 are generated and stored in the memory area 34.
- the parameter vectors of the template provided by the training procedure represent characteristic features of the class of input signals.
- a template may represent utterances of a particular word or of a particular subword unit such as a syllable or a phoneme.
- the template may represent the voice of a particular person.
- each parameter vector is a feature vector of a reference utterance.
- the parameter vectors may include parameters defining a model for a feature sequence of a test utterance.
- a novel approach to pattern recognition, using a modification of the dynamic programming method for the scoring procedure, is achieved based on a method of path estimation and normalization of an accumulated similarity score, as described in detail hereinbelow.
- the novel approach to pattern recognition uses a modified Dynamic Time Warping algorithm or alternatively, a Hidden Markov Model algorithm for the scoring algorithm 31.
- any other suitable dynamic programming based algorithm may be used instead of the examples offered herein.
- the output of the scoring algorithm 31 is a set of final similarity scores (as defined hereinbelow), with each score indicating the similarity between the sequence of test vectors 28 and one of the reference templates 32.
- the scoring algorithm output is typically provided to decision procedure 36 which may comprise a k-NN (k-Nearest Neighbor) rule for determination of the class of inputs to which the pattern between the beginning and endpoints in input signal 12 belongs.
- the overall output of the pattern recognition procedure provides a code or index 40, which describes the class of inputs to which the pattern between the beginning and the endpoints in input signal 12 belongs.
- this code or index indicates the verbal contents of input signal 12.
- the code or index indicates the identity of the speaker who uttered the speech embodied in the input signal 12.
- Fig. 3 shows a graphical representation of part of the pattern recognition procedure of Fig. 2 in accordance with a preferred embodiment of the invention.
- the graph representation shows a non-linear time warping function which may be used for scoring the
- the time warping function maps the time axis of a test feature sequence 50 to the time axis of a reference template 52.
- the mapping provides a time registration between the reference template 52, which is preferably provided by the memory storage area 34 (Fig. 2), and the test feature sequence 50, which may be provided by the feature extraction device 26 (Fig. 2).
- the reference template 52 comprises a sequence of M parameter vectors representing a word from a vocabulary recognizable by a speech recognition system such as the speech recognition system of Fig. 2. M may vary according to the particular reference template.
- the test feature sequence 50 comprises a sequence of N test feature vectors.
- the graph comprises a grid of points (n,m), each associated with a local similarity score, where m denotes the m-th parameter vector of the reference template and n denotes the n-th test feature vector in the sequence of test feature vectors.
- the skilled professional may determine the local similarity score associated with each pair of test feature vectors and reference parameter vectors according to his considerations.
- the local similarity scores may be determined by computing standard Euclidean or
- the local similarity score may be determined by a speech specific distortion measure such as the likelihood ratio distortion measure proposed by Itakura in the article, "Minimum Prediction Residual Principle Applied to Speech
- the local similarity score may be probabilistic.
- the probabilistic local similarity score could be computed using a parametric function of the test feature vector, which depends on the reference parameter vector. The function value provides a statistical estimate of the minus log of the likelihood of observing the test feature vector in a particular segment of the reference word.
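Two of the options above can be sketched as follows: a Euclidean distance, and a probabilistic score computed as minus the log likelihood of the test feature vector. The diagonal Gaussian is only one assumed choice of parametric function (the text does not name one), the function names are hypothetical, and the Itakura likelihood ratio measure is not shown.

```python
import math
from typing import Sequence


def euclidean_local_score(test_vec: Sequence[float], ref_vec: Sequence[float]) -> float:
    """Euclidean distance between a test feature vector and a reference parameter vector."""
    if len(test_vec) != len(ref_vec):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(test_vec, ref_vec)))


def gaussian_local_score(test_vec: Sequence[float], mean: Sequence[float],
                         variance: Sequence[float]) -> float:
    """Minus log likelihood of test_vec under an assumed diagonal Gaussian.

    Here the reference "parameter vector" is taken to be (mean, variance).
    """
    if not (len(test_vec) == len(mean) == len(variance)):
        raise ValueError("dimensions must match")
    total = 0.0
    for x, mu, var in zip(test_vec, mean, variance):
        if var <= 0:
            raise ValueError("variances must be positive")
        # -log N(x; mu, var) = 0.5 * (log(2*pi*var) + (x - mu)^2 / var)
        total += 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


print(euclidean_local_score([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

In both cases a smaller value means a better match, which is consistent with the minimization performed by the scoring procedure.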
- a feasible warping path 54 is a sequence of grid points which satisfies certain constraints. The specific constraints are determined by the skilled professional. A typical constraint requires the feasible warping path to map the beginning and ending feature vectors of the test to the beginning and ending parameter vectors of the reference, respectively. Another typical constraint is that the slope of the warping path be within a specified limit, typically between 1:2 and 2:1.
- Fig. 4 shows a pseudo-code description of a scoring algorithm as part of the pattern recognition in the speech recognition system of Fig. 2 in accordance with a preferred embodiment of a dynamic programming
- the algorithm of Fig. 4 can be implemented by the digital processing circuitry 4 of Fig. 1.
- the algorithm can be implemented using other suitable computing hardware in accordance with state-of-the-art electronic design and programming techniques.
- the scoring procedure which is typically based on a Dynamic Time Warping algorithm, or alternatively, on a Hidden Markov Model algorithm, is preferably used to determine the similarity between a test utterance and reference word in speech recognition procedures.
- initial values are assigned to each point in search area 56, where the search area is as defined above.
- This step is independent of the content of the sequence of test feature vectors, and depends only on the number N of test feature vectors in a certain sequence and the number M of parameter vectors in a reference template.
- a set of path beginning grid points and a set of path ending grid points are defined.
- a typical definition of the beginning set is:
- x., x. are the maximum expected beginning and end errors of the boundary detector at the beginning and at the end of the test word (assuming that the reference boundaries are sufficiently accurate) .
- (2) For each grid point in the search area, as defined hereinabove, a list of "access paths" is defined.
- An access path is a short path leading from a neighboring grid point to a given grid point.
- the access paths should be defined in such a way that a concatenation of access paths leading from a path beginning grid point to a path ending grid point constitutes a feasible path (as defined above). Additionally, any feasible path must be representable as a concatenation of access paths from a path beginning grid point to a path ending grid point.
- the rule is described in the article "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", incorporated herein by reference, published in IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-26, Feb. 1978, pp. 43-49.
- an access path may be defined by a left-to-right finite state automaton, where each reference parameter vector is represented by a state and each grid point (n,m) indicates that at time n, the automaton has reached state m.
- An access path to a grid point (n,m) is a two-point path of the form [(n-1,k), (n,m)], where there exists a transition leading from the state representing the k-th reference parameter vector to the state representing the m-th reference parameter vector.
- Such a definition is common in Hidden Markov Models.
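The Hidden Markov Model style of access path just described can be sketched as follows. The allowed transitions (stay in state m, or advance by one or two states) are an assumed left-to-right topology chosen for illustration; the function name `access_paths` and the 1-based (n, m) convention are likewise assumptions.

```python
from typing import List, Tuple

GridPoint = Tuple[int, int]  # (n, m): test frame n, reference state m, both 1-based


def access_paths(point: GridPoint, num_states: int,
                 max_advance: int = 2) -> List[Tuple[GridPoint, GridPoint]]:
    """Two-point access paths [(n-1, k), (n, m)] into a grid point.

    Assumes a left-to-right automaton in which state k may move to state m
    when 0 <= m - k <= max_advance (a self-loop when k == m).
    """
    n, m = point
    if n < 2 or not 1 <= m <= num_states:
        return []  # the first frame has no predecessor frame
    return [((n - 1, m - step), (n, m))
            for step in range(max_advance + 1) if m - step >= 1]


print(access_paths((5, 3), num_states=10))
# [((4, 3), (5, 3)), ((4, 2), (5, 3)), ((4, 1), (5, 3))]
```

Because every access path advances exactly one test frame, any concatenation of them from a path beginning grid point is monotonic in both coordinates, which is the property the preceding paragraph requires of access paths.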
- STEP 2 LOOP ON GRID POINTS IN SEARCH AREA:
- a local weight may be defined indicating the significance of the local similarity score at that point.
- a bias at the point (n,m) may be defined to indicate the a priori likelihood of the feasible path passing through that point.
- the accumulated similarity score, D(n,m) of a feasible path containing the grid point (n,m) is the sum of all biases along the path from the path beginning to the point (n,m) , plus the sum of all local similarity scores from the path beginning to the point (n,m) , where each local score is multiplied by a corresponding local weight.
- the local similarity score is calculated according to the methods outlined above and the bias and local weight are calculated as defined below.
- the overall weight W(n,m) of a path containing the point (n,m) is the sum of all local weights along that path, from its beginning to its end.
- the accumulated weight, B(n,m) of a path containing the point (n,m) is the sum of all local weights along the path, from the path beginning till the point (n,m) .
- the future weight, F(n,m) of a path containing the point (n,m) is the sum of all local weights along the path, from the point following (n,m) till the path end.
- the overall weight is the sum of the accumulated weight and the future weight.
- the optimal normalized similarity score A*(n,m) is the minimum of the normalized similarity scores A(n,m) = D(n,m)/W(n,m), taken over all feasible paths containing (n,m).
- the optimal feasible path through (n,m) is the path for which A(n,m) is minimal. If there is more than one such path, the choice of the optimal one is
- the optimal overall weight W*(n,m), the optimal accumulated weight B*(n,m), the optimal future weight F*(n,m) and the optimal accumulated similarity score D*(n,m) are the overall weight W(n,m) , the accumulated weight B(n,m) , the future weight F(n,m) and the accumulated similarity score D(n,m) respectively, associated with the optimal feasible path through (n,m) .
- the optimal path beginning grid point b*(n,m) and the optimal path ending grid point e*(n,m) are the beginning and ending points, respectively, of the optimal feasible path through (n,m) (each of b* and e* represents a pair of coordinates).
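For a single, fully specified feasible path, the quantities defined above can be computed directly, as in the Python sketch below. The dictionaries of local scores, local weights and biases keyed by grid point are an assumed data layout, the function name is hypothetical, and the normalized score is taken as A = D / W, as in step 2.4.4 of the pseudo-code.

```python
from typing import Dict, List, Tuple

GridPoint = Tuple[int, int]


def path_quantities(path: List[GridPoint], k: int,
                    local_score: Dict[GridPoint, float],
                    local_weight: Dict[GridPoint, float],
                    bias: Dict[GridPoint, float]) -> Dict[str, float]:
    """D, B, F, W and A = D / W for the k-th point (0-based) of a complete path."""
    upto, after = path[: k + 1], path[k + 1:]
    D = sum(bias[p] + local_weight[p] * local_score[p] for p in upto)  # accumulated score
    B = sum(local_weight[p] for p in upto)  # accumulated weight
    F = sum(local_weight[p] for p in after)  # future weight
    W = B + F  # overall weight
    return {"D": D, "B": B, "F": F, "W": W, "A": D / W}


path = [(1, 1), (2, 1), (3, 2)]
scores = {(1, 1): 1.0, (2, 1): 3.0, (3, 2): 2.0}
ones = {p: 1.0 for p in path}
zeros = {p: 0.0 for p in path}
print(path_quantities(path, 1, scores, ones, zeros))  # D=4.0, B=2.0, F=1.0, W=3.0, A=4/3
```

Note that W is the same for every point of a given path. During a left-to-right dynamic programming pass the rest of the path is not yet known, which is why the future weight has to be predicted rather than computed.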
- the local similarity score d(n,m) at point (n,m) is computed according to the methods outlined above.
- STEP 2.2 ESTIMATING THE FUTURE WEIGHT.
- the optimal future weight F*(n,m) is predicted.
- F*(n,m) is the average of the future weights from (n,m) to each of the path ending grid points which are accessible from (n,m) by a feasible path.
- F*(n,m) may be the median of those future weights.
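Under a simple assumed model, the prediction of F*(n,m) can be sketched as below: every later point on the path is one test frame further on and carries the same local weight, so the future weight to an ending point (n_end, m_end) is proportional to n_end - n. Reachability is tested with an assumed limit on how many reference states a path may advance per frame. These modeling choices and the function name are illustrative assumptions; the text itself only specifies taking the average (or the median) of the future weights.

```python
import statistics
from typing import Iterable, Optional, Tuple

GridPoint = Tuple[int, int]


def predicted_future_weight(point: GridPoint, end_points: Iterable[GridPoint],
                            local_weight: float = 1.0, max_advance: int = 2,
                            use_median: bool = False) -> Optional[float]:
    """Estimate F*(n, m) from the path ending grid points reachable from (n, m).

    Assumes each later point on the path is one test frame further on, carries
    the same local weight, and advances at most max_advance reference states.
    Returns None when no path ending grid point is reachable.
    """
    n, m = point
    future_weights = []
    for n_end, m_end in end_points:
        frames_left = n_end - n
        if frames_left < 0 or not 0 <= m_end - m <= max_advance * frames_left:
            continue  # this ending point cannot be reached by a feasible path
        future_weights.append(local_weight * frames_left)
    if not future_weights:
        return None
    if use_median:
        return statistics.median(future_weights)
    return statistics.mean(future_weights)


ends = [(8, 5), (9, 5), (13, 5)]
print(predicted_future_weight((6, 4), ends))                   # 4.0 (average)
print(predicted_future_weight((6, 4), ends, use_median=True))  # 3.0 (median)
```

The median is less sensitive than the average to a single distant ending point, such as (13, 5) in this example.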
- initial estimates for the optimal scores of a grid point (n,m) are established, based on the assumption that the optimal path begins at that point.
- if (n,m) is in the set of path beginning grid points (as defined in step 1), the initial estimates are computed according to the following steps.
- a typical value for the bias is 0 and a typical value for the local weight is 2.
- a typical value for the bias may be minus log of the likelihood that the path begins at the given point (n,m) and the local weight may be set equal to 1.
- the value of the bias is estimated during the training procedure.
- the optimal beginning point is set to be the same point: b*(n,m) = (n,m).
- the optimal accumulated weight, B*(n,m) gets the value of the local weight.
- the optimal overall weight W*(n,m) is the sum of optimal accumulated and future weights, B*(n,m)+F*(n,m) .
- the optimal accumulated similarity score, D*(n,m), is the bias for the point (n,m) plus the local similarity score of that same point multiplied by the local weight of the point.
- the optimal normalized similarity score A*(n,m) is the optimal accumulated similarity score divided by the optimal overall weight: A*(n,m) = D*(n,m)/W*(n,m).
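The initialization of a path beginning grid point can be written as a small function. The `PointState` record and the function name are hypothetical; the defaults (bias 0, local weight 2) follow the typical DTW values given above, and an HMM-style caller would instead pass minus the log likelihood as the bias and a local weight of 1.

```python
from dataclasses import dataclass
from typing import Tuple

GridPoint = Tuple[int, int]


@dataclass
class PointState:
    """Current estimates of the optimal values at one grid point."""
    D: float  # optimal accumulated similarity score D*
    B: float  # optimal accumulated weight B*
    F: float  # predicted optimal future weight F*
    W: float  # optimal overall weight W* = B* + F*
    A: float  # optimal normalized similarity score A* = D* / W*
    begin: GridPoint  # optimal path beginning grid point b*


def init_beginning_point(point: GridPoint, local_score: float, future_weight: float,
                         bias: float = 0.0, local_weight: float = 2.0) -> PointState:
    """Initial estimates for a path beginning grid point (DTW-style defaults)."""
    B = local_weight                       # B* gets the value of the local weight
    W = B + future_weight                  # W* = B* + F*
    D = bias + local_weight * local_score  # D* = bias + local weight * local score
    return PointState(D=D, B=B, F=future_weight, W=W, A=D / W, begin=point)


state = init_beginning_point((2, 1), local_score=1.5, future_weight=10.0)
print(state.A, state.begin)  # 0.25 (2, 1)
```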
- each of the access paths leading to a point (n,m) is checked in turn for the hypothesis that the optimal path through (n,m) contains that particular access path. This is done by computing the normalized similarity score for the access path under this hypothesis and comparing it to the current estimate of the optimal normalized similarity score. If the computed value is smaller than the current estimate, all current estimates of the optimal values for that point (n,m) are replaced by the computed values.
- the bias may be minus log of the likelihood of moving to the current grid point from the preceding one (this likelihood may typically be determined during training) and the local weight is 1. This is the common case in Hidden Markov Model devices.
- the accumulated similarity score D(n,m) is computed for a path which comprises the concatenation of the optimal path to (p,q) and the given access path. Therefore D(n,m) is calculated as D*(p,q) plus the sum of all biases along the given access path (except for the first point (p,q)) plus the sum of all local similarity scores along the access path (except for the first point (p,q)), each multiplied by the corresponding local weight.
- the overall weight W(n,m) is computed by adding the accumulated weight B(n,m) to the estimated optimal future weight F*(n,m).
- STEP 2.4.4: COMPUTE NORMALIZED SIMILARITY SCORE FOR GIVEN ACCESS PATH
- A(n,m) is computed for a path which contains the concatenation of the optimal path to (p,q) and the given access path. Therefore A(n,m) is calculated as D(n,m) divided by W(n,m).
- if the normalized similarity score for the given access path, A(n,m), is less than the current estimate of the optimal normalized similarity score, A*(n,m), the following step is performed:
- STEP 2.4.5.1: ASSIGN NEW OPTIMAL VALUES
- the current estimate for the optimal path through (n,m) is updated to be a path which contains the concatenation of the optimal path to point (p,q) and the given access path.
- D*(n,m), B*(n,m), W*(n,m), and A*(n,m) are replaced by the values corresponding to the updated optimal path, that is, D(n,m) , B(n,m) , W(n,m) , and A(n,m) , respectively.
- the optimal path beginning grid point b*(n,m) is set equal to b*(p,q), the optimal path beginning grid point of the beginning point (p,q) of the given access path.
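The test-and-update of one access path (the computations of step 2.4 and the assignment of new optimal values) can be sketched as a single function. The accumulated weight B is extended by the local weights of the new access path points, following the definition of B(n,m) above. The record layout, dictionary arguments and function name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

GridPoint = Tuple[int, int]


@dataclass(frozen=True)
class PointState:
    D: float  # optimal accumulated similarity score D*
    B: float  # optimal accumulated weight B*
    F: float  # predicted optimal future weight F*
    W: float  # optimal overall weight W*
    A: float  # optimal normalized similarity score A*
    begin: GridPoint  # optimal path beginning grid point b*


def relax_access_path(prev: PointState, new_points: Sequence[GridPoint],
                      local_score: Dict[GridPoint, float],
                      local_weight: Dict[GridPoint, float],
                      bias: Dict[GridPoint, float], future_weight: float,
                      current: Optional[PointState]) -> PointState:
    """Test one access path from (p, q) and keep whichever estimate is better.

    new_points are the access path points after its first point (p, q);
    the last of them is the target grid point (n, m).
    """
    D = prev.D + sum(bias[p] + local_weight[p] * local_score[p] for p in new_points)
    B = prev.B + sum(local_weight[p] for p in new_points)
    W = B + future_weight  # overall weight uses the predicted F*(n, m)
    A = D / W
    if current is None or A < current.A:
        # New optimum: the path beginning point is inherited from (p, q).
        return PointState(D=D, B=B, F=future_weight, W=W, A=A, begin=prev.begin)
    return current


prev = PointState(D=3.0, B=2.0, F=10.0, W=12.0, A=0.25, begin=(1, 1))
one = {(2, 2): 1.0}
new = relax_access_path(prev, [(2, 2)], one, one, {(2, 2): 0.0}, 9.0, None)
print(new.A, new.begin)  # A* = 4/12, b* = (1, 1)
```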
- the minimal value of A*(n,m), over all the points in the set of path ending grid points (as defined in step 1) is the final normalized similarity score.
- the feasible path associated with the final normalized score is the final path.
- the path ending grid point (n,m) of the final path is the final path ending grid point.
- the optimal path beginning grid point of the final path, b*(n,m) is the final path beginning grid point.
- STEP 3.2 DETERMINE FINAL BEGIN AND END ESTIMATES
- the first coordinates of the final path beginning grid point and of the final path ending grid point are the final estimates for the beginning and ending, respectively, of the test utterance.
- the second coordinate of these grid points indicates the beginning and ending, respectively, of the part of a reference template
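Putting steps 1 to 3 together, the whole scoring pass can be sketched in Python for the Hidden Markov Model style of access path. This is an illustrative reconstruction under stated assumptions, not the procedure of Fig. 4 itself. The beginning set {(n, 0): n <= x_b} and ending set {(n, M-1): n >= N-1-x_e} are a hypothetical choice built from the two boundary tolerances (named `x_b` and `x_e` here), and the unit local weights, zero biases, allowed state advances and 0-based indices are also assumptions. Because the future weight is only predicted, the result is an estimate of the minimum normalized score.

```python
import statistics
from typing import List, Optional, Tuple

GridPoint = Tuple[int, int]


def boundary_relaxed_score(d: List[List[float]], x_b: int, x_e: int,
                           max_advance: int = 2
                           ) -> Optional[Tuple[float, GridPoint, GridPoint]]:
    """Estimate the final normalized score and the begin/end grid points.

    d[n][m] is the local score of test frame n against reference state m
    (0-based). Assumed: bias 0 and local weight 1 everywhere, access paths
    [(n-1, k), (n, m)] with 0 <= m - k <= max_advance, beginning set
    {(n, 0): n <= x_b} and ending set {(n, M-1): n >= N-1-x_e}.
    """
    if not d or not d[0]:
        raise ValueError("empty score grid")
    N, M = len(d), len(d[0])
    ends = [(n, M - 1) for n in range(max(0, N - 1 - x_e), N)]

    def future(n: int, m: int) -> Optional[float]:
        # STEP 2.2: average future weight to the reachable ending points.
        ws = [ne - n for ne, me in ends
              if ne >= n and 0 <= me - m <= max_advance * (ne - n)]
        return statistics.mean(ws) if ws else None

    # state[n][m] = (A*, D*, B*, b*), or None when (n, m) is not usable.
    state: List[List[Optional[tuple]]] = [[None] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            F = future(n, m)
            if F is None:
                continue  # no path ending grid point is reachable from here
            best = None
            if m == 0 and n <= x_b:  # STEP 2.3: path beginning grid point
                D, B = d[n][m], 1.0
                best = (D / (B + F), D, B, (n, m))
            for k in range(max(0, m - max_advance), m + 1):  # STEP 2.4
                prev = state[n - 1][k] if n > 0 else None
                if prev is None:
                    continue
                D, B = prev[1] + d[n][m], prev[2] + 1.0
                A = D / (B + F)
                if best is None or A < best[0]:
                    best = (A, D, B, prev[3])  # b* inherited from (n-1, k)
            state[n][m] = best

    # STEP 3: the final score is the minimum A* over the path ending grid points.
    finals = [(state[n][m], (n, m)) for n, m in ends if state[n][m] is not None]
    if not finals:
        return None
    (A, _, _, begin), end = min(finals, key=lambda item: item[0][0])
    return A, begin, end


grid = [[5.0, 5.0], [1.0, 5.0], [5.0, 1.0], [5.0, 5.0]]  # noisy first and last frames
print(boundary_relaxed_score(grid, x_b=1, x_e=1))  # (0.8, (1, 0), (2, 1))
```

On this small grid, whose first and last test frames match the reference poorly, the relaxed search begins at test frame 1 and ends at test frame 2, so those first coordinates would serve as the refined utterance boundaries. With `x_b = x_e = 0` (fixed boundaries) the same grid gives `(3.0, (0, 0), (3, 1))`.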
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The speech recognition algorithm is implemented in a computer program by feeding a voice input signal to a coder (2) and processing it in a standard computer (4) using reference structures stored in memory (6). The algorithm applies the well-known dynamic programming technique, extended to include weighting and normalization functions.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL95869 | 1990-10-02 | ||
IL9586990A IL95869A (en) | 1990-10-02 | 1990-10-02 | Boundary relaxation for speech pattern recognition |
IL98092 | 1991-05-09 | ||
IL98092A IL98092A0 (en) | 1991-05-09 | 1991-05-09 | Boundary relaxation for speech pattern recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0551374A1 (en) | 1993-07-21 |
EP0551374A4 (en) | 1995-02-15 |
Family
ID=26322136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP91917937A Withdrawn EP0551374A4 (en) | 1990-10-02 | 1991-10-02 | Boundary relaxation for speech pattern recognition |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0551374A4 (en) |
WO (1) | WO1992006469A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19930522A1 (en) * | 1999-07-05 | 2001-02-01 | Univ Ilmenau Tech | Detecting sound signals involves weighting negative deviations of test vector coefficients from reference vector coefficients more heavily than positive deviations for score computation |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5569880A * | 1978-11-22 | 1980-05-26 | Nec Corp | Pattern recognition unit |
JPS57147781A * | 1981-03-06 | 1982-09-11 | Nec Corp | Pattern matching device |
US4400828A * | 1981-03-27 | 1983-08-23 | Bell Telephone Laboratories, Incorporated | Word recognizer |
US4400788A * | 1981-03-27 | 1983-08-23 | Bell Telephone Laboratories, Incorporated | Continuous speech pattern recognizer |
US4570232A * | 1981-12-21 | 1986-02-11 | Nippon Telegraph & Telephone Public Corporation | Speech recognition apparatus |
US4624008A * | 1983-03-09 | 1986-11-18 | International Telephone And Telegraph Corporation | Apparatus for automatic speech recognition |
US4751737A * | 1985-11-06 | 1988-06-14 | Motorola Inc. | Template generation method in a speech recognition system |
1991
- 1991-10-02: WO application PCT/US1991/007165, published as WO1992006469A1 (not active: application discontinued)
- 1991-10-02: EP application EP91917937A, published as EP0551374A4 (not active: withdrawn)
Non-Patent Citations (2)
Title |
---|
No further relevant documents disclosed * |
See also references of WO9206469A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO1992006469A1 | 1992-04-16 |
EP0551374A4 | 1995-02-15 |
Similar Documents
Publication | Title |
---|---|
US6125345A | Method and apparatus for discriminative utterance verification using multiple confidence measures |
US4918732A | Frame comparison method for word recognition in high noise environments |
US6226612B1 | Method of evaluating an utterance in a speech recognition system |
US7447634B2 | Speech recognizing apparatus having optimal phoneme series comparing unit and speech recognizing method |
US6615170B1 | Model-based voice activity detection system and method using a log-likelihood ratio and pitch |
US5613037A | Rejection of non-digit strings for connected digit speech recognition |
US5440662A | Keyword/non-keyword classification in isolated word speech recognition |
US7693713B2 | Speech models generated using competitive training, asymmetric training, and data boosting |
JP3049259B2 | Voice recognition method |
US6029124A | Sequential, nonparametric speech recognition and speaker identification |
US6535850B1 | Smart training and smart scoring in SD speech recognition system with user defined vocabulary |
US7027985B2 | Speech recognition method with a replace command |
US7013276B2 | Method of assessing degree of acoustic confusability, and system therefor |
US5822728A | Multistage word recognizer based on reliably detected phoneme similarity regions |
US8271283B2 | Method and apparatus for recognizing speech by measuring confidence levels of respective frames |
JPH09127972A | Vocalization discrimination and verification for recognition of linked numeral |
US20020049593A1 | Speech processing apparatus and method |
JPH07334184A | Calculating device for acoustic category mean value and adapting device therefor |
McDermott et al. | Prototype-based minimum classification error/generalized probabilistic descent training for various speech units |
CN112750445B | Voice conversion method, device and system and storage medium |
US5487129A | Speech pattern matching in non-white noise |
EP0255529A4 | Frame comparison method for word recognition in high noise environments |
Sanchis et al. | Improving utterance verification using a smoothed naive bayes model |
EP0551374A1 | Boundary relaxation for speech pattern recognition |
EP0177854B1 | Keyword recognition system using template-concatenation model |
Legal Events
Code | Title | Description | Effective date |
---|---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Original code: 0009012 | |
17P | Request for examination filed | | 1993-04-26 |
AK | Designated contracting states | Kind code of ref document: A1; designated states: AT BE CH DE DK ES FR GB GR IT LI LU NL SE | |
A4 | Supplementary search report drawn up and despatched | | 1994-12-28 |
AK | Designated contracting states | Kind code of ref document: A4; designated states: AT BE CH DE DK ES FR GB GR IT LI LU NL SE | |
STAA | Information on the status of an EP patent application or granted EP patent | Status: the application is deemed to be withdrawn | |
18D | Application deemed to be withdrawn | | 1995-03-23 |