CA1258917A - Apparatus and method for identifying spoken words - Google Patents

Apparatus and method for identifying spoken words

Info

Publication number
CA1258917A
Authority
CA
Canada
Prior art keywords
template
utterance
bias
positions
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000515760A
Other languages
French (fr)
Inventor
Priyadarshan Jakatdar
Hoshang D. Mulla
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alcatel Lucent NV
Original Assignee
ITT Industries Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ITT Industries Inc filed Critical ITT Industries Inc
Priority to CA000515760A priority Critical patent/CA1258917A/en
Application granted granted Critical
Publication of CA1258917A publication Critical patent/CA1258917A/en
Expired legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

ABSTRACT OF THE DISCLOSURE

An apparatus for identifying spoken words includes a plurality of reference templates having information at each position thereof representing the probability of a particular binary value occurring at a corresponding position in an utterance template. In addition, a system bias template is included to substantially eliminate recognition inaccuracies characteristic to the system.

Description

P. JAKATDAR et al 1-2

APPARATUS AND METHOD FOR IDENTIFYING SPOKEN WORDS

BACKGROUND OF THE INVENTION

The present invention generally relates to an apparatus and method for identifying spoken words and, in particular, relates to such an apparatus compensating for any system-wide bias and employing reference templates containing all information generated during the enrollment phase.

The electronic identification of a spoken word or phrase has been the goal of many researchers for many years. One common solution included generating a "voice print" or pattern that, essentially, was the electronic result of a time-based amplitude varying signal. The pattern was compared to a library of previously generated word patterns. Such an approach encountered numerous difficulties that were speaker dependent and/or required elaborate analog circuits to reproduce and compare the various word patterns.

More recently, considerable work has been dedicated to a word recognition technique generally referred to as template matching. In template matching, an array of binary numbers representative of features of any spoken word is generated. These templates, generally known as utterance templates, are then compared, i.e. scored, against a stored vocabulary of reference templates. Conventionally, reference templates are developed during an enrollment stage wherein a particular word, or list of words, is repeated and the resultant template for a given word is developed from the features common to all repetitions of that word. The array positions wherein the result perceived was not constant throughout the entire enrollment are not used in the scoring. The rationale for such a reference template generation scheme is that it is generally believed that, for accurate comparisons, the format and content of reference templates must be identical to the format and content of the utterance templates being analyzed.

For example, a conventional reference template is formed such that, on any given repetition during enrollment, if the binary value for a specific bit position is identical to all previous binary values for that position, the updated position is assigned that binary value. However, if, on any repetition, that bit position is perceived as a binary value different than all the previous binary values therefor, that bit location becomes empty, i.e., becomes an unknown.
Nevertheless, unless a bit position assumes a particular value for all enrollment repetitions, less than all of the total information extracted from speech signals for that word during enrollment is used during the word recognition process.
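
The conventional scheme described above can be illustrated with a minimal Python sketch. The function name and the use of `None` for an "unknown" position are assumptions made for illustration only; the text does not specify how prior systems encoded an empty bit location.

```python
from typing import List, Optional


def conventional_template(repetitions: List[List[int]]) -> List[Optional[int]]:
    """Keep a bit only if every enrollment repetition agrees on its value.

    Hypothetical helper: a position that ever disagrees becomes None
    ("unknown") and stays unknown for the rest of enrollment.
    """
    template: List[Optional[int]] = list(repetitions[0])
    for rep in repetitions[1:]:
        for i, bit in enumerate(rep):
            if template[i] is not None and template[i] != bit:
                template[i] = None  # information for this position is discarded
    return template


reps = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
print(conventional_template(reps))  # [1, 0, None, 1]
```

Position 2 was a one in two of the three repetitions, yet that information is lost entirely once the position becomes unknown, which is the shortcoming the passage points out.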

During conventional scoring, an utterance template, representing the extracted features of a speech signal of the word to be identified, is compared with all reference templates in the library. After comparing each bit position in the utterance template with the corresponding bit position of a reference template, a score is developed for that reference template. The utterance template is scored against each reference in the library. The scores so developed are then analyzed to determine whether or not any reference template scored high enough to identify the word represented by the utterance template. However, based on the reference template bit retention technique discussed above, each reference template includes a number of bit positions which are "unknown." Further, the number of "unknowns" varies among the reference templates. Consequently, the comparison between any given reference template and the utterance template being analyzed is ambiguous.

Another difficulty that compounds the ambiguity of conventional systems is that a system-wide bias occasionally exists during enrollment. Such a bias results in a bit position in all the reference templates being assigned a specific binary value regardless of the binary value in that bit position of the utterance template for any word spoken, or the number of times a word is enrolled.

In addition, some present template scoring schemes develop a score based on algorithms formulated to require a substantial number of multiplications and additions to calculate scores. Hence, due to the substantial number of computational steps required, the identification of spoken words is, presently, quite slow even with the use of modern high speed microprocessors. This is inherent because, inter alia, multiplication is one of the slowest tasks that any arithmetic microprocessor performs.
SUMMARY OF THE INVENTION
The present invention provides an apparatus for speech recognition, said apparatus comprising: an utterance template having a plurality of positions, each position having a binary value stored therein, each said binary value in each said position representing a spectral feature of speech to be recognized; a plurality of reference templates, each having a plurality of positions, each of said positions of each said reference template representing a spectral feature corresponding to the spectral features represented by said plurality of positions of said utterance template, each said position in each said reference template having a value stored therein, said stored value being representative of a probability of a particular binary value occurring in said corresponding position of said utterance template; and means for establishing a first score for each said reference template, each said score being indicative of a relative match between that said reference template and said utterance template, said first score establishing means includes means for providing outputs from said reference template, each said output corresponding to one of said plurality of positions of said reference template and having a value determined from said stored value therein, means for selecting outputs of the reference templates corresponding to positions of the utterance template having the particular binary value stored therein and means for summing said selected outputs, whereby the sum is indicative of said relative match, and said speech represented by said utterance template can be recognized.
The invention also provides apparatus for use in speech recognition, said apparatus comprising: an utterance template, composed of a plurality of positions, each position having a binary value stored therein, each said binary value representing a spectral feature of speech to be recognized; a plurality of reference templates, each having a plurality of positions, each of said positions of each said reference template representing a spectral feature corresponding to the spectral features represented by said plurality of positions of said utterance template, each said position in each said reference template having a value stored therein, said stored value being representative of a probability of a particular binary value occurring in said corresponding position of said utterance template; a bias template having a plurality of positions, each one of said plurality of positions representing a spectral feature corresponding to the spectral features represented by said plurality of positions in said utterance template, each position of said bias template having a value stored therein representing the probability of a particular binary value occurring at said corresponding position in any utterance template; and means for establishing a score for each reference template based on said values stored therein and on the values stored in said bias template, said scores being indicative of a relative match between said reference template and said utterance template, whereby said speech represented by said utterance template can be recognized.
From another aspect, the invention provides a method for matching templates, said method comprises the steps of: providing an utterance template having a plurality of positions, each having a binary value stored therein, each said binary value in each said position representing a spectral feature of speech to be recognized; providing a plurality of reference templates each having a plurality of positions, each of said positions representing a spectral feature corresponding to the spectral features represented by said plurality of positions in said utterance template, and having a value stored therein, each stored value representing a probability of a particular binary value occurring in the corresponding position in said utterance template; providing outputs from said reference templates, each output corresponding to one of said plurality of positions and having a value determined by said value stored therein; selecting outputs of the reference templates corresponding to positions of the utterance template having the particular binary value stored therein; and summing the selected outputs of each said reference template, whereby the sum is indicative of a relative match.
Other advantages will become apparent to those skilled in the art from the following detailed description read in conjunction with the appended claims and the drawing attached hereto.
BRIEF DESCRIPTION OF THE DRAWING

Figure 1 is comprised of Figures 1A and 1B and is a block diagram of a word recognition apparatus embodying the principles of the present invention;

Figure 2 is a logarithmic scale modified for use in conjunction with the present invention;

Figures 3A - 3C are an example of various template formats useful in the present invention; and

Figure 4 is a flow diagram of a word recognition method used with the apparatus shown in Figure 1.

DETAILED DESCRIPTION OF THE INVENTION

An apparatus, generally indicated at 10 in Figure 1 and embodying the principles of the present invention, includes means 12 for creating and storing a plurality of reference templates, means 14 for creating and storing a system bias template, means 16 for accessing an utterance template, and means 18 for modifying the plurality of stored reference templates and the system bias template. In addition, the apparatus 10 further includes means 20 for storing the modified reference and system bias templates, means 22 for establishing a score for the modified templates with respect to an accessed utterance template and means 24 for accepting or rejecting a score, whereby when a score is accepted the reference template corresponding thereto represents the utterance represented by the accessed utterance template.

In the preferred embodiment, the means 12 for creating and storing a plurality of reference templates includes a storage medium 26, a shift register 28, a summer 30 having an output 32 and first and second inputs, 34 and 36 respectively, means 38 for addressing a particular reference template and means 40 for simultaneously addressing a particular position of the particular reference template addressed by the addressing means 38 and the corresponding position of shift register 28.

Similarly, the means 14 for creating and storing a system bias template includes a storage medium 42, a shift register 44, a summer 46 having an output 48 and first and second inputs, 50 and 52 respectively, and means 54 for simultaneously addressing a particular position of both the storage medium 42 and the shift register 44. As more fully discussed below, the creating and storing means, 12 and 14, respectively, are active during enrollment and inactive during recognition.

The means 16 for accessing an utterance template includes a shift register 56 adapted to receive an outputted utterance template from an utterance template former 58. Preferably, the shift register 56 is connected to the reference template creating and storing means 12 and the bias template creating and storing means 14 via switching means 60.

The storage medium 26 of the reference template creating and storing means 12 has a capacity at least equal to the product of the number of bytes per template times the number of templates to be stored. For example, using templates having 256 bytes and enrolling 200 different templates, wherein each template represents one spoken word, a capacity of 51.2 kilobytes is required. The contents of the reference template storage medium 26 addressed by the addressing means 38 are outputted to the shift register 28. The shift register 28 includes, in this example, 256 bytes, i.e. one reference template worth of bytes. The output of the shift register 28 is connected to the summer 30 via, for example, the second input 36 thereto. The first input 34 to the summer 30 is connected to the utterance shift register 56 via switching means 60. The output of the summer 30 represents, for each word, the number of times a selected binary value (for example, a binary one) occurred at that position in the shift register 28, i.e. the previous total, plus the occurrence of that bit value in the utterance shift register 56. Thus, the output of summer 30 is an updated total of occurrences of the selected binary value for each position. Hence, by this arrangement, each stored reference template maintains a running sum of the occurrences of the selected binary value for each position for each reference template. Consequently, as more fully discussed below, all data provided by the utterance template former 58 is retained.

The storage medium 42 of the bias template creating and storing means 14 has a capacity at least equal to the number of addressable positions in an utterance template. In the preferred embodiment this capacity is 256 bytes. The contents of the bias template, during enrollment, are written into the shift register 44 which is connected to the second input 52 of the summer 46. The first input 50 of the summer 46 receives the bit information from the corresponding position in the shift register 56 having the accessed utterance template therein. The output 48 of the summer 46, therefore, delivers a continuous updated total of the occurrence of a selected binary value for each position to the storage medium 42.

For example, for any given utterance template the bits therein are ones or zeros, i.e. binary highs or lows. However, the information in the corresponding position of the shift register 44 represents the total number of occurrences of the selected value that previously occurred at that position, regardless of the associated utterance template or the number of repetitions of reference templates. Naturally, the summer 46 can be alternatively adapted to count only the occurrence of binary zeros at each position. Nevertheless, as more fully discussed below, all information from all utterance templates formed and used throughout the entire enrollment is retained in the bias template storage medium 42 and used in the subsequent scoring of unknown utterance templates.
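
The running-sum behaviour of the two summers during enrollment can be sketched in software as follows. This is a minimal illustration, not the hardware described: the class name, attribute names and the shortened template length are assumptions, and the counts here are unbounded Python integers rather than single bytes.

```python
from collections import defaultdict
from typing import Dict, List


class Enrollment:
    """Hypothetical software stand-in for storage mediums 26 and 42."""

    def __init__(self, length: int = 256) -> None:
        self.length = length
        # Per-word count of binary ones at each position (reference templates).
        self.word_counts: Dict[str, List[int]] = defaultdict(lambda: [0] * length)
        # Number of times each word was enrolled.
        self.word_reps: Dict[str, int] = defaultdict(int)
        # Count of binary ones at each position over ALL utterances (bias template).
        self.bias_counts: List[int] = [0] * length
        # Total number of utterance templates enrolled (T in the text).
        self.total_reps = 0

    def enroll(self, word: str, utterance: List[int]) -> None:
        if len(utterance) != self.length:
            raise ValueError("utterance template has the wrong length")
        counts = self.word_counts[word]
        for i, bit in enumerate(utterance):
            counts[i] += bit            # previous total plus the new bit
            self.bias_counts[i] += bit  # same bit also feeds the bias template
        self.word_reps[word] += 1
        self.total_reps += 1


e = Enrollment(length=4)  # 4 positions instead of 256, for readability
e.enroll("yes", [1, 0, 1, 1])
e.enroll("yes", [1, 0, 0, 1])
e.enroll("no", [0, 1, 0, 1])
print(e.word_counts["yes"], e.bias_counts, e.total_reps)
```

Unlike the conventional scheme, a position that disagrees between repetitions is not discarded; its count simply ends up between zero and the number of enrollments.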

In the preferred embodiment, when utterance templates representing unknown words are to be identified, the contents of the storage mediums, 26 and 42, are downloaded to the means 20 after being modified by the means 18. Essentially, the means 18 for modifying the plurality of reference templates and the system bias template, for each position, provides a pair of outputs representing the logarithmic values of the number of occurrences at that position of binary ones and binary zeros.

In one particular embodiment, the means 18 includes a first logarithmic converter 62 adapted to receive the system bias template from the storage medium 42 on a byte-by-byte basis. In addition, the converter 62 is connected in parallel with a complementary binary byte logarithmic former 64. The former 64 includes a subtraction means 66 serially connected with a second logarithmic converter 68. The subtraction means 66 is provided, as more fully discussed below, with the total number of utterance templates enrolled (T) from which the number in each position is subtracted on a byte-by-byte basis. That is, if the positions in storage medium 42 represent the occurrences of binary ones, the output of the subtraction means 66 represents the number of occurrences of binary zeros for those same positions. Thus, the outputs from the first and second logarithmic converters, 62 and 68, respectively, represent the logarithmic values of all data for that bit position established during enrollment. The outputs from the logarithmic converters, 62 and 68, are stored in expanded storage mediums, 70 and 72, respectively, in the storing means 20.

The means 18 also includes a similar arrangement adapted to receive the plurality of reference templates from the storage medium 26 on both a per reference template basis and a byte-by-byte basis for each stored reference template. Specifically, a third logarithmic converter 74 is connected in parallel with a reference template complementary logarithmic former 76. The former 76 has a serially connected subtraction means 78 and a fourth logarithmic converter 80. In this instance, the minuend stored in the subtraction means 78 is the number of times the word represented by a particular reference template was enrolled. Thus, the outputs from the third and fourth logarithmic converters, 74 and 80, respectively, represent the logarithmic values of all data for that position of each word established during enrollment. The outputs of the third and fourth logarithmic converters, 74 and 80, are stored in a plurality of expanded storage mediums, 82 and 84, respectively, in the storing means 20.
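
As a rough sketch of this expansion step, the following Python turns one array of ones-counts into the complementary pair of logarithmic arrays. It uses the modified scale given later in the description (Output = 14 + 100 x log (Input), with zero mapped to zero and a ceiling of 255); the base-10 logarithm, the rounding, and the function names are assumptions, since the text does not state them.

```python
import math
from typing import List, Tuple


def log_scale(count: int) -> int:
    """Assumed form of the modified log converter (base 10, rounded)."""
    return 0 if count <= 0 else min(255, round(14 + 100 * math.log10(count)))


def expand(counts: List[int], enrollments: int) -> Tuple[List[int], List[int]]:
    """Return (log of ones-counts, log of zeros-counts) for one template.

    `enrollments` plays the role of the minuend held in the subtraction
    means: T for the bias template, or the per-word enrollment count for
    a reference template.
    """
    log_ones = [log_scale(c) for c in counts]
    log_zeros = [log_scale(enrollments - c) for c in counts]
    return log_ones, log_zeros


ones, zeros = expand([2, 0, 1, 2], enrollments=2)
print(ones)   # [44, 0, 14, 44]
print(zeros)  # [0, 44, 14, 0]
```

Doing this once, ahead of recognition, is what lets the scoring stage work with additions only.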

As a result, the means 20 for storing the modified plurality of reference templates and the modified system bias templates includes the storage mediums, 70 and 72, each having 256 bytes, i.e. one template array, and storage mediums, 82 and 84, containing the plurality of reference templates formatted as complementary pairs of 256 byte arrays for each word stored.

The means 22 for establishing a score for the modified templates with respect to an utterance template includes a means 86 for selecting a byte from one of the storage mediums, 70 or 72, and a byte from one of the storage mediums, 82 or 84, for each reference template, means 88 for summing all bytes from the selecting means 86 for the system bias template and for each of the reference templates, and means 90 for storing the sums of the means 88.

In the preferred embodiment, the means 86 includes first and second shift registers, 92 and 94 respectively, adapted to receive the data stored in the storage mediums, 70 and 72, respectively, and shift registers, 96 and 98, adapted to receive the data stored in the storage mediums, 82 and 84, respectively, for a given reference template. The means 86 further includes a system bias template byte selection means 100 and a reference template byte selection means 102. The byte selection means, 100 and 102, are mutually positioned according to the binary value of the bit in the corresponding position in the utterance shift register 56. For example, if the binary value of a particular bit position in the shift register 56 is a binary one, the shift registers, 92 and 96, are selected. The shift registers, 92 and 96, as previously discussed, correspond to the storage mediums, 70 and 82 respectively, having the values of the occurrences of binary ones therein. Conversely, if the binary value in the shift register 56 is a binary zero, the shift registers, 94 and 98, are selected.

The values from the selected shift registers, i.e. either 92 and 96, or 94 and 98, are separately summed by the summing means 88A and 88B, each maintaining a single running total for all values of the selected registers, 92 and 96, or 94 and 98, for each 256 bytes, i.e. each reference template. Preferably, as more fully discussed below, after each 256 bytes, i.e. after each reference template, the accumulated sums from the summing means, 88A and 88B, are weighted.

The weighted sum from the system bias summer 88A is stored in a first buffer 104 and, by means of a summer 106, subtracted from each weighted sum from the word storage mediums, 82 and 84. The output of summer 106, for each reference template, is stored in a second buffer 108. Preferably, the second buffer 108 includes means 110 for selecting and outputting the highest value stored therein. The outputs from the first and second buffers, 104 and 108, are inputted to the means 24 for accepting or rejecting a score.
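
The select-sum-subtract path through means 86, 88 and 106 can be sketched as below. This is an illustrative software analogue under stated assumptions: the function and parameter names are hypothetical, the weighting constants are passed in as plain numbers, and the arrays are four positions long rather than 256.

```python
from typing import Sequence


def select_and_sum(utterance: Sequence[int],
                   log_ones: Sequence[int],
                   log_zeros: Sequence[int]) -> int:
    """For each position, pick the ones-array or zeros-array entry
    according to the utterance bit, then add the picks together."""
    return sum(o if bit else z
               for bit, o, z in zip(utterance, log_ones, log_zeros))


def score(utterance: Sequence[int],
          ref_ones: Sequence[int], ref_zeros: Sequence[int],
          bias_ones: Sequence[int], bias_zeros: Sequence[int],
          ref_weight: int = 0, bias_weight: int = 0) -> int:
    """Weighted reference sum minus weighted bias sum (role of summer 106)."""
    ref_sum = select_and_sum(utterance, ref_ones, ref_zeros) - ref_weight
    bias_sum = select_and_sum(utterance, bias_ones, bias_zeros) - bias_weight
    return ref_sum - bias_sum


utt = [1, 0, 1, 1]
print(select_and_sum(utt, [44, 0, 14, 44], [0, 44, 14, 0]))  # 146
print(score(utt, [44, 0, 14, 44], [0, 44, 14, 0],
            [44, 14, 14, 62], [14, 44, 44, 0]))              # -18
```

Note that no multiplication appears anywhere in the scoring loop, which matches the speed argument made in the background section.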

In the preferred embodiment, the means 24 includes first and second comparators, 112 and 114, and a logic AND gate 116. The second comparator 114 receives, as an input thereto, the system bias score from buffer 104 and includes, i.e. compares that score with, a preselected, or threshold, value. If the sum from buffer 104 exceeds the threshold value a binary one, for example, is outputted to one input of the AND gate 116. The first comparator 112 receives, as an input thereto, the highest value in the buffer 108 and includes, for comparison purposes, a preselected value, or threshold, stored therein. If the value from the buffer 108 exceeds the threshold a binary one, for example, is outputted to the other input of the AND gate 116. If, and only if, both inputs to the AND gate 116 are binary ones, an acceptance signal, i.e., a particular binary value, is outputted from the AND gate 116. The signal so outputted can then be used, by known techniques, for any purpose such as to execute a preselected command within a computer.
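
The two-comparator acceptance logic can be expressed compactly in Python. The sketch below is an assumption-laden analogue: the function name, the dictionary of per-word scores, and the choice to return the word itself (rather than a one-bit signal) are all illustrative.

```python
from typing import Dict, Optional


def decide(scores: Dict[str, int], bias_score: int,
           reference_threshold: int, bias_threshold: int) -> Optional[str]:
    """Return the best-scoring word if both comparators pass, else None."""
    if not scores:
        return None
    best_word = max(scores, key=scores.get)               # highest value in the buffer
    ref_ok = scores[best_word] > reference_threshold      # first comparator
    bias_ok = bias_score > bias_threshold                 # second comparator
    return best_word if (ref_ok and bias_ok) else None    # AND gate


print(decide({"yes": 120, "no": 80}, bias_score=50,
             reference_threshold=100, bias_threshold=40))  # yes
print(decide({"yes": 120, "no": 80}, bias_score=30,
             reference_threshold=100, bias_threshold=40))  # None
```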

The apparatus 10 further includes a bit clock generator 118 that provides pulses at a regular, preselected frequency for ensuring coordination of data flow and synchronization of data scoring. The output pulse of the generator 118 is designated as BCLK in Figure 1. As shown, the active position of each shift register, 28 and 44 during enrollment, and 56, 92, 94, 96 and 98 during recognition, is controlled by a pointer according to BCLK. Further, the summing means 88 and the storage mediums, 26 and 42 during enrollment, and 70, 72, 82 and 84 during recognition, are stepped according to the BCLK. The number of pulses from the bit clock generator 118 is monitored by a counter 120 which produces an output signal, RESET B, after every 256 BCLK clock pulses, i.e. after one template has been operated on.

The RESET B signal controls the reference template presented by the storage mediums, 26, 82 and 84. In addition, the RESET B signal controls the summing means 88 such that, upon receiving the RESET B signal, the accumulated sums therein are outputted therefrom. A bias template counter 122 counts the total number of utterance templates enrolled and maintains that sum in subtraction means 66. A reference template counter 124 is provided to count the number of times a particular word is enrolled. The output of the reference template counter 124 is provided to the subtraction means 78.

During enrollment the output from the 256 bit counter 120 is connected, via switch means 126, to a prompter for indicating to the trainer to recite the next word. During recognition the output from the counter 120 is connected to the shift registers, 92, 94, 96, and 98, and the bit summing means 88.

Referring to Figure 2, an exemplary modified logarithmic scale 128 is shown, which scale is preferably used in the log converters to determine the outputs thereof. Most noticeable is that if the input is equal to zero the output is set to zero. Ordinarily, of course, the logarithmic value of zero is equal to negative infinity. However, such a value would have no real meaning in the present apparatus or the calculations made therein. In addition, to ensure that the entire log converter, which essentially is a look up table, is within a one byte segment of memory the output maximum is set to 255. The present log table is derived utilizing the empirical formula Output = 14 + 100 x log (Input), except, as previously mentioned, that when Input = 0 the Output is set to zero. Thus, the desired constraints are provided and those values which, in a conventional voice recognition apparatus, would ordinarily be multiplicands are now sums.
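
A look-up table of this kind might be built as follows. This is a sketch only: the text does not say which logarithm base or rounding rule was used, so base 10 and round-to-nearest are assumptions (base 10 is at least consistent with the stated ceiling, since an input of 255 then lands at the top of the one-byte range).

```python
import math


def modified_log(count: int) -> int:
    """Output = 14 + 100 x log(Input), with 0 -> 0 and a ceiling of 255.

    Assumptions: base-10 logarithm, result rounded to the nearest integer.
    """
    if count <= 0:
        return 0
    return min(255, round(14 + 100 * math.log10(count)))


# One entry per possible one-byte count, so conversion is a single table read.
LOG_TABLE = [modified_log(n) for n in range(256)]

print(LOG_TABLE[0], LOG_TABLE[1], LOG_TABLE[10], LOG_TABLE[100], LOG_TABLE[255])
```

Because the logarithm of a product is the sum of the logarithms, a product of per-position probabilities becomes a sum of table entries, which is the substitution the paragraph above refers to.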

As previously mentioned, the resultant array sums from the summing means 88 are preferably weighted. In one implementation the weighting mechanism includes summers 130A and 130B having as one input thereto a constant value, for example stored in an associated memory, 132A and 132B. The constant values are, by means of the summers, 130A and 130B, subtracted from the output from the summing means 88. For the system bias summation, the weighting value is equal to 256 times the log (T), whereas the weighting value for the reference template sum is 256 times the log (N), N being the number of times the word represented by that reference template was enrolled.

The weighting values are, effectively, normalization constants that enable the resultant scores to be independent of the number of times each word was enrolled. As a result, the thresholds selected for the comparators, 112 and 114, are constant regardless of the number of enrollments. But for this normalization the threshold values used in scoring would have to be adapted to vary according to the number of times a particular word was enrolled. This would increase the complexity of scoring and deter a user from enrolling a limited part of the total vocabulary library more than the remainder. Such selected increased enrollment of particular words is preferable due, for example, to the inherent nature of any language having words sounding similar to each other.
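
The effect of this normalization can be checked numerically. The sketch below deliberately simplifies: it uses unrounded base-10 logarithms, omits the constant offset of the log table, and assumes every count is nonzero, so it illustrates the principle rather than the exact byte arithmetic. The function name is hypothetical.

```python
import math
from typing import Sequence


def normalized_sum(selected_counts: Sequence[int], enrollments: int) -> float:
    """Sum of 100*log10(count) minus (number of positions) * 100*log10(N).

    Algebraically this equals the sum of 100*log10(count / N), i.e. a sum
    of log-probabilities, which no longer depends on N itself.
    """
    raw = sum(100 * math.log10(c) for c in selected_counts)
    return raw - len(selected_counts) * 100 * math.log10(enrollments)


# Same per-position probabilities (1.0 and 0.5), enrolled 2 times vs 4 times.
few = normalized_sum([2, 1], enrollments=2)
many = normalized_sum([4, 2], enrollments=4)
print(few, many, math.isclose(few, many))
```

Both calls give the same value, which is why a single fixed threshold can serve words that were enrolled different numbers of times.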

Referring to Figures 3A to 3C, one major difference of the present apparatus 10 over conventional voice recognition systems is depicted by the template format for the utterance template 134 and the enrolled templates, 136 and 138. As shown, the utterance template 134 is composed of a plurality of binary values representing features extracted from a voice signal. The enrolled templates, 136 and 138, however, include values at each position thereof, representing the total number of occurrences of, for example, a binary one in any utterance template during enrollment. Hence, all features from all utterance templates are used for scoring during the recognition phase. Ultimately, rather than relying on those bits remaining in a stored utterance format reference template, the actual probability of occurrence of a particular binary bit is used to score each utterance template. Such an approach clearly increases the accuracy of the template matching procedure. In fact, the template scoring accuracy is such that the means 24 for scoring a template only produces either an accept signal or a reject signal. This is in clear contrast to conventional systems that frequently either request further information, request word repetition, or advise the user that there was insufficient information to score or recognize the word in question.

The employment of the present apparatus 10 thus provides a unique method for ascertaining and recognizing a spoken word by use of template matching. As shown in Figure 4, the method includes the step of first enrolling a plurality of words and maintaining all extracted features relative thereto. The enrolled data is expanded according to the logarithmic scale, discussed above, and stored in the expanded memory means.

During recognition the outputs from the selected memories are summed and weighted to result in a score which definitively rejects or accepts a spoken word.

It will be understood that the information stored in the buffer 108 further includes positional information with regard to the particular reference template being scored. Consequently, once the highest score is established, the location in the expanded memories is known and is accessible to ascertain the exact intended consequence of the acceptance signal generated by the AND gate.

In one alternative embodiment it will be recognized that the means for modifying the stored enrolled templates can be inserted between the expanded memories and the respective shift registers. However, as this would increase the actual summation time it is preferred that the modification means be positioned between the enrollment memory and the modified memory.

The apparatus 10 described herein can be substantially implemented on an iAPx 8088 microprocessor chip, for example, that manufactured and marketed by Intel Corporation of Sunnyvale, Calif., in conjunction with a conventional 64 kbit dynamic random access memory (RAM) bank.

The present apparatus 10 offers numerous advantages over conventional devices. First, since it utilizes a bias template any differences resulting from differences in personnel during enrollment or during the use, as well as differences in particular equipment, are eliminated. In addition, since all data generated is retained and utilized, the accuracy is increased. Further, the entire scoring procedure is quite fast since all arithmetic operations are summations rather than multiplications. Still further, the contents of the reference template are significantly different than those of the utterance template being scored. More specifically, the data in the utterance template is binary and derived from frequency versus time procedures whereas the data in the reference template is based on the probabilities of occurrence of a particular binary value in that particular position.

Although the present invention has been described herein with reference to an exemplary embodiment, it will be understood that other configurations may be contemplated which nevertheless do not depart from the spirit and scope of the present invention. Consequently, the present invention is deemed limited only by the appended claims and the reasonable interpretation thereof.

Claims (28)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. An apparatus for speech recognition, said apparatus comprising:
an utterance template having a plurality of positions, each position having a binary value stored therein, each said binary value in each said position representing a spectral feature of speech to be recognized;
a plurality of reference templates, each having a plurality of positions, each of said positions of each said reference template representing a spectral feature corresponding to the spectral features represented by said plurality of positions of said utterance template, each said position in each said reference template having a value stored therein, said stored value being representative of a probability of a particular binary value occurring in said corresponding position of said utterance template; and means for establishing a first score for each said reference template, each said score being indicative of a relative match between that said reference template and said utterance template, said first score establishing means includes means for providing outputs from said reference template, each said output corresponding to one of said plurality of positions of said reference template and having a value determined from said stored value therein, means for selecting outputs of the reference templates corresponding to positions of the utterance template having the particular binary value stored therein and means for summing said selected outputs, whereby the sum is indicative of said relative match, and said speech represented by said utterance template can be recognized.
2. Apparatus as claimed in Claim 1, further comprising:
a bias template, said bias template having a plurality of positions corresponding to said plurality of positions of said utterance template, each said position of said bias template having a value stored therein, said stored value being representative of the probability of a particular binary value occurring in said corresponding position of any utterance template; and means for establishing a second score by adding the values stored in the positions of the bias template corresponding to positions of the utterance template having the particular binary value stored therein, said second score being indicative of a relative match between said bias template and said utterance template.
3. Apparatus as claimed in Claim 2, wherein said second score establishing means includes:
means for providing outputs from said bias template, each said output corresponding to one of said plurality of positions of said bias template and having a value determined from said stored value therein; and means for summing outputs of the bias template corresponding to positions of the utterance template having the particular binary value stored therein, whereby the sum is indicative of said relative match between said bias template and said utterance template.
4. Apparatus as claimed in Claim 3, further comprising:
means for expanding each said reference template into a first expanded reference template and a second expanded reference template, the values of said first expanded reference template representing the probability of a binary one occurring in the corresponding position of said utterance template and the values of said second expanded reference template representing the probability of a binary zero occurring at said corresponding position of said utterance template; and means for expanding said bias template into a first expanded bias template and a second expanded bias template, the values of said first expanded bias template representing the probability of a binary one occurring in the corresponding position of any said utterance template and the values of said second expanded bias template representing the probability of a binary zero occurring at said corresponding position of any said utterance template.
5. Apparatus as claimed in Claim 4 wherein said first score establishing means includes:
means for selecting for each position of the expanded reference templates either the output from said first expanded reference template if the corresponding position in the utterance template contains a binary one or the output from said second expanded reference template if the corresponding position in the utterance template contains a binary zero for scoring each reference template.
6. Apparatus as claimed in Claim 5 wherein said second score establishing means includes:
means for selecting for each position of the expanded bias templates either the output from said first expanded bias template if the corresponding position in the utterance template contains a binary one or the output from said second expanded bias template if the corresponding position in the utterance template contains a binary zero for scoring the bias template.
7. Apparatus as claimed in Claim 4 wherein:
said expanding means includes means for logarithmically converting said values in said plurality of reference templates and said values in said bias template.
8. Apparatus as claimed in Claim 7 further comprising:
means for selecting the highest of said first scores;
means for comparing the highest of said first scores with a threshold; and means for generating a recognition acceptance signal when said highest first score exceeds said threshold and a rejection signal when said highest first score is less than said threshold.
9. Apparatus as claimed in Claim 2 further comprising:
means for converting said values in said plurality of reference templates and said values in said bias template to logarithmic data.
10. Apparatus as claimed in Claim 9 further comprising:
means for selecting the highest of said first scores;
means for comparing the highest of said first scores with a threshold; and means for generating a recognition acceptance signal when said highest first score exceeds said threshold and a rejection signal when said highest first score is less than said threshold.
11. Apparatus as claimed in Claim 10 further comprising:
means for comparing said second score with a second threshold; and means for generating a recognition acceptance signal only when said highest first score exceeds said threshold and when said second score exceeds said second threshold.
12. Apparatus as claimed in Claim 1, further comprising:
means for enrolling said plurality of reference templates by increasing the values stored in positions of a reference template for a selected speech each time an utterance template for the selected speech contains in corresponding positions the particular binary value, each said reference template being enrolled from a plurality of utterance templates and each said plurality of utterance templates being composed of spectral features representing different instances of the same speech.
13. Apparatus as claimed in Claim 12, further comprising:
means for enrolling a bias template, said bias template having a plurality of positions corresponding to said plurality of positions of said utterance template, each said position of said bias template having a value stored therein, said stored value being representative of the probability of a particular binary value occurring in said corresponding position of any utterance template, said bias template being enrolled from all said utterance templates employed to enroll said reference templates by increasing the values stored in positions of said bias template each time an utterance template contains in corresponding positions the particular binary value.
14. A method for matching templates, said method comprises the steps of:
providing an utterance template having a plurality of positions, each having a binary value stored therein, each said binary value in each said position representing a spectral feature of speech to be recognized;
providing a plurality of reference templates each having a plurality of positions, each of said positions representing a spectral feature corresponding to the spectral features represented by said plurality of positions in said utterance template, and having a value stored therein, each stored value representing a probability of a particular binary value occurring in the corresponding position in said utterance template;
providing outputs from said reference templates, each output corresponding to one of said plurality of positions and having a value determined by said value stored therein;
selecting outputs of the reference templates corresponding to positions of the utterance template having the particular binary value stored therein; and summing the selected outputs of each said reference template, whereby the sum is indicative of a relative match.
15. Method as claimed in Claim 14, comprising the further step of:
providing a bias template having a plurality of positions corresponding to said plurality of positions of said utterance template, each said position having a value stored therein, said stored value representing the probability of a particular binary value occurring in the corresponding position in any said utterance template; and establishing a second score, indicative of a relative match between said bias template and said utterance template, by adding the values stored in the positions of the bias template corresponding to positions of the utterance template having the particular binary value stored therein.
16. Method as claimed in Claim 15, wherein said second score establishing step includes:
providing outputs from said bias template, each output corresponding to one of said plurality of positions of said bias template and having a value determined by the value stored therein; and summing the outputs of the bias template corresponding to positions of the utterance template having the particular binary value stored therein.
17. Method as claimed in Claim 15, further comprising the steps of:
expanding each said reference template into a first expanded reference template and a second expanded reference template, the values of said first expanded reference template representing the probability of a binary one occurring in the corresponding position of said utterance template and the values of said second expanded reference template representing the probability of a binary zero occurring at said corresponding position of said utterance template; and expanding said bias template into a first expanded bias template and a second expanded bias template, the values of said first expanded bias template representing the probability of a binary one occurring in the corresponding position of any said utterance template and the values of said second expanded bias template representing the probability of a binary zero occurring at said corresponding position of any said utterance template.
18. Method as claimed in Claim 17 wherein said first score establishing step includes:
selecting for each position of the expanded reference templates either the output from said first expanded reference template if the corresponding position in the utterance template contains a binary one or the output from said second expanded reference template if the corresponding position in the utterance template contains a binary zero for scoring each reference template.
19. Method as claimed in Claim 18 wherein said second score establishing step includes:
selecting for each position of the expanded bias templates either the output from said first expanded bias template if the corresponding position in the utterance template contains a binary one or the output from said second expanded bias template if the corresponding position in the utterance template contains a binary zero for scoring the bias template.
20. Method as claimed in Claim 17 wherein said expanding step includes:
logarithmically converting said values in said plurality of reference templates and said values in said bias template.
21. Method as claimed in Claim 20 further comprising the step of:
selecting the highest of said first scores;
comparing the highest of said scores with a threshold; and generating a recognition acceptance signal when said highest first score exceeds said threshold and a rejection signal when said highest first score is less than said threshold.
22. Method as claimed in Claim 19 further comprising the step of:
converting said values in said plurality of reference templates and said values in said bias template to logarithmic data.
23. Method as claimed in Claim 22 further comprising the step of:
selecting the highest of said scores;
comparing the highest of said first scores with a threshold; and generating a recognition acceptance signal when said highest first score exceeds said threshold and a rejection signal when said highest first score is less than said threshold.
24. Method as claimed in Claim 23 further comprising the step of:
comparing said second score with a second threshold; and generating a recognition acceptance signal only when said highest first score exceeds said threshold and when said second score exceeds said second threshold.
25. Method as claimed in Claim 14, further comprising the step of:
enrolling said plurality of reference templates by increasing the values stored in positions of a reference template for a selected speech each time an utterance template for the selected speech contains in corresponding positions the particular binary value, each said reference template being enrolled from a plurality of utterance templates and each said plurality of utterance templates being composed of spectral features representing the different instances of the same speech.
26. Method as claimed in Claim 25, further comprising the step of:
enrolling a bias template, said bias template having a plurality of positions corresponding to said plurality of positions of said utterance template, each said position of said bias template having a value stored therein, said stored value being representative of the probability of a particular binary value occurring in said corresponding position of any utterance template, said bias template being enrolled from all said utterance templates employed to enroll said reference templates by increasing the values stored in positions of the bias template corresponding to positions of the utterance template having the particular binary value stored therein.
27. Apparatus for use in speech recognition, said apparatus comprising:
an utterance template, composed of a plurality of positions, each position having a binary value stored therein, each said binary value representing a spectral feature of speech to be recognized;
a plurality of reference templates, each having a plurality of positions, each of said positions of each said reference template representing a spectral feature corresponding to the spectral features represented by said plurality of positions of said utterance template, each said position in each said reference template having a value stored therein, said stored value being representative of a probability of a particular binary value occurring in said corresponding position of said utterance template;
a bias template having a plurality of positions, each one of said plurality of positions representing a spectral feature corresponding to the spectral features represented by said plurality of positions in said utterance template, each position of said bias template having a value stored therein representing the probability of a particular binary value occurring at said corresponding position in any utterance template; and means for establishing a score for each reference template based on said values stored therein and on the values stored in said bias template, said scores being indicative of a relative match between said reference template and said utterance template, whereby said speech represented by said utterance template can be recognized.
28. Apparatus as claimed in Claim 27 further comprising:
means for expanding said bias template into a first and a second bias template, each having positions corresponding to said plurality of positions in said utterance templates, said first bias template having bias values stored in each position therein representing the probability of a binary one occurring at said corresponding position in said utterance template and said second bias template having bias values stored in each position therein representing the probability of a binary zero occurring at said corresponding position in said utterance template.
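The enrollment and acceptance steps recited in Claims 10 to 13 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed apparatus: the particular binary value is taken to be a binary one, enrollment counts are normalized to probabilities (the claims only recite "increasing the values"), raw probabilities are summed rather than logarithmic data to keep the example short, and the names `enroll`, `match`, `recognize`, the toy vocabulary, and both threshold values are hypothetical.

```python
def enroll(utterances):
    """Increase a per-position count each time a binary one occurs, then normalize."""
    counts = [0] * len(utterances[0])
    for bits in utterances:
        for i, bit in enumerate(bits):
            counts[i] += bit
    return [c / len(utterances) for c in counts]


def match(utterance, template):
    """Sum the template values at positions where the utterance holds a binary one."""
    return sum(v for bit, v in zip(utterance, template) if bit)


def recognize(utterance, references, bias, threshold, bias_threshold):
    """Return the best-scoring word, or None when either threshold is not exceeded."""
    first = {word: match(utterance, t) for word, t in references.items()}
    best = max(first, key=first.get)          # highest of the first scores
    second = match(utterance, bias)           # second score, from the bias template
    accepted = first[best] > threshold and second > bias_threshold
    return best if accepted else None


# Toy enrollment data: two utterance templates per word (assumed values).
training = {
    "yes": [[1, 1, 0, 0], [1, 1, 1, 0]],
    "no": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
references = {word: enroll(utts) for word, utts in training.items()}
# The bias template is enrolled from all utterance templates used above.
bias = enroll([bits for utts in training.values() for bits in utts])

assert recognize([1, 1, 0, 0], references, bias, 1.5, 1.0) == "yes"
assert recognize([0, 0, 0, 0], references, bias, 1.5, 1.0) is None
```

In this sketch the second comparison rejects utterance templates that match the pooled bias template poorly, even when one reference template scores highest among the candidates.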
CA000515760A 1986-08-12 1986-08-12 Apparatus and method for identifying spoken words Expired CA1258917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000515760A CA1258917A (en) 1986-08-12 1986-08-12 Apparatus and method for identifying spoken words

Publications (1)

Publication Number Publication Date
CA1258917A true CA1258917A (en) 1989-08-29
