CA1236577A - Apparatus and method for performing acoustic matching - Google Patents

Apparatus and method for performing acoustic matching

Info

Publication number
CA1236577A
CA1236577A CA000494697A CA494697A CA1236577A CA 1236577 A CA1236577 A CA 1236577A CA 000494697 A CA000494697 A CA 000494697A CA 494697 A CA494697 A CA 494697A CA 1236577 A CA1236577 A CA 1236577A
Authority
CA
Canada
Prior art keywords
phone
probability
label
given
labels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000494697A
Other languages
French (fr)
Inventor
Lalit R. Bahl
Robert L. Mercer
Steven V. Degennaro
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Application granted granted Critical
Publication of CA1236577A publication Critical patent/CA1236577A/en
Expired legal-status Critical Current

Links

Landscapes

  • Machine Translation (AREA)

Abstract

ABSTRACT

The invention herein provides, in a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) at least one transition and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of: forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and determining the probability of a phone generating the labels in the string based on the simplified phone machine corresponding thereto.

Description

Y0983-0~9 L2365i7~7 APPARATUS AND ~ETHOD FOR PERFOR`1ING ACO~STIC `IATCHI~G

BAC~GRO~ND OF rHE INVE~TTION

I. FIELD OF THE INVENTION

The present invention relates primarily to the art of speech recognition, specifically the art of perfo-ming a statistical match between vocabulary words and incom-ing labels produced by an acoustic processor in response to a speech waveform input.

II. DESCRIP~IO~l OF THE PRIOR ART

Typically, the purpose of a speech recognition sys~em or machine i5 to automatically transform natural speech into some other form, for example written form. In achieving this aim, various general approaches have been considered. One approach is directed ~o simulating human speech interpretation processes. Another approach is to view speech in a statistical context.

In the statistical approach itself, several techniques have been considered as suggested bv the Bahl, Jelinek, and Mercer article, "A Maximum Likelihood Approach to Con~inuous Speech Recognition," IEEE Transactions on
-2- ~3~S77 Pattern Analvsis and Machine Intelligence, Volume PAMI-5, Number 2, pp. 179-190 (1983). In the 8ahl et S al. article, it is noted thaL the typical model of a speech recognition sys~em includes a text generator followed by a speaker, or talker. The text generator determines what is to be said and the speaker produces a natural speech waveform. The natural speech waveform enters an acoustic processor the output from which en-ters a linguistic decoder. Depending on the technique employed, the above-noted elements may be associated in various ways. Bahl et al. combine the speaker and acoustic processor to function as an acoustic channel wherein the speaker provides text as a speech waveform and wherein the acoustic processor acts as a data compressor which provides a string of labels (also re-ferred to as symbols or fenemes) to the linguistic de-coder. The labels may be generated in any of a number of ways and are commonly identified collectively as the string Y made up of sequential labels YlY2Y3---- The purpose of the linguistic decoder is to represent the original spoken text, in some prescribed form, based on the incoming string of labels.

In the above~noted article, one acoustic -~
~rocessor the IBM* centisecond acoustic, processcr (CSAP) -- is describe~

* Trade l~ark
3 ~2~

as transforming the speech waveform into a slring of parame~er vectors. Each parameter vector is compared to stored prototypes (or standard vectors) -- the distance from the parameter vector and each prototype being de-termined. The "label" for the proto~;ype which is clos-est is then assigned to the waveform parameter vector.
The label can have any of various forms and may be de-termined in any of various 1rnown manners in accordance with existing technology.

The purpose of the linguistic decoder is to perform a matching process between the incoming labels and words provided in the system vocabulary. In the probabilistic approach set forth in the Bahl et al. article, the lin-guistic decoder aims at de~ermining a word strinO l; that has the highest probability of having produced the string of labels Y1Y2Y3--- Mathematically, this is re-presented by the expression:

2~ MaxPr(h¦Y~, (1) the maximum probability of h given Y over all word strings ~'. According to well-~nown prob~bility theory, this can be ~-ritten as:

Pr~WIY)= Pr(W) x Pr(YIW)tPr~y) (2)
-4- ~6~77 where Pr(Y) is independent of W and the probability of a given word string W, namely Pr(W), is determined by a language model in the linguistic decoder.

Suppose that at some point in the decoding process some initial substring. for example YlY2 YT~ has been ten-tatively decoded as the ~ord string WlW2...Wn. The present invention is directed to determining a set of candidate words W( +i) for which ( ~n+l)lYl YTYT+l YT'k' 1 n) is relatively large --compared to other words in ~he vocabulary - for some value of k.

In determining Pr(YIW), Markov modelling has been con-sidered. The number of computations required by several linguistic decoding techniques is noted in the Bahl et al. article as being fairly high, especially with larger vocabularies on the order of ~000 words and more for example.

A key question in linguistic decoding has therefore been how to determine Pr(Y¦W) for word strings from a vocab-ulary without requiring inordinate computation time and without sacrificing accuracy in decoding.

~365~

This object and others are addressed by the present in-vention.

SU~I!~RY OF THE I~VE~TIO~

In accordance with the present invention, a linguistic decoder is provided which facilitates the determination of which word or words in a vocabulary have the highest probability of producing a par~icular string of labeis.
To achieve this object, the presen~ invention features apparatus ,?nd method for performing a statistical match of words to labels with a number of appro~imations that expedite the matching determination without introduclng undesired loss in accuracy. In addition, the presen~
invention relates to apparatus and method wherein ~.ords having similar phonetic beginnings are matched against the incoming labels simultaneously for so lonO as the words have similar phonetic beginnings.

In noting the approximations embodied in the invention, it should be recogni~ed that the invention models each vocabulary word as a sequence of phones. ~ach phone is represented by a phone machine. Each phone machine is precisely characterized as having (a) a plurality of states, (b) transitions from state to state and a prob-2S ability associated with each transition, and (c) an ac-Y0983-0~9 ~ ~3~

tual output probabilit-y that a given label is produced by a given phone machine at a given transition -- each phone machine (corresponding to a given phone) defining the probability that a given label is generated at a given transition thsreof. It is possible to determine match scores for words based on these characteris~ics, however the number of computations is high. In accord-ance with the present invention, prefer2bly each ohorle machine is simplified by replacing the ac;ual l_bel probability for each label at all transitions in a g ven phone mach ne with a speciflc replacement value. The specific replacement value is preferably selected so 1~ that the match value for a given phone with the rn-placement value included is an overestimation o' tlls match value achieved by the detailed match ~here ~he replacement values do not replace the actual label probabilities. One way of assuring this condition is by selecting each replacement ~-alue so that no probabili~y corresponding to a given label in a given phone machine is greater than the replacement value thereof. 3y sub-stituting the actual label probabilities in a phone ma-chine with corresponding replacement values, the number 2_ of required computations in determining a match sco.e for a word is reduced greatly. ~oreover, since the re-placement value is preferably an overestimation, the resulting rnatch score is not less than would have pre-Y0~83-029 ~3~S~7 viously been determined without the replacement. Ac-cordingly, the object of recucing computation wlthout eliminating likely candidate words is achieved.

Another approximation relates to an additional facLor that enters into the determination of 2 match score be-tween a word and a string of labels, n~mely a label length dis~ribution associated with each particular phone machine. That is, for each phone machine~ there is a probability distribution that each nUmDer of labels contained between a minimum L i and a maximum number of labels Lmax is produced by the particular phone ma chine. To further facili-ate and reduce computation in 1; accordance ~ith the invention, the l~bel leng~'l proba-bility distribution (between the minimum length and maximum length of iabels) is considered uniform so tha~
the probability of each length of labels (between L .
mln and L ) is the same.
max As an additional refinement! the present invention in-cludes a limitation on the number of labels examined b~-the phone machines in order to determine a match v~lue between a corresponding word and a string of incoming labels. This added feature achieves the objects of re-2; ducing decoding delay and reducing inequalities arising Y09S3-0~9 -8- ~2~S77 from compa-ing match scores for words of different leng~hs.

o Furthermore, it is an object of the invention to derive a list of candidate words b;- means of a basic f25~ ma.ch (which approximates actu~l label probabillties for each label in a given phone with a respective replaccment value) or an alternative fast m~.ch lwhich also approx-imates label length probabilities ~;ith a specifiec value for a given phone) wherein the candida~e words are processed by successive detailed match phone machines and/or a language model in order to arrive at a single word and, as appropriate, some possible alterna~i~es--1~ especially where the fast match machines limit the num-ber of ~ncoming labels examined to de~ermine a ma~ch value for a phone.

In achieving the object of processing the beginnings of words simultaneously, the present in~tention defines words or portions thereof as phonetic baseforms arranged in a tree structure. Each baseform, it is noted, is re-presented by a sequence of phones each of wnich has i~s own phone machine corresponding thereto. For each baseformf a sequence of phone machines extends from ~he 2~ root of the tree. For so long as two or more baseforms have similar phonetic begirmings starting from the root, ~09~ 0~9 g ~ 77 a commcn branch of phone machines is provided therefor Two or more baseforms can hereby be selected or elimi-na~ed as candidate words by simultaneously processing ~hem through Ihe same phone machines at the same time for so long as the baseforms have sir"ilar beginnin~s The object of reducing computation without loss in ac-curacy is thereby rur her achie~ed In a preferred form, the invention r_lates to perforl~ing an acoustic match in a 1inguistic decoder wherein each phone machine therein is characteri~ed as naving (a) a plurality of stales and .ransi~ion pqths between sta-es, (b) transi.ions tr(Sj/Si) havi!g probabili~ies Tti`j) 1~ each of which represents the DroD~qbili-'; OL a ~ransition to a state Sj given a current state Si where Si and Sj may be the same state or diffsrent states, and (c) actual label probabilities wherein each actual label probabil-ity p(ykli';) indicates .he probabili~y that a label y~
is produced by a given phone machine at a given transi-tion from one state to a subsequent state where k is a label identifying notalion; each phone machine including (a) means for assigning to each Yk in said each phone machine a single specific value P~(yk) and (b~ means for 2- replacing each actual output probability p(ykli~j) at each transition in a given phone machine by the single specific value p'(y~) assigned to the corresponding Yk Y0983-0~9 .A 1 0 ~2;3~5~7 Preferably, ~he replacement value is at least as great as the maximum actual label probabilitv for the corre-; sponding Yk label at any transition in a particular phone machine.

The foregoing and o~her objec~s, fea~ures and advantages of the invention will be apparenr rom rhe more par~ic-ular desc-iption of the preferred embodiments o ~he invention, as illustrated in the accompanying dra~ing.

BRIEF DESCRIPTIO~' OF THE DRAI;I~GS

FIG. 1 is an illus~ration of a ~ree s~ruc~ure of baseforms according tO the inven~ion.

FIG. 2 is an illustration of an example detailec match phone machine in accordanse ~ith the invention.

FIG. 3 is a trellis diagram showing the state t ar.si-tions over time for the detailed ma~cn ?hone machine.

FIG. 4 is a general diagram cf a phone machine accord ng to the invention.

FIG. ~ is a diagram showing a fast match computation under predefined conditions.

Y0~83-0~9 ~L~36~77 FIG. 6 is an illustration showing ~he derivation of a starr-time distribution for a phone from the end-~ime S dist-ribution generated by the phone machine correspond-inO to the ~IOSt recent previous phone.

FIG. 7 is an illustration of an alternative fast match phone machine accorair.~ to the in~-en.ion.

FIG. 8 is an illustration o' an ~lternativA fast macch phone machine wherein the number of labels e~amined in the matching procedure is limiced in accordance with the invenlion.
: .

In the drawings, li~.e elements are des.cna~ed with sim-ilar reference numbe~s~ and identical elements in dif-1~ ferent specific embodimencs are designated by iden~ical reference numbers.

Y~5~3-029 ~236S77 DESCP~IPTIC.~ OF PRE~Er~P~ED E~I~ODIMENTS OF THE I~-~E~lIO~
I. TREE STR~CT~P~E OF BAS~FOR`IS

; In Fig. 1, a plurality or -~70rcs are arranged in a tree structure 100. Tlle tree has a root 102 from ~-hich a pluralit~ of baseforms extend. Each baseform is a pho-netic baseform and extends to a leaf of the tree~ eacll leaf of the tree representing a ..orc in the vocabulary.
Pre~erably, the voc2buiary is on the order of SOOO-horis or more, although smaller vocabularies may also be em-ployed.

Each baseform is shohn including a string of pllones, each baseform s~ar~ing at tne roo~ 102 of ~he tree 1~ structure 100 and ending at a respective leaf. A number of baseforms, it is noted, have similar phonetic begin-nings. For so long as a plurality of Dase,~orms na~e similar phonetic beginnings, such baserorms share common initial branches along the tree structure 100. Starting at the tree root 10~, the baseforms 10~ through 118 for manager, managers, memo, memory, memoranda, and memo-randum, respectivel~y, all extend along a common branch identified as the phonetic element MX. Similarly, the baseforms 112 through 11~ further extend together along ~ the branches identified as the phoneti_ elements ~X, EHO, MX. Where the baseforms de?art phonetically, the ~0~83-Q~9 -13~ 6S77 tree structure lG0 branches in different direclions. Tn accordance with the inventior, the malching of ?hone~ic S elemenls ~l~ and then EH0 is performed once for all the baserorms 112 through 118. That is, baserorms are not e.xamined se?aratek- one at a time. Instead, processino takes place along ~ranches in the tree structure 100 so tnat words na~ing similar phonetic oeginnin~s are ?roc-essed simultaneousl5- for so lona as ~heir res?ecti~-e phones lie along a common path of branches. ~y ?rsc-essing along the common path, a number of words ma~- De considered simultaneouslS- for inclusion in a list of candidate words lrom which the actual recognized wora lo may be selected. Li~ewise, a number or WO.da can be eliminated simultaneously.

In Fig. ll the phonetic elemenls are phones each of which corresponds to a conven-tionally definec sound. ~-hen strung together, the phones form word-representinv baseforms. For example, a first pronuncialion of the word "the" is represented by ~he phone DH followed by the phone ~-Hl. Approximately /0 ~o lOG phones, when appropriatel~- selected and s~rung -~ogether, can re?re-sent the words in the English 'anguage.

~; A word --when pronounced in ~ore than way-- is repres-ented in Fig. 1 by distinct strings of conventional Y0983-0~9 -14- ~36~7~

phones, as su~gested b-~ the disLinct baseforms for the two pronunciations of the ~ord "~he", namel~ lHLi and THE2. Alternativel~, the same ~ord mav be represer.ted by a basefor~ of "clinks" wherein each ciin~ can e?re-sent various ?ronunciations of a par-~icular por~ion of a word. The t~-o pronunciations a~ter the DH for the word "the" could be represen~ed by 2 sir.Ole clink which would account for ~he ~Hl and EEl phones. Similar1~T-, a se-quence of phones ma~- be in par311el wilh one or more other phones as alternative pronunciations for the same segment of a ~-ord or string of words --the parallel paths together representing a clink. Darallel p~ths which l~ cross word boundaries can also be re~re.,en~ed by a c.link. Whethsr defininO basefor~s in cerr.ls or conven-tional phones or clinks, similar principles .~re in-volved. Hence, in this patent application, references hereafter to "phones" are considered generic to inclu~e conventional phones, clinks and other sim_la pllonetic elements which represent elemental inputs.

II. PHONE ~1ACHINES

A. I~TROD~CTION

In the present invention, a statis,ical àetermination is made as to ~hich words in the vocabulary have the ~'~983-079 highest proba'oility OL^ hz~ing generaled a string of in-coming labels generated by an acoustic channel. .4s noted herelnbefore, each h~ord is represented by a se-quence of phones. Each phone is, n turn, character zed by a respecti~-e ohone machine. Each phone machine storcs data which provides an indication of the likelihood that the phone corres?oncing there~o generates any label string. ~hen a particular label s,.ing s entcred in~o a gi~en pnone machine, the gioen phonc macnine derer-mines --from its stored aata-- the li~;elihood ~hat he given phone machine oroduces the p.~rticular incom~n~
label string.

l; ~s described below~ phone machinns o~ SeVera1 L~.'peS .1. C
em'bodied in accordance ~-ith the inven~ion. Fi.st~ .he.o i5 a detailed match phone machine ~hich incorporates predefined probabilities into a phone rllodcl. The de-tailed match phone machine is charac~erized D~' ~a) a plurality of states and l_ansitions between the s~ates:
(b) a probability associated hith each transition, that the particular transition occurs; and (c) a probabil'ty that, at a given transition, a particular label is gen-erated by the phone. ~;hile the detailed match phcne 7; machine can be used in determining ~:ith grea~ exactness how closely a phone matches a string of incoming labels, tremendous calculation demands attend the detailed match YQYS3-0''9 -16- ~65~

phone machine. To overcome ~he calcula.ion demands as-sociated with the cetailed match phone ma_nine, a seccnd phone machine is considered by rhe invention. The secvnd phone machine, hereaf.er refer-ed to as a. basic ~~ast match phone machine, mai;cs an appro.~imation ~ha~ sim-plifies calculations relative LO the detailed match phone machine. Specifically~ wherever a given lahcl has cL probability of occurring at any .r_n~i~ioD in a gi~-?.n phone, tha~ probability is assigned â single specific value --which preferably is at least as large as the highest probabilit~ of the label occurring at an,~ tr~rl-sition in the phone. In a further embodiment~ h~reaf-er 1~ referred to as the alternatLve f~st ma~ch, the proDa-bility of a phone genera~ing ~ string o' labels of an~
of a number of lengths is considered unirorm 'oy the pnone machine corresponding thereto. In the alternative fast match phone machine, a minimum Length and a ma~i~um ~0 length are specified bet~-een ~-hich tn_re is defined a length distribution. In the al~ernative fast malch Dhone machine, the probability OL any length ~-ithin the leng~h dis~ribution is repiaced by a single defined ~alue and tr,e probabilit~; of any leng~h ou~side ~he disrribu~ion is zero. The alternative fast malch phone machine can be applied to the basic fast match phone machine to further reduce the amount of calculation required in 'tO~S3-0~9 - 1, - ~.;~36S77 determinin~ whe~her a word defined b~- a string of ~n~ne machines qualifies as 2 candidate t;ord.

In accordance t~ith the in~ention, a list OL- words is e.Ytrac~ed from the vocabulary of~;ords by processing the hords through either basic fast match phone machines or through alternative fast match phone machines that pre~er~bl~- includes both label probabiiity reD~aceme!lt values and length distribution replâct-menl vâlues. .he words in the ex~rac~ed list of words are thereafter processed by detailed match ?hone machines to arrive at a single word. In this regard, i, should be reali~ed c"at a langua,ge model (not described hereinj is also pr~,er-1~ ably provided to eliminate inzppro?riate words ~an-guage models, such as models emplG~-ing tri-grams, h~ve been discussed in the art. Briefk , a language model emploved with the in~ention includes probabililies ror respective three t~ord sequences occurring, such proba-bilities being based on data deri--ed rrom a text. Spe-ciricall~, the language moeel is im?lemented in PL/I on an IB~I 4341 using an IB~i 33S0 dis~ drive ror the lang~ave model s~orage. The IB~I 4341 is also connected to the L~S~
mâtch processor, the detailed match processor. and the ~5 rront end processor. In addition, the IB~' 4341 is also connected to â work station implemented in, for example, Pascal on an Apollo Domain computer to enable a user to ~0983 ~07~
36~77 enter input to the s stem. The wor~ station is connected to the host IBM ~341 through an IB~ person31 computer and a ~,Q'~ commur.icztions conlroller.

E~ch of the above-notec phone machines receives as ~n-puts a start-lime dis~ribution and a string of labels and determines therefrom a match value t~hich SUggeatS
the ll~elihood that a given phone produces a sequence of labels in the string.

Each of ~he above-mentionea typGs of phone machines is now described in greater detail.

B. DETAILD ~TCH PHONE ~IACHI~E

In Fig. 2, a sample detailed match phone machine ~00 is 1~ depicted. Each detailed match phone machine is a ?rob-abilistic finite state machine cnarac~erized bv ~aj a plurality of states Si, (b) a plurality of transitior.s tr(SjlSi), some of the transitions extending between differen~ states and some e~tending from a stale back 7 to itself, aach transieion having associated therewith a corresoonding probabiliry, and (c) for each label that can be generated at a Darticuiar transition~ a corre-sponding actual label probability.

~0~o3-0~9 i77 In Fig. ~, seven states 51 th..ougn 5 are provided and thirteen transitisns trl ~hrou~h trl~ are pro~lded in ; Ihe detailed match phone macnine ~OO. A review o' Fig.
'~ shows that phone machine '~OO has three transitions with dashed line paths, namely transitions trll, trl~, and trl3. At each of these three transitions, the Phone can change Lrom one state to ano~her with3ut prod~cing a label and such a transi,ion is, accordirgl~-~ referred to as a null transition. Along transitior.s Lrl through trlO labels can be produced. Speciri^all!,-, along each transition trl through trlO, one or more labels may ha~-e a distinct probability of being gener~.ed the.eat.
1~ Preferably~ fo- each transl.tion ~here is a ?robabil e;
associated ~ith e~ch label ,h-t c~n be ge~er2ted in the system. Tha~ is, if there are ~wo hundred 'abels ~ha, can be selectively generated by the acousric channel, each transition ~that is not a null) has ~wo hundred "O "actual label probabiliLies" associated rherewith --each of which corresponds to the p-obability ~hat a corre-spondinv label is generated b~- the phone a, the partic-ular transition. The actual label probabilities for transition trl are represented b~ tne symbol p followed rr, b~- the bracketed column of numerais 1 through ~OO~ each numeral representin~ a given label. rOr label 1~ ~here is a probability pll¦ that the detailed pllone machine "OO generates the label 1 at transition trl. The ~rarious VQ98~-0~9 actual label pro;~abilities are stored with relacion to the labei and a corres?onding Iransiticn.

S When ~ st.ing of labels YlY~Y~--- is presented to a de-tailed match phone machine ~uO corresponding to a Oiven phone, a match procedure is performed. The procedure associated wilh the detailed malch phone machir.e is e.~;-plained ~.~ith rere_ence to Fi~. 3.
.

i.C Figure 3 is trellis diagram of the phone machine oi Fig.
2. As in the phone machine, the Ireliis diagram shohs a null transition from slate Sl to state S_ and ~ransi-tions from state Sl to sta e S~ and ' om sla~e Sl ~o state S4 The transitions be~i.;ecn othor s~ales are ~lso 1~ illustrzted. The trellis diaOram also sho~;s time meas-ured in the horizon~al direction. Start-time probabili-ties qO and ql represenl the probabilities that a pnone has a start time at time t=to or t=tl, .espectivaly, for the phone. At each start time tQ and 'l~ the various transitions are shown. It should be noted, in this re-gard, that the inrerval belweer. successive atart ~and end) times is preferably equal in length ~o ~he time inter~al of a label.

In employinO the detailed match phone machine ~00 to determine how closely a given phone matches the labels :.,.,,., .: :

Y09~ 9 57~7 -Zl-of an incoming string, an end-time distribution for the phone is sought and used in determining a match value for the phone. rne notion of rel~-ing on the end-ti~.e distribu-tion is common to all embodiments of phone ma-chines according to the inven~ion. Tn genera~ing the end-time distribution to perform a delailed match, the delaiied match phone machine 200 involves computations whicr. are exact and complicated.

Loo~in~ at the trellis diagram of ~ig. 3, we first con-sider the computations required ~o have both a start time and end time at time t=to. For this to be the case accorting to the example phone machine structure sec lS forth in Fig. 2, the following probaDilit-; aDpi~es:
Pr(S7~t=tO) = qO.YT(1-'7) ~ pr(s~r=to)~:r(~
Pr(S3,t=tO)xT(3~7) (3) where Pr represents "probabilit~ of" and T represents the transition probabilit~- bet-~een the two parenthet-ically identified states. The above equation indica~es the respective probabilities '~or the three conditions under which the end time can occur at time t=to. 'lore-over, it is obse.ved tha~ the end time a. ~=to is limited in the current exampie to occurrence at state S7.

2S Loo~:ing next at the end time t=tl, it is noted that a calculation relating to ever~ state other than state S

~98~-0~ 3~577 must be c,ade. The state Sl starts a~ -ne end time of the previous phone. For ?urposes of e~planation, only the calculations pertaining to state S, a.e set for~h.

~or state S4, the calculation is :
Pr(S4,t=tl)= Pr(Sl,t=tO)xT(1~~4)xPr(yl1~4) Pr(s4~t=tO)~T(4~ Dr(vl4-~4) (4) In words, the equation ~4) set .orth mme&iately ab3~-e indicates that the ?robabilit; of the p!~one machine be-ing in state S4 at -ime t=tl is de?endent on the sum of the follor~ing two terms (a! the probabilitv of oein~ at state Sl at time t=t~ multiplied Dy t~e probâL)ilit-. !-.;
of the transition from state Sl ro s~ate S c.ul~i~iied l; further by the probability (Pr) of a Oiven label ---.--in the string being generated gi~ren a transition rrom state Sl to state S4 and (b) the probability of beir.g at state S4 at time t=to ~ultiplied by the probabili~y of the transition from state S, to itself and further multiplied by the probability of generating the given label --y-- during and gi~en the transition from state S to itselE.

Simil2rly~ calculations pertaining to the other states (excluding sta'e Sl) are also performed to generate corresponding probabilities that the phone i5 at a par-Y0983-029 ~236S ~7 ticular slate at time t=tl. Gensrall~-, in de~ermin_ng the probabilit~ of bein~ at a subject state at a given time, .he inven~isrn ~a) recogni7es each previollâ sta-e that has a rransition which leads co the subject slate anci the respec ive probability of each such previous state; (b) recognizes, for each such pre~ious s~ate, a value rep.esentirlg ~he prosâbilit~7 of the label that must be genera~ed at the trar.sition berween each such previous state and .he current state in order to con.orm the label slring; and (c~ combines the probability of each pre~ious slate and the resoecrive vdlue r~pres-enting the label probaDilic~; to provids a subjecr. slare 1~ probability over a correspollaing transition, rhe overall probability of beino at che subiec~ sr.a~e beinO deter-mined fxom the subject stare probLbilities ove. all transitions leading thereto. The calculation ror state S7, it is noted, includes lerms relating to he three O null transitions which permit the phorle to start and end at time t=tl with the phone ending in state S7.

As with the probabilit~ dererminations relLrive to timss t=to and t=tl, probabili~~- determi-lations for a series of other end times are prefe-abl~ g~nera.ed IO form an ~~ end-time distribution. The value of ths end-time dis-tribution for a given phone provides an indica-cion of ho~ well the given phone marches the incoming labels.

Y09~3-0~9 3L;;~3~S77 ", In determining hot.~ woll a word m2tches a st~ing of in-coming labels, the phcnes wnich represent the ~-ord ~re processed in seauence. Each phone generates ar. end-time distribution of probability v21ues. a match value ror the phone is obtained b~T summing up the end-time prcba-bilities and then taking the loOarithm of that sum. A
starl-time distribution for the ne~-.t phone is deri~ed b~- normaii~ing Ihe end-time dis.ribution b~-~ for e~am-ple, scaling each value thereof b~- di~-idinc each ~alue by the s~lm so thaL the sum of scaled ~-alues totals one.

It should be reali ed that the present in~ ntion cou-templates two rnethods of determinirlg h~ ~he num~er of phorles to be e.~amined for a gi~en word or word s~rlng.
III a depth first method, compu~atiorl is made along a baseform --compu~ing a running sublo~al with each SIIC-cessive phone. ~hen the subtotal is found to be beiow a predefined threshold for a given phone position ~0 therealong, the computation terminates. t~lternati~el~-, in a breadth fi st method, a compu-ation for similar phone positions in each word is made. The CCmDUtatiOns following the iirsL phone n each word, the second phone in each word, and so on are made. In the breadth first 2.~ method, the computaticns along the same number of phones for the various words are compared at the same relati~e phone positions Iherealong. In either method, the '~09~3-079 - ~, ~;~;3~77 word(s) ha~7ing the l~rgest sum or match values is ~~e sought obie Ct .

; As noted previousl~, a language moael which stor~s in-rormation --such as tri-grams-- relating to words in context ma~ be included to enhance the prob2Dilit~ of a correct ~ord selection. Langua~,e modsls have bee~ re-ported in the iiterature.

'0 ~le detailed match has been impiemenled in .~PAL ~;rray Processor Assembl~j Language) which is the native assem-bler for the Floating Poin~ S~-ste~.s, Inc. l90L. In his regard, it should De recognized ~hat the ie ailed m.atch reauires considera~le memor~ for stor_ng each o, the actual label probabilities (i.e., the probabilit-j- th~t a given phone generales a gi~-en label ~- at a given transition); the transition probaDilities for eacn phone maohine; and the probaDilities of a given phone being at a given state at a gi~en ~ime after a defined start time. '~e above-r.oted i90L is set up to ma~e the .arious computations of elld times, match values based on ,for example, a sum --prererabl~- the l-,garithm of the sum of end time probabilities; start times based on the prsvi-ousl~ generared end time probabilities; and ~orc match 2~ scores based on the match values for sequential phones in a ~ord.

'~09&3-~ 7 ~
- 7 6- ~ ~3~57~

Because the detailed match is c~mputationally eY~pensive~
the present in~-ention includes a basic fest mz~ch and o an aiternative fas~ match which reduces the compula ion requirements withou~ sacrificing accuracy.

The fast ma~ch embodiments are employed to define a list of on the order of tan ,o one hundred candida~e words selected as the mos~ ely words in he vocabulary to lG correspond to the incoming labels. The candida e words are preferably subjected to the lan~ua~e model and to the detailed match. By paring the number of ~-ords con-sidered by the detailsd match to on the order of 1~ of the words in the vocabulary, rns compu~ational cos~ is '5 greatl~7 reduced while accuracy is main~~inea.

C. BASIC FAST MATCH

The basic fast match simplifies the detailed match by replacing with a single value ~he actual label prooa-bilities for a given label at all transitions a~ which ~0 the given label may be gener~ted in a given phone ma-chine. That is, regardless of the transition in a given phone machine ~-hereat a label has a probability of oc-curring, the probabilily is replaced by a single spe-cific value. The value is preferably an overestimate, being at least as great as the largest probabllity of ~0~83-079 ~36577 .
the label occur-ing at any transition in ~he given phone machine.

; By setting the l~bel probabilit~- replacement value 25 the maximum of the ac;ual label probabililiss fo- the given label in the gi~en phone machine~ it is assured that the match v~lue generated with the basic fast~.asci~
is at least as high as the match ~-aiue th;.t would .2s~1 from employing the detailed ma~ch. In ~hls way, the basic fast match typic~lk- o~,-eres~imates th" matCil ~-21u~
of each phone so that more words are generall selected as candidate'words. ~ords considered candisi~es acco c-ing tG the dstailed match also pass muste~ in accordanc~
with the basic fast match.

Referring to Fig. 4, a phone machine 40G for the basic fast match is illustrated. Labels ~aiso referred to as symbols and fenemes) enter the basic fast match ?hone machine 400 together ~ith a start-time distribution. Ths ~0 start-time distribution and the label string inpu~ is like that entering the detailed match pllone machine de-scribed hereinabo~-e. I should be realized chat the start time may, on occasion, not be a distribution ~ver a piurality of tlmes but may, instead, represent a pre-~; cise time --for example following an interval of si-lence-- at which the phone begins. ~ihen speech is Y0983-0~9 6~i~77 soncinuous, however, Ihe end-time dist ibution is used to define the start-cime dis~ribution (as is discussed o in greater delail hereinbelow). The phone macr.ine 400 generates an end-time distribution and a match v~luo for the particular pnone from the generaced end-time dis-tribution. The match score for a ~ord is defined as the sum of match v~lues for co3ponent phones --~t least tne rirst h phones in the ~;ord.

Referring no~ to Fig. ~, z diagram of a basic fast match CompUtaliOn is illuscrated. The basic fast match com-putaeion is only concerned ~ h the start-lime di3trib-ution, the number --or length Ot- labcls-- prcduced by lo the phone, and the replacement values D~ associat-d with each label Yk. By subst tuting all aclual label probabilities for a given label in a given phone machine by a correspondlng replacement value~ the basic fast match replaces transition prob bilities with length distribution probabilities and obviat s the need for including actual label probabilities ~hhich can difrer for each transition in a given phone macnine) and prob-abil ties of being at a given scate at a given ti~e.

In this regard, the length distributions are determined 2~ from the detailed match model. Specifically, for each length in the length distribution, the invention pref-Y0~3-0~9 3~577 erably examirles each slate individuall. and determines for each state the various ~ransition paths b~; ~nich the currentlv ex2mined state can occur ~a? glven a partic-ular label length and (b) recardlsss of the out?uts along the transitions. The probabilities for all tran-sition paths of the particular length to each subject s,are are summed and the sums for ~11 the subject states are ,hen adaed to indicate the probaDilit~ of a Oi--en len.gth in the distribution. The above procedu-e is re-peated for each length. In accordance ~i~h the pre~erred form of Ihe invention, these computations are made with reference tC a trellis diagram ?S is known in ~he ~rt 1~ of ~arko~ modellinc. For transi~ion ?aths w-hich share branches along the trellis s~ructure, the compll~aticn for each com~on branch need be maae onl)7 once and is applied to each path that includes the common branch.

In the diagram oE Fig. ~, th-o limita~ ons are inciuded bv hay of example. First, it is assumed that the length of labels produced b~- the phone can be zero, one, tho, or three having respective probabilities of lo~
and 13. The start time is also li~.ited, permi .ir.g onl~;
four start times having respective probabilities OI q0 7~ ql, q7, and q3. With these limitations, the following equations define the end-time distribution of a subject phone as:

~0983-Q~9 ~23~577 ~o = qOlo ~1 = qllo + qO1121 _ ~ ~ q A 1 0 q 1 1 2 q~ 3 ql-~P P3 qO 3-1P~P3 q~llP4+ q212P3P4+ql 13P2P3 P4 ~;=q3l2pip5+ q2l3p3p4p;
~= q313p4~

In sxâmining the equa ions, it is ooser~-ea ~hat ~3 in-cludes a ~erm correspondinO to ~ach or rour start times.
The first term represents the probabiii~y that the phone starts at time t=t3 And produces a length of zero l~Dels --the phone slartin.~ ~nd ending a~ the same time. ~le second ter~ represen~s the prob~bili y ~hat ~he pnone starts at time t=~2, that the length or labels is one, and that a label 3 is produced by the phone. The third term represents the prob~bilit~; that the phone star~s at time t=tl, that the len~th Oc labels is two ~nar..ely labels 2 and 3), and that labels 2 and 3 are produced by the phone. Similarlyj the fourth ~erm represents the probability that the phone starts at time t=to; tha~ the lenOth of labels is three; and that the three labels 1 2, and 3 are producsd by the phone.

7~ Comparing the computations recuired in the basic fast match with those required by the detailed match su~gest Y~3-0~ 5~7 the relative simplicit~ of the former relative to the latter. In this regard, it s noted that the p'y ~alue remains the s2me for each a?pearance in all ~he equations as do the label length probabilities. `lore-over~ with Ihe length and start time~limitations~ .he computations for the later end times become simpler. For e.~ample, at ~6~ the phons must start at time t=t and lQ all three labels 4, ;, and 6 mUsl be produced o~- che phone for that end time to ap?ly.

In generating a match value for a su~joct p~one~ the end time prob~bilities along the deiined ond-time dislrib-ulion are summed. If desired, the iog of t~e sllm is ta~n~n :1~ to provide the e.YprCSSiOn:
match value - logi0(30 l --- +

As noted previously, a match score for a word is readil~
determined by summlng the malch values for successi~e phones in a particular ~-ord.

.
: 20 In describing she genera ing of the star~ time distri~-ution, reierence is made to Fig. ~. In Fig. 6(a), the word THE1 is repeated, bro~en down into its compone.nt : phones.~In Fig. 6~b), the string of labels is d~epicted over time. In Fig. 6(c), a first start-time distribution 2; is shown. The first start-time distribution has been .

1.

Y0')83-0~9 ~36~7~
3~

derived rrom the end-time dist~ibution of the most re-cen. previous phone (in the previous word which mav in-o clude a "hord" of silence). ~ased on the label inputs and the start-time distribution of Fig. o(c', the end-time distribu~ion for the phone Dh~ ~DH~ is generaled.
The start-time distribulion for the ne~t phone, ~H, is determined by recognizing the time during which the prs~.~ious pnone enc-time dis~.ibution exceeded a thresn-old (~) in Fig. o(d). ~A) is de~ermine~ inditiGually for each end-time distribu~ion. Preferabl~-~ (h) is a fur.c-tion of the sum of the end-time distribution values for a subject phone. The inter~-al bstween ~imes a and b ~hlls 1~ reprssents ~he time during which ~hs ster~-ti:ne dis-tribution for the pnone ~'H is ss~. (See Fig. ote).) The interval between times c and d in F~g. o~sJ corresponas to the times between which t~.e end-~ime distribution for the phone DH e~ceeds the threshold (A) and bet-weer which the start-time distribution of the ne~L phone is set.
The values of the start-time dis~ribution are obtained bv normalizing the end-time dis~ribution by, for e.~am-ple, dividing each end-time value b~ tlle sum OI the end-~ime values wh ch ex~eed the threshoid (A).

2~ The basic fast match phone machine 400 has been imple-mented in a Floating Point Systems Inc. 190L with an ~.PAL program. Other hardware and softhare mav also be o ~ ~
~LZ~S~

used to develop a specific form of Ihe invention by following the teachings se-~ forth harein.

C. ALTER`;ATI~E FAST `IiTCH

The basic fast match employed alone or, preferably, in conjuncrion -~;ith the detailed match and/or a lar.gua,e model greatlv reduces computation requirements. Io -u.-ther reduce computation21 requiremen,s, the pr~sent ir-vention iurther simplif-es the detailed match bv defining a uniform label length d stribution bet~een t~-o lengths --a minimum length Lmin and ~ maximum len~t;
L . In the basic fas~ match, the ?ror;abilities of a max phone generating labsls cf a given leng~h --namely i~, 1~ 11, 12, etc.-- typically hav2 differing valuss. ~ccord-ing to the alternative fast match, ~he probabilitv for each length of labels is replaced by a single uniform value.

Preferably, the minimum leng h is equal to the smallest ~0 length having 2 non7ero probabilit)~ in the original length distribution, although other lengths may be se-lected if desired. The selection oi the maximum length is more arbitrary than the selection of the minimum length, but is significant in that the probability of lengths iess than the minimum and greater than the max-~&3-0~9 7~

imum ~re set as zero. B~ defining the length r)robabilitv to e.~ist between only th~ mir.imum ler.g~h and the;na~:imum lenath, a uniform pseudo-dis,ribution c&n be set fo:-Lh.
In one approach, t~le uni'orm p-obabil~t~- can be set 2S
the average probabilit~ o~er the pseudv-distribution.
Alternatively, the unirorm probabilit~ can be set as the ma~imum o. the lenOtr. probabili~ies that are replaced bi7 the uniform ~-alue.

The effec~ of characLer,~ing all the l~bel ier~.gth ~rsv-abilities as equal is resdil~- obse~vcd hith reference to the equations se- forth above for the en~-~ime cis-trib-ltion in the bssic f2st ma~ch. Specificali~ .he 1~ length probabilities can be fac~ored OUt ?S a consta!~t.

ith L i being set at zero and all leng~h ?robabilities being replaced by a single constznt value, the end-time distribul;ion can be characterized as:
= ~ /1 = q + ~ P
m ` m m m-l m wllere "1" is the single uniform replacement ~alue and where the ~alue for p corresponds prererably ~o the replacement value for a ~iven label being gener~ted in the given phone at time m.

~or the above equation for ~ , the malch value is defined 2~ as :

Y0983-0~9 3~ ~3~

match ~-alue = log10(~0 + ~1 + ~~ + ~m) glû ~

In comparirg the basic fast match and the alterna~i~e fast match, it has beerl found that -he number of requi-ed additions and multipli-ations are greatl~- reauced by employing the alternative fas~ match phone machines.
~Tith L = û, it has been found that the basic fast mln malch requires fort~ multipllcaLions and ~t.~ent~ addi-lû tions in that the leng~h probabilities must be consid-ered. ~t~ith the alternative fast match, ~ is determined recursively and re~uires one multiplication and one a~-dition for each successi~Te ~ .
m To further illuât~a~s ho~ ~he al;hrnati~e fas- m~tch l; simplifies compu~ations, Figures , and ~ are provided.
In Fig. 7(a), a phone machine emDodimen~ 7ûO corre-sponding to a minimum leng~h L i =û is aepic~ed. The maximum length is assumed to be infir.i e so that ~he length distribution may be characterized as uniform.
In Fig. 7(b), the trellis diagram resulLing -rom the phone machine 700 is sho~n. Assuming -h3t stl,t times af~er q are outside the start-time distribution, all det~rminations of each successive ~ ~;ith m-n require one addition and one multiplication. For determinations 2~ of end times thereafter, there is only one required multiplication and no additions.

~09O3-0~9 -36- ~23~577 In Fig. 8, L i = '~ Fig. ô(a) illustrates a specific embodiment of a phor.e machine 800 therefor and Fig. S(b shows a corresponding trellis diagram. Because L i =' the trellis diagra~ of Fig. 8~b) has a zero probabili ;
aiong the paths mar~ed u, ~r~ h'~ and 7. For those end times which e~tend betheen ~4 and ~ , it is noced that four multiplications and one addition is required. ror end times ~reater than n+4, ons multip]ic~tion ~nd no additions are required. This embodim nt has been imDle-menled in AP.~L code on a FPS l~OL.

It should be noted tha~ additionai s;ates may be adaed to the Fi~. / or Fig. o embodimAnts as c'esired in ~c-1; cordance with the in~ention. For s~:amDlA, ar.y number of states with null ~rarsitions ma~ be included witr~out altering the value of L .
mln D. IIATCHI~'G BASED 0~ FIRST J LABELS

As a iurther refinemenl to the basic fast matcn and al-ternati~e fast match, the p~esent invention contemplates that only the first J labels or a string which enters a phoIIe machine be considered in the match. Assuming that labels are produced by the acoustic processor of an acoustic channel at the rate of one per centisecond, ~5 a reasonable ~-alue for J is one hundred. In other ~-ords, iO'~o3-0~9 ,7- ~z3~77 labels corresponding to on the order of one second of soeech ~-ill be proiided to determine a match be.tween a phone and the labsls entering the phone machine. B~
limiting the number of labels e.iamined, two advantages are realized. FirsL~ decoding dela~ s reduced and, second, problems in comparing the scores of short words ~ith long words are substantially a~oided. ~le len~th o~ J can~ of course~ be variea as desired.

The effect of limiling the mlmber Gf labels e~amined can be noted with reference to the trellis dia~ram of r ~.
8~b). Without the presen. ref~nement, the f2s~ match sco~e i5 the sum o. the p-obabilities OI ~. ~S a]on~ ~he 1; bottom row of the dia~ram. That is. the ~robabi' ir, of being at state S4 at each time sLarting a~ ~=to (for Lmin=O) or t=t4 (for Lmin=4) is determined as a a and all ~ '5 are then totalled. For Lmin=4, ~here is no probability of being in state 54 at an~ ~ime bei'ore ~4.
~O With the refinemen~, the summing of a ~5 termiIlates at time J. In Fig. 8(b), time J corresponds to time t +,.

, Termina~ing the e~amination of J labels over J time in-` :
tervals can result ln the following two probabllit~

summations in determining a match score. First, as de-2; scribed hereinbefore, there is a row calculation along the bottom row of the trellis diagram but only up to the ..

~G983-0~9 ~:~3~77 , ~

time J-l. The probabilities of being in state S4 at each time up to time J-l are summec to form a row score.
; Second, there is a column score which corresponds tO the sum of probabilities that the phone is at each respec-tive state S0 through S4 at time J. That is, the column score is: 4 column score = ~ Pr~Sf,J) f=0 The match score for a ?hone is obtained by summin~ the roh- score and column score and then ~a~ing ~he logarithm of that sum. To contlnue the fast m~tch for the next phone, the values along the bottom row --prefer~bi~- in-cluding time J-- are used to derive the next ?hone l; start-time dislribution.

After determining a match score for each of b consec-utive phones, the total for all phones is, as before noted, the sum of the match scores for all the pnones.

In eY.amining the manner in which the end-time prob~bil-ities are generated in the basic fast match and alter-native fast match embodiments set forth abcve~ it is noted that the determination of column scores does not conform readily to the fast match computations. To bet-ter adapt the refinement of limiting the number of la-bels examined to the fast match and alternative match, '~0~&3-0.~9 39 ~23~577 the present inven~iGn provides tnat ths column ssore be replaced by an additional ro-~ score. That i5, an adci-tional rot.~ score is determined ror the phone being at state S, (in Fi~ure 8(bj) bet~;een t1mes J and Jt~ -~-nere ~ is the maximum number of states in~any phorle machine Hence, if any phons machine has ten states, the present refinement adds ten end times alonO the bo~tom ro~i of the trellis for each of ~hich a probabiliby is deber-mined. All the probabilities along the bott3m ro~ up to and including Ihs prob~bility .?t ~ime J.~ are added to produce a ma~ch score for the given phone. ~s before, consecutive phone match values are summed to provide a 1~ : word match score.

This embodiment has been implementcd in ~?.~L cGde on a FPS l90L; however as with other portions of the in-vention may be implemented hith othe~. codcs on other systems.

~: :
0 E. THE TREE STR~CT'~RE L~D rAST M.~TCH EM80DIMENTS
.

By employing the baalc fast match or al~ernative fas~
match --hith or without the maximum label limitatiorl--the computatlonal time reauired in determining phone match values is tremendously reduced. In addition, the computational savings remain high even when the detailed .

yo983-0~9 ~36~7~

march is performed on ~r.e-~ords in rhe fast match deri.ed list.

Tlle phone match values~ once determined, are compared alon~ the branches of the ~ree structure to determine which paths o' phones are most probable. In Fig. 1, the phone match values for DH and ~Hl should sum to a much higher value for the spo~en ~ord "the" th1n the various sequences of phones brancning from the phone ?1.~:. In th s regard, it should be o~served that the ?hone match ~alue of ~he 'irst ~1~ phone is computed only once and is ~hen used for each baseform e;~tendir.g thererrom. In addition, when the total score calculatetl alon~ a first seauence l; of branches is -ound to be much lo~;er th~n a ~hrcsholc value or much lot~er than the total score tor other se-auences of branches, all b seforms e~tending rrom the first sequence Day be simultaneously e~limina.ed as can-didate words.

~0 With the fast match embodimGnts and the tree structure, an ordered list of candidate hords is generated t~ h greaL computat~onal sa~ings.

F. F~RTHER E~30DI!IE~TS

~ ~Q~ 3657~

Thus, r~hile the in--ention has been described ~ith ef-erence to preferred embodiments thereof, it ~iill be un-; derstood b~T those sl.~illed in the art .hat ~arious changes in form and details may be made withou~ de?ar -ing from the scope of the in~ention. ~or e.~;amDle, al-though it is preferred that each replacement ~-alue be no smaller than the ma.~imum actual prob~bilit~; it re-places, ot;~er methods ma~- also be emplo~;ed to acilier~e the desired ob~ec~ of realizing an overeslimatea ma~ch ~-alue at least mos~ OL the t me.

Claims (40)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. In a speech recognition system which represents each vocabulary word or a portion thereof by at least one sequence of phones wherein each phone corresponds to a respective phone machine, each phone machine having associated therewith (a) at least one transition and (b) actual label output probabilities, each actual label probability representing the probability that a subject label is generated at a given transition in the phone machine, a method of performing an acoustic match between phones and a string of labels produced by an acoustic processor in response to a speech input, the method comprising the steps of:

forming simplified phone machines which includes the step of replacing by a single specific value the actual label probabilities for a given label at all transitions at which the given label may be generated in a particular phone machine; and determining the probability of a phone generating the labels in the string based on the simplified phone ma-chine corresponding thereto.
2. A method as in Claim 1 wherein each string of phones corresponding to a vocabulary word or predefined portion thereof represents a phonetic baseform, the method including the further step of:

arranging the baseforms into a tree structure in which baseforms share a common branch for as long the baseforms have at least similar phonetic beginnings, each leaf of the tree structure corresponding to a com-plete baseform.
3. A method as in Claim 2 wherein the labels are produced at successive time intervals, the method com-prising the further step of:

matching no more than a predefined maximum number of labels with corresponding vocabulary words.
4. A method as in Claim 3 wherein labels occur at intervals on the order of one centisecond each and wherein the predefined maximum length is on the order of one hundred centiseconds.
5. A method as in Claim 1 wherein the specific value assigned to all label probabilities for a particular label in a given phone machine is equal to at least the maximum actual label probability over all transitions for said particular label in the given phone machine.
6. A method as in Claim 5 wherein each phone machine also has associated therewith (c) a length distribution for each phone which indicates probability as a function of the number of labels produced by a particular phone, the method including the further step of:

converting the length distribution for each phone into a uniform probability length pseudo-distribution which includes the step of ascribing a uniform value to the probability of a given phone producing any number of labels defined in the length distributior.
7. A method as in Claim 6 comprising the further steps of:

finding the minimum length of labels having a nonzero probability; and confining the uniform pseudo-distribution to lengths at least as long as the minimum length, lengths less than the minimum length being assigned a probability of zero.
8. A method as in Claim 7 including the steps of, for each phone machine:

inputting a string of labels to the phone machine;

generating a phone match value;

generating a phone end-time distribution; and deriving a start-time distribution for the next phone from the end-time distribution of the current phone;

the match value and the end-time distribution being generated in response to the inputting into the phone machine of (a) labels and (b) the start-time distrib-ution derived from the previous phone.
9. A method as in Claim 8 comprising the further step of:

forming a match score for a given baseform including the step of summing the match values for sequential phones which represent at least a beginning portion of the given baseform.
10. A method as in Claim 9 including the further step of:

selecting a number of words represented by respective baseforms which have the highest match scores.
11. A method as in Claim 10 including the further step of:

performing a detailed acoustic match with an unsimpli-fied phone machine for each word of the selected number of words to determine a most probable word.
12. A method as in Claim 2 wherein each phone ma-chine also has associated therewith (c) a length dis-tribution for each phone which indicates probability as a function of the number of labels produced by a par-ticular phone, the method including the further step of:

converting the length distribution for each phone into a uniform probability length pseudo-distribution which includes the step of ascribing a uniform value to the probability of a given phone producing any number of labels defined in the length distribution.
13. A method as in Claim 12 wherein the specific value assigned to all label probabilities for a partic-ular label in a given phone machine is equal to at least the maximum actual label probability over all transi-tions for said particular label in the given phone ma-chine.
14. A method of performing an acoustic match of words in a vocabulary against a string of labels which represent a speech input, the method comprising the steps of:

entering as inputs to the phone machine of a given phone (a) the string of labels and (b) a respective start-time distribution for the given phone; and generating (a) an end-time distribution and (b) a match value of the given phone relative to the entered labels based on the generated end-time distribution;

the match value generated for each particular phone corresponding to the probability that said each partic-ular phone produced the entered string of labels.
15. A method as in Claim 14 wherein said end-time dis-tribution generating step for each given phone includes the steps of:

determining a start-time distribution for a current phone from the end-time distribution of the previous phone;

setting a uniform length probability for each label string length that (a) may have been produced by a given phone and (b) is between a minimum length having a non-zero probability and a predefined maximum length;

thereby defining a length pseudo-distribution that is characterized (a) as uniform between the minimum length and maximum length and (b) as zero for all other lengths;

setting, for each of a plurality of labels, a label probability replacement value that a given phone machine produces a given label at any transition in the phone machine; and determining the probability of end-times in an end-time distribution based on label output replacement values and start-time distribution.
16. A method as in Claim 15 including the further steps of:

forming a tree of baseforms wherein each baseform ex-tends from the base of the tree to a leaf thereof as a sequence of phones, each baseform representing a word;
and scoring the probability that a given word represented by a respective sequence of phones matches the entered labels, said scoring including the step of summing the match values of phones along the respective sequence.
17. A method as in Claim 15 wherein at least some of the start-time distributions are derived from the end-time distribution of the most recent previous phone.
18. Apparatus for matching words with a string of in-coming labels in a pattern recognition system, the ap-paratus comprising:

at least one phone machine;

each phone machine being characterized by having (a) a plurality of states and transition paths between states, (b) transition probabilities T(i?j) representing the probability of state Sj given a current state Si where Si and Sj may be the same state or different states, and (c) actual label probabilities wherein each actual label probability p(yk) indicates the probability that a label yk is produced by a given phone machine at a given transition from one state to a subsequent state where k is a label identifying notation; and each phone machine including (a) means for assigning to each yk in said each phone machine a single value p'(yk) and (b) means for replacing each actual output proba-bility p(yk) at each transition in a given phone machine by the single value p'(yk) assigned to the corresponding yk.
19. Apparatus as in Claim 18 wherein a p'(yk) for a re-spective label is no less in value than any p(yk) for the respective label yk in a given phone machine.
20. Apparatus as in Claim 18 wherein the length of the string of labels generable by each phone machine is characterized (a) as being between a minimum length and a maximum length and (b) as having a uniform length distribution wherein the probability for each length in the length distribution between the minimum and maximum length is a uniform value;

the probability of label lengths less than the minimum length or greater than the maximum length being set as zero.
21. Apparatus as in Claim 20 wherein said assigning means includes means for representing the value p'(yk) for a given yk in a given phone machine by the maximum p(yk) value for the given yk in the given phone machine.
22. Apparatus as in Claim 21 including means for setting the uniform value of the length probability to be equal to the maximum probability of any length generable by each particular phone machine.
23. Apparatus as in Claim 20 wherein said each phone machine further includes:

first means for generating an end-time distribution for each phone based on (a) the labels entering the phone machine for said each phone and (b) a start-time dis-tribution of said each phone; and second means for evaluating the probability that a par-ticular phone produced an input string of labels based on the generated end-time distribution of the particular phone.
24. Apparatus as in Claim 23 including means for deter-mining each of at least some of the start-time distrib-utions from the end-time distribution of the most recent previous phone.
25. Apparatus as in Claim 24 wherein the minimum length is zero, the apparatus further comprising:

means for characterizing the probability of each end time in the end-time distribution by a value .theta.n which is equal to the end time probability at time tn divided by the uniform length probability; and means for computing .theta.n for successive times tn as:
.theta.n = qn + .theta.n-1p'n wherein p'n represents the replacement value for the incoming string label that is to be generated at time tn by the phone machine to yield the corresponding end time.
26. Apparatus as in Claim 19 wherein said each phone machine further includes:

first means for generating an end-time distribution for each phone based on (a) the labels entering the phone machine for said each phone and (b) a start-time dis-tribution of said each phone;

second means for evaluating the probability that a par-ticular phone produced an input string of labels based on the generated end-time distribution of the particular phone; and means for determining each of at least some of the start-time distributions from the end-time distribution of the most recent previous phone;

wherein said first means includes, for each of a plu-rality of end times which together form the end-time distribution:

means for determining a sum of product terms, each product term corresponding to (a) a start-time probability, (b) a label length probability, and (c) the probabilities of label outputs generated to provide the label length in the product term;
and means for adding the sums of product terms to pro-vide a match score for the phone corresponding to the end-time distribution.
27. A method of determining at least one word in a vo-cabulary having the highest probability of having gen-erated a given string of incoming labels that were produced in response to a speech input. the method com-prising the steps of:

characterizing each word as a sequence of phonetic ele-ments, wherein each phonetic element has (a) a start-time distribution of probabilities qn corresponding to respective successive start times tn, (b) a plurality of states between which transitions occur, (c) a plu-rality of transition probabilities, each indicating the probability that a given transition in a given phonetic element occurs, (d) a plurality of actual label proba-bilities, each actual output probability indicating the probability that a particular phonetic element generates a particular label at a particular transition in the particular phonetic element; and forming an approximate match for a subject word includ-ing the steps of:
replacing all actual label probabilities associated with a given label generated by a given phonetic element at any transition therein with a corre-sponding specific replacement value; and determining for one phonetic element after another in the subject word the probability ?n of a pho-netic element ending at a respective one of a plu-rality of successive end times tn as a function of start-time distribution, the probability of the phonetic element generating a label string of each of various lengths, and the replacement value p'(yk) for each respective label yk that is to be generated by the phonetic element to produce the incoming string of labels.
28. A method as in Claim 27 wherein the approximate matching further includes the step of:

characterizing the label length distribution as uniform between a minimum length and a maximum length with the probability elsewhere being set as zero;

each ?n thereby being a function of start-time distrib-ution, the uniform probability for each length between the minimum length and the maximum length, and the re-placement value p'(yk) for each respective label yk that is to be generated by the phonetic element to produce the incoming string of labels.
29. A method as in Claim 28 wherein the ?n determining step includes the step of:

recursively computing successive values of .theta.n corre-sponding to successive values of ?n divided by the uni-form length;
wherein p'n represents the replacement value for the incoming string label that is to be generated at time tn by the phonetic element to yield the corresponding end time.
30, A method as in Claim 29 including the further step of:

limiting the number of labels examined in determining the probability of ?n for each phonetic element to a maximum of J labels.
31. A method as in Claim 30 wherein the end time proba-bilities are determined for only the first J+K incoming labels where K is the number of states in the corre-sponding phonetic element.
32. A method as in Claim 29 comprising the further step of:

combining the values for the successive .theta.n's to derive a match value for the phonetic element corresponding thereto.
33. A method as in Claim 28 comprising the further step of:

combining the values for the successive ?n's to derive a match value for the phonetic element corresponding thereto.
34. A method as in Claim 33 comprising the further step of:

combining match values for successive phonetic elements in a subject word to provide a word match score.
35. A method as in Claim 34 comprising the further step of:

forming a list of candidate words in order of word match scores, at least most of the words in the vocabulary being excluded from the formed list.
36. A method as in Claim 35 comprising the further step of:

performing a detailed match against each candidate word on the formed list, including the steps of determining for each candidate word the probability that a sequence of phonetic elements representing the candidate word generates labels in the incoming string based on, for each phonetic element, (a) the start-time distribution of probabilities qn corresponding to respective succes-sive start times tn therefor, (b) the plurality of states between which transitions occur therefor, (c) the plurality of transition probabilities therefor, each indicating the probability that a given transition in a given phonetic element occurs, and (d) the plurality of actual label probabilities therefor, each actual output probability indicating the probability that the partic-ular phonetic element generates a particular label at a particular transition in the particular phonetic ele-ment.
37. A method as in Claim 36 comprising the further step of:

performing a language model match on candidate words on the formed list.
38. A method as in Claim 33 comprising the further step of:

selecting the specific replacement values so that the match value for each phonetic element is an overestimate of a match value generated if the actual probabilities were not replaced with the replacement values.
39. A method as in Claim 28 wherein the minimum length is zero and wherein the ?n determining step includes the step of:

recursively computing successive values of ?n corre-sponding to successive values of ?n divided by the uni-form length as:

.theta.n = qn + .theta.n-1p'n wherein p'n represents the replacement value for the incoming string label that is to be generated at time tn by the phonetic element to yield the corresponding end time.
40. A method of performing an approximate acoustic match between incoming labels and words in a vocabulary where each word is represented by a sequence of Markov models, each Markov model indicating states and transitions therebetween, the method comprising the steps of:

determining the probability of being at a subject state at a given time, which includes the steps or (a) recog-nizing each previous state that has a transition which leads to the subject state and the respective probabil-ity of each such previous state; (b) recognizing, for each such previous state, a value representing the probability of the label that must be generated at the transition between each such previous state and the current state in order to conform to the label string;
and (c) combining the probability of each previous state and the respective value representing the label proba-bility to provide a subject state probability over a corresponding transition;

determining the overall probability of being at the subject state from the subject state probabilities over all transitions leading thereto; and combining the overall probabilities for a given phone in a word to determine the likelihood of the given phone producing the string of labels.
CA000494697A 1984-11-19 1985-11-06 Apparatus and method for performing acoustic matching Expired CA1236577A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US67297484A 1984-11-19 1984-11-19
US672,974 1984-11-19

Publications (1)

Publication Number Publication Date
CA1236577A true CA1236577A (en) 1988-05-10

Family

ID=24700799

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000494697A Expired CA1236577A (en) 1984-11-19 1985-11-06 Apparatus and method for performing acoustic matching

Country Status (1)

Country Link
CA (1) CA1236577A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761996A (en) * 2020-08-21 2021-12-07 北京京东振世信息技术有限公司 Fire disaster identification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761996A (en) * 2020-08-21 2021-12-07 北京京东振世信息技术有限公司 Fire disaster identification method and device
CN113761996B (en) * 2020-08-21 2023-11-07 北京京东振世信息技术有限公司 Fire disaster identification method and device

Similar Documents

Publication Publication Date Title
CA1257697A (en) Speech recognition system
CA1242028A (en) Speech recognition system with efficient storage and rapid assembly of phonological graphs
EP0750293B1 (en) Triphone hidden Markov model (HMM) design method and apparatus
US5835888A (en) Statistical language model for inflected languages
CN111916058B (en) Speech recognition method and system based on incremental word graph heavy scoring
Jelinek Continuous speech recognition by statistical methods
CN111210807B (en) Speech recognition model training method, system, mobile terminal and storage medium
CN111145729A (en) Speech recognition model training method, system, mobile terminal and storage medium
Bahl et al. Automatic phonetic baseform determination
JPH02273795A (en) Continuous speech recognition method
Brown et al. Automatic speech recognition in machine-aided translation
CN101515456A (en) Speech recognition interface unit and speed recognition method thereof
Ramabhadran et al. Acoustics-only based automatic phonetic baseform generation
Bahl et al. Preliminary results on the performance of a system for the automatic recognition of continuous speech
US7454341B1 (en) Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system
CA1236577A (en) Apparatus and method for performing acoustic matching
US20080059183A1 (en) Parsimonious modeling by non-uniform kernel allocation
CN111105799B (en) Off-line voice recognition device and method based on pronunciation quantization and electric power special word stock
Wolfertstetter et al. Structured Markov models for speech recognition
Eide Automatic modeling of pronunciation variations.
Paul The Lincoln large-vocabulary stack-decoder based HMM CSR
CA1246229A (en) Apparatus and method for producing a list of likely candidate words corresponding to a spoken input
JP3873418B2 (en) Voice spotting device
CA1259411A (en) Speech recognition employing a set of markov models that includes markov models representing transitions to and from silence
EP0238689B1 (en) Method for performing acoustic matching in a speech recognition system

Legal Events

Date Code Title Description
MKEX Expiry