NZ243055A - Speech recogniser: neural network and dynamically programmed pattern recogniser operate in parallel - Google Patents

Speech recogniser: neural network and dynamically programmed pattern recogniser operate in parallel

Info

Publication number
NZ243055A
NZ243055A
Authority
NZ
New Zealand
Prior art keywords
speech
neural network
recognition
vocabulary
pattern
Prior art date
Application number
NZ243055A
Inventor
Heidi Hackbarth
Original Assignee
Alcatel Australia
Priority date
Filing date
Publication date
Application filed by Alcatel Australia
Publication of NZ243055A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks

Abstract

An apparatus and method for speech recognition with progressive expansion of the reference vocabulary, particularly for automatic telephone dialling by voice input. Neural and conventional recognition methods are combined with one another in such a manner that, during the training and (re-)configuration of the neural network, a conventional recogniser, which operates in accordance with the principle of dynamic programming, is provided with the added word patterns as references for immediate classification. After conclusion of the learning phase, the neural network takes over recognition of the entire vocabulary.

Description

<div class="application article clearfix" id="description"> <p class="printTableText" lang="en">NEW ZEALAND PATENTS ACT 1953 COMPLETE SPECIFICATION <br><br> "AN ARRANGEMENT AND METHOD FOR SPEECH RECOGNITION" <br><br> WE, ALCATEL AUSTRALIA LIMITED, (A.C.N. 000 005 363), a company of the State of New South Wales, of 280 Botany Road, Alexandria, New South Wales, 2015, Australia, hereby declare the invention for which we pray that a patent may be granted to us, and the method by which it is to be performed, to be particularly described in and by the following statement: <br><br> This invention relates to an arrangement with a neural network and a method for speech recognition by successive extension of the reference vocabulary. <br><br> It is well known that neural networks with a hierarchical network structure can be used for pattern recognition. In such networks, every element of a higher level is affected by elements of a lower level, with each element of a level typically being connected to every element of the level below it (e.g. A. Krause, H. Hackbarth, "Scaly Artificial Neural Networks for Speaker-Independent Recognition of Isolated Words", IEEE Proceedings of ICASSP 1989, Glasgow UK, with additional references). For speech recognition, neural networks offer the advantage of inherent robustness against disturbing noise, compared to conventional techniques. <br><br> A fundamental disadvantage of neural techniques, however, lies in the relatively long training phase with currently available computers. If, in actual use of a neural speech recogniser, a vocabulary extension by only one word is envisaged, the whole network must be re-trained: additional output elements are added and all weighting parameters are recalculated. 
This means that a word newly entered into the vocabulary can only be recognised after the end of this learning phase, which takes place off-line and, in some circumstances, can last for hours. <br><br> Among other conventional methods, so-called dynamic programming is well known for pattern recognition. In this case, a word first spoken for learning by the speech recogniser is immediately stored as a reference pattern in a speech-pattern memory (e.g. H. Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition", IEEE Transactions on Acoustics, Speech and Signal Processing, April 1984, with additional references). This method provides the advantage that the reference pattern is available within a few seconds for classification by the speech recogniser. <br><br> However, a disadvantage of dynamic programming lies in its higher sensitivity to disturbing noise, compared to neural techniques. <br><br> Recognition in real time is provided by both methods for small speech vocabularies (approximately 70 to 100 words, depending on the processor capability). <br><br> An object of the present invention is to provide a robust speech recognition system which is immediately available after successive extension of the vocabulary with single words or groups of words. <br><br> According to a first aspect of the invention there is provided an arrangement with a neural network for speech recognition, wherein in addition to the neural network, a conventional recogniser is provided which operates in accordance with the principles of dynamic programming and immediately stores all newly-spoken words which are to be added to the vocabulary of the arrangement, simultaneously with their being processed in the neural network, in the form of reference patterns in a speech-pattern memory which can be accessed by the conventional recogniser for immediate classification of the reference pattern. 
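The conventional recogniser described above stores a newly spoken word as a reference pattern and classifies later utterances by dynamic-programming comparison against the stored templates. A minimal sketch of such a comparison follows, assuming each word is a sequence of per-frame feature vectors and using a Euclidean local distance; the feature representation and distance measure are illustrative assumptions, as the patent does not specify them.

```python
import math

def dtw_distance(template, utterance):
    """Align two frame sequences by dynamic programming and return
    the cumulative warping cost (lower means a better match)."""
    n, m = len(template), len(utterance)
    INF = float("inf")
    # cost[i][j]: best cumulative cost aligning template[:i] with utterance[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], utterance[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],       # stretch the template
                                 cost[i][j - 1],       # stretch the utterance
                                 cost[i - 1][j - 1])   # advance both
    return cost[n][m]

def classify(utterance, references):
    """Return the name of the stored reference pattern with the lowest cost."""
    return min(references, key=lambda name: dtw_distance(references[name], utterance))

# Hypothetical one-dimensional "features" for two stored names:
refs = {"anna": [[0.0], [1.0], [2.0]], "otto": [[5.0], [5.0], [5.0]]}
print(classify([[0.1], [1.1], [2.1]], refs))  # prints "anna"
```

Because storing a template is just an insertion into `references`, a new word is available for classification within seconds, which is the property the invention exploits during the network's training phase.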
<br><br> According to a further aspect of the invention there is provided a method for speech recognition with successive extension of the reference vocabulary, wherein neural and conventional methods are combined such that a word first spoken for learning by the speech recognition device: <br><br> a) is stored in a speech-pattern memory as a reference pattern, and becomes available for immediate classification by a conventional recogniser operating according to the principle of dynamic programming, and <br><br> b) simultaneously initiates the training and (re)configuration of the neural network. <br><br> An extension of the vocabulary by means of the neural network requires a long-duration retraining of the whole network, taking up to several hours. The conventional recogniser, which operates in accordance with the principle of dynamic programming, is provided with the newly entered word patterns as a reference for use in immediate classification. <br><br> Preferably, the conventional recogniser can be activated for recognising a word spoken during the training phase of the neural network, either for the newly entered word patterns only, or for all word patterns, until the basically more robust neural network is trained for the extended word pattern and again takes over the recognition of the complete, now extended, vocabulary. With the aid of the invention it is possible to recognise speech also during the training phase of the neural network. <br><br> In order that the invention may be readily carried into effect, telephone dialling by means of speech input will be described with the aid of two flow chart diagrams, in which: <br><br> Figure 1 shows the very first speaking of the vocabulary N, as Case 1; <br><br> Figure 2 shows the extension by the vocabulary M, as Case 2. <br><br> When the vocabulary N is first spoken (Figure 1), 
the names spoken by the user (possibly several times) are stored in a speech-pattern memory as reference patterns. This takes place during a period of the order of seconds. At the same time as this, the newly-spoken names are processed in the neural network. In this situation, the neural network is trained and configured over a period of several hours. A name entered during the training of the neural network, for the purpose of dialling a telephone connection by speech, activates the conventional recogniser, which operates in accordance with the principle of dynamic programming, compares the just-spoken name with all the reference patterns stored in the speech-pattern memory, and carries out a classification. The training of the neural network continues to run as a background program. After completion of the training and configuration of the neural network, the recognition of the names spoken for the purpose of dialling a telephone connection takes place exclusively by means of the noise-resistant neural network. <br><br> If now the list of subscribers N capable of being dialled is to be extended by M names, as shown in Figure 2, the names spoken by the user are again stored as reference patterns. Simultaneously with this, it is required to extend the output layer of the neural network by M neural elements, to establish the corresponding connections and to retrain the weightings of the connections between all elements. This re-configuration of the neural network, in turn, takes several hours. However, the previous neural network, trained for N, remains intact. <br><br> If the user wishes to use the automatic dialling facility during this learning phase, and speaks a name, the retraining is interrupted and both recognisers are activated. On the one hand, the "old" neural network compares the just-spoken name with the N-vocabulary. The result is a proposed word with a probability value. 
On the other hand, the just-spoken name is compared conventionally with the M newly-entered reference patterns in the speech-pattern memory, and there also a word is proposed with a probability value. In this situation the classical algorithm determines with which of the newly acquired names the just-spoken one agrees best, and how well it does so. The larger of these two values, after appropriate normalisation, determines the word probably spoken, so that finally a single candidate is provided for the spoken name. After conclusion of the learning phase, the more robust neural network again takes over the recognition of the whole vocabulary N + M. <br><br> The method described allows for some variants: during an input of speech (use of the automatic dialling facility), while the neural network is being extended by the vocabulary M and retrained, the conventional recogniser takes over the classification of the entire vocabulary, i.e. the comparison of the reference patterns for N and M. The method then proceeds in the manner described with the aid of Figure 1 in connection with the very first speaking of the vocabulary. The previous neural network, trained for N, does not need to be retained in this case. It also becomes possible to dispense with the costly probability determinations and their normalisation for the purpose of combining the two methods. Admittedly, this simplification can lead to diminished resistance to disturbing noise. <br><br></p> </div>
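The parallel decision described in the paragraphs above can be sketched as follows: the old network scores the N-vocabulary, the dynamic-programming recogniser scores the M newly entered names, and the larger normalised probability wins. The normalisation choices here (softmax over network outputs, `exp(-cost)` over warping costs) are illustrative assumptions; the patent only requires "appropriate normalisation".

```python
import math

def nn_best(outputs):
    """Softmax-normalise raw network outputs (dict: word -> activation)
    and return the most probable word with its probability."""
    z = sum(math.exp(v) for v in outputs.values())
    word = max(outputs, key=outputs.get)
    return word, math.exp(outputs[word]) / z

def dp_best(costs):
    """Turn DTW warping costs (dict: word -> cost, lower is better) into
    normalised pseudo-probabilities and return the best word with its value."""
    weights = {w: math.exp(-c) for w, c in costs.items()}
    word = min(costs, key=costs.get)
    return word, weights[word] / sum(weights.values())

def recognise(nn_outputs, dp_costs):
    """Run both recognisers' decisions in parallel, as in Figure 2, and
    keep the candidate with the larger normalised probability."""
    nn_word, nn_p = nn_best(nn_outputs)
    dp_word, dp_p = dp_best(dp_costs)
    return nn_word if nn_p >= dp_p else dp_word

# Hypothetical scores: the old network is confident about "anna" from the
# N-vocabulary, while the DP recogniser is less decisive about the M new names.
print(recognise({"anna": 2.0, "otto": 0.1}, {"karl": 8.0, "else": 9.0}))  # prints "anna"
```

Note that the simplified variant in the last paragraph above drops this normalisation entirely by letting the conventional recogniser cover N and M alone during retraining, at the cost of reduced noise robustness.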

Claims (5)

<div class="application article clearfix printTableText" id="claims"> <p lang="en">What we claim is:-<br><br>
1. A method of speech recognition with successive expansion of a reference vocabulary, including a speech recognition device comprising a neural network and a speech recognizer, wherein in response to a word being spoken for the first time to train the speech recognition device the method comprises:<br><br> a) storing the word spoken for the first time as a new reference pattern in a speech pattern memory and making this new reference pattern available for immediate use in making a recognition decision by the speech recognizer operating according to dynamic programming principles; and<br><br> b) simultaneously initiating training and configuration of the neural network to subsequently recognize the word spoken for the first time, wherein an already existing neural network is maintained until the training and configuration of the neural network are completed;<br><br> a word spoken, during the training of the neural network, for recognition by the speech recognition device interrupts the training of the neural network and activates the existing neural network to furnish a first probability value from a previous vocabulary for the word spoken during training and simultaneously activates the speech recognizer which compares the word spoken during training with the new reference pattern from the speech pattern memory and determines a second probability value; and the first and second probability values are standardized and compared with one another to make a recognition decision.<br><br>
2. A method as claimed in claim 1, wherein a word spoken during the training of the neural network, for recognition by the speech recognition device, activates only the speech recognizer which thereafter compares the word spoken during training with all reference patterns from the speech pattern memory, including the new reference speech pattern, and makes a recognition decision.<br><br>
3. A method as claimed in claim 2, wherein upon completion of the training and configuration of the neural network, the neural network exclusively takes over recognition using the now expanded vocabulary.<br><br>
4. A speech recognition device comprising:<br><br> pattern recognition means, for receiving input speech, processing the input speech to form reference speech patterns, storing the reference speech patterns as a first vocabulary during an initial training operation, for subsequently receiving input speech, processing the input speech to form input speech patterns, and comparing the input speech patterns with the previously processed reference speech patterns to look for a match during a recognition operation, and for forming an expanded vocabulary of reference speech patterns, including the first vocabulary reference speech patterns and at least one new reference speech pattern; and<br><br> neural network means operating in parallel with the pattern recognition means, for receiving the input speech and processing the input speech to form a first neural network corresponding to the first vocabulary during an initial configuration operation, for subsequently receiving input speech and processing the input speech to reach a recognition decision, and for reconfiguring the neural network to form a second expanded neural network, including the first neural network corresponding to the first vocabulary, when subsequently input speech is received which does not result in a positive recognition decision, wherein during the reconfiguration of the neural network, upon the inputting of speech, the reconfiguration is temporarily stopped, speech recognition operations are performed on the input speech by both the pattern recognition means, using the expanded vocabulary, and the neural network means, using the first neural network, in parallel, results of the respective recognition operations are assigned probability values, the probability values are compared, the result having the highest probability value is selected, and the reconfiguration of the neural network is subsequently continued.<br><br>
5. A method substantially as herein described with reference to Figures 1 - 2 of the accompanying drawings.<br><br> ALCATEL AUSTRALIA LIMITED (A.C.N. 000 005 363)<br><br> B. O'Connor, Authorized Agent<br><br> P5/1/1703<br><br> </p> </div>
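The reconfiguration step recited in claim 4 (and in the description of Figure 2) extends the output layer by M elements while the network trained for the first vocabulary remains intact and usable. A minimal sketch of that step, representing only the output-layer weights as nested lists; the layer sizes and the small random initialisation of the new connections are illustrative assumptions.

```python
import random

def extend_output_layer(weights, m, n_hidden):
    """Return a new output weight matrix with m extra rows.

    weights: existing output weights, one row of n_hidden values per word.
    The old rows are copied unchanged, so the previous network can keep
    serving the original vocabulary while retraining runs in the background.
    """
    old = [row[:] for row in weights]  # keep the first network intact
    new_rows = [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                for _ in range(m)]     # fresh connections for the M new words
    return old + new_rows

w = [[0.2, -0.1], [0.4, 0.3]]      # network trained for N = 2 words
w2 = extend_output_layer(w, 3, 2)  # extend by M = 3 words; w is untouched
```

Retraining then updates all weights of `w2`, which is why it can take hours, while `w` continues to answer recognition requests for the old vocabulary in the meantime.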
NZ243055A 1991-06-20 1992-06-08 Speech recogniser: neural network and dynamically programmed pattern recogniser operate in parallel NZ243055A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
DE4120308A DE4120308A1 (en) 1991-06-20 1991-06-20 DEVICE AND METHOD FOR RECOGNIZING LANGUAGE

Publications (1)

Publication Number Publication Date
NZ243055A true NZ243055A (en) 1995-08-28

Family

ID=6434327

Family Applications (1)

Application Number Title Priority Date Filing Date
NZ243055A NZ243055A (en) 1991-06-20 1992-06-08 Speech recogniser: neural network and dynamically programmed pattern recogniser operate in parallel

Country Status (5)

Country Link
EP (1) EP0519360B1 (en)
AT (1) ATE148253T1 (en)
AU (1) AU658635B2 (en)
DE (2) DE4120308A1 (en)
NZ (1) NZ243055A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19540859A1 (en) * 1995-11-03 1997-05-28 Thomson Brandt Gmbh Removing unwanted speech components from mixed sound signal
DE19942869A1 (en) * 1999-09-08 2001-03-15 Volkswagen Ag Operating method for speech-controlled device for motor vehicle involves ad hoc generation and allocation of new speech patterns using adaptive transcription

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4949382A (en) * 1988-10-05 1990-08-14 Griggs Talkwriter Corporation Speech-controlled phonetic typewriter or display device having circuitry for analyzing fast and slow speech
GB8908205D0 (en) * 1989-04-12 1989-05-24 Smiths Industries Plc Speech recognition apparatus and methods

Also Published As

Publication number Publication date
EP0519360A3 (en) 1993-02-10
EP0519360B1 (en) 1997-01-22
AU658635B2 (en) 1995-04-27
DE4120308A1 (en) 1992-12-24
ATE148253T1 (en) 1997-02-15
AU1828392A (en) 1992-12-24
DE59207925D1 (en) 1997-03-06
EP0519360A2 (en) 1992-12-23

Similar Documents

Publication Publication Date Title
US5842165A (en) Methods and apparatus for generating and using garbage models for speaker dependent speech recognition purposes
US5895448A (en) Methods and apparatus for generating and using speaker independent garbage models for speaker dependent speech recognition purpose
CN1248192C (en) Semi-monitoring speaker self-adaption
US5983177A (en) Method and apparatus for obtaining transcriptions from multiple training utterances
JP4180110B2 (en) Language recognition
JP2963142B2 (en) Signal processing method
US5452397A (en) Method and system for preventing entry of confusingly similar phases in a voice recognition system vocabulary list
DE60125542T2 (en) SYSTEM AND METHOD FOR VOICE RECOGNITION WITH A VARIETY OF LANGUAGE RECOGNITION DEVICES
US6076054A (en) Methods and apparatus for generating and using out of vocabulary word models for speaker dependent speech recognition
US5664058A (en) Method of training a speaker-dependent speech recognizer with automated supervision of training sufficiency
EP1022725B1 (en) Selection of acoustic models using speaker verification
US20020091522A1 (en) System and method for hybrid voice recognition
CA2136369A1 (en) Large vocabulary connected speech recognition system and method of language representation using evolutional grammar to represent context free grammars
JPH0422276B2 (en)
JP2001509285A (en) Method and apparatus for operating voice controlled functions of a multi-station network using speaker dependent and speaker independent speech recognition
CN112634867A (en) Model training method, dialect recognition method, device, server and storage medium
US5758021A (en) Speech recognition combining dynamic programming and neural network techniques
EP1205906B1 (en) Reference templates adaptation for speech recognition
EP1159735B1 (en) Voice recognition rejection scheme
CN105472159A (en) Multi-user unlocking method and device
JP2003535366A (en) Rank-based rejection for pattern classification
US6226610B1 (en) DP Pattern matching which determines current path propagation using the amount of path overlap to the subsequent time point
NZ243055A (en) Speech recogniser: neural network and dynamically programmed pattern recogniser operate in parallel
JP2002524777A (en) Voice dialing method and system
WO1999028898A1 (en) Speech recognition method and system