WO2000043990A1 - Speech recognition device including a sub-word memory - Google Patents

Info

Publication number
WO2000043990A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
information
sub
speech recognition
stored
Prior art date
Application number
PCT/EP1999/010302
Other languages
English (en)
French (fr)
Inventor
Heinrich Bartosik
Dietrich G. Klakow
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Priority to EP99965533A (EP1060471A1)
Priority to JP2000595336A (JP2002535728A)
Priority to KR1020007009795A (KR20010085219A)
Publication of WO2000043990A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197Probabilistic grammars, e.g. word n-grams
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

Definitions

  • Speech recognition device including a sub-word memory.
  • the invention relates to a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device can be stored, and including speech recognition means to which speech information containing phoneme information can be applied and which are arranged for determining phoneme information stored in the word memory and corresponding to the applied phoneme information, and for producing as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which sub-words forming parts of words can be stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
  • the invention relates to a speech recognition method for recognizing spoken texts containing new words by means of a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device is stored, and including speech recognition means to which speech information containing phoneme information of a spoken text is applied and which determine phoneme information stored in the word memory and corresponding to the applied phoneme information, and which produce as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which parts of words forming sub-words are stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
  • Speech information of a text spoken by a user of the speech recognition device and containing phoneme information can be applied to the known speech recognition device by a microphone.
  • the phoneme information can be applied to speech recognition means of the speech recognition device by which word information recognized by the speech recognition means can be applied as recognized text to a monitor that can be connected to the speech recognition device.
  • the word information of the recognized text can be shown by the monitor.
  • the speech recognition means include a word memory for recognizing word information contained in the speech information.
  • the word memory stores as word information all the words that can be recognized by the speech recognition device, which words form the vocabulary of the speech recognition device. For each item of word information, phoneme information is stored that forms a phoneme sequence characterizing the associated stored word.
  • the speech recognition means determine the word as a recognized word whose phoneme sequence stored in the word memory corresponds the most to the part of the phoneme information of the speech information that represents the new word. After the speech recognition method has been executed, the recognized text with the word erroneously recognized for a new word in the recognized text is shown on the monitor. A user of the known speech recognition device can then correct the spelling of the erroneously recognized word into the correct spelling of the actually spoken new word.
  • the known speech recognition device includes a sub-word memory in which parts of words forming sub-words can be stored as sub-word information and assigned phoneme information.
  • the known speech recognition device is arranged for determining the phoneme sequence of the new word and for the associated storing of the word information and phoneme information of the new word in the word memory by comparing sub-words contained in the new word with sub-words stored in the sub-word memory. As a result, the vocabulary of the known speech recognition device is enlarged by the new word.
  • One of the possible word sequences would be determined as the word sequence having the highest overall probability, calculated on the basis of the transition probabilities of the words of the word sequence, and would be produced as the recognized word sequence by the speech recognition device.
  • a word erroneously recognized for a new word has a high transition probability in words neighboring possible word sequences, for which sequences the new word would have a low transition probability.
  • one of the possible word sequences would have the highest overall probability, in which sequences also the words neighboring a new, but erroneously recognized, word were recognized erroneously. Therefore, complete word sequences would be recognized erroneously, which is a considerable disadvantage.
  • the speech recognition means are provided for determining phoneme information stored in the sub-word memory and corresponding to applied phoneme information, and for producing as recognized sub-word information the sub-word information stored in the sub-word memory and assigned to this stored phoneme information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words can be stored as transition probability information, and in that the speech recognition means are arranged for forming at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means, by evaluating transition probability information stored in the speech model word memory, are arranged for recognizing as recognized text with the highest overall probability one expression sequence from the at least two expression sequences.
  • When a speech recognition method is carried out with a spoken text that contains a word not stored in the word memory, a sub-word sequence for this new word is inserted into the word sequence formed by recognized words of the spoken text, so that one expression sequence is obtained.
  • those sub-words stored in the sub-word memory are inserted into the sub-word sequence whose concatenated phoneme information corresponds to the part of the phoneme information of the spoken text, which part is to be assigned to the new word.
  • When transition probabilities stored in the speech model word memory are evaluated and the expression sequence having the highest overall probability is recognized, no transition probabilities to other words are stored in the speech model word memory for the sub-word sequence of the new word, so that, advantageously, the words surrounding a new word in a spoken text are not recognized erroneously.
  • a speech recognition device is obtained having a considerably better recognition rate because, on the one hand, advantages are enjoyed of a speech model word memory that has been provided and, on the other, the disadvantage is avoided that occurs with new words when a speech model word memory is used.
  • a speech recognition device as claimed in claim 1 it has proved to be advantageous to provide the measures as claimed in claim 2.
  • a speech recognition device as claimed in claim 4 it has proved to be advantageous to provide the measures as claimed in claim 5.
  • the advantage is obtained that, after the speech recognition method has been executed in the speech recognition device, a user can correct the spelling of a sub-word sequence representing a new word into the correct spelling, after which the new word is stored in the word memory and, in consequence, the vocabulary of the speech recognition device is enlarged.
  • a speech recognition device as claimed in claim 5 there could be provided that a user is to speak a new word stored in the word memory a number of times into the microphone, in order to determine an associated phoneme sequence of the new word.
  • a further object of the invention is to eliminate the problems stated above and provide an improved speech recognition method in accordance with the type defined in the introduction in the second paragraph.
  • This object is achieved with a speech recognition method of this type in that the speech recognition means determine phoneme information stored in the sub-word memory and corresponding to applied phoneme information and produce sub-word information stored in the sub-word memory and assigned to this stored phoneme information as recognized sub-word information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed from these words is stored as transition probability information, and in that the speech recognition means form at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means evaluate transition probability information stored in the speech model word memory in order to recognize the one expression sequence of the at least two expression sequences as recognized text that has the highest overall probability.
  • Fig. 1 diagrammatically shows a block circuit diagram of a speech recognition device comprising a sub-word memory and a speech model sub-word memory,
  • Fig. 2 shows a first table containing word information and phoneme information stored in a word memory of the speech recognition device according to Fig. 1,
  • Fig. 3 shows a second table containing sub-word information and phoneme information stored in the sub-word memory of the speech recognition device shown in Fig. 1,
  • Fig. 4 shows a third table containing word sequence information and transition probability information stored in a speech model word memory of the speech recognition device shown in Fig. 1,
  • Fig. 5 shows a fourth table containing sub-word sequence information and transition probability information stored in the speech model sub-word memory of the speech recognition device shown in Fig. 1,
  • Fig. 6 shows a fifth table containing transition probability information and corresponding phoneme values of four possible expression sequences which are evaluated when the overall probability of each of the possible expression sequences is determined in speech recognition means of the speech recognition device, and
  • Fig. 7 shows a sixth table containing word information and phoneme information stored as background information in a background information memory of the speech recognition device shown in Fig. 1.
  • Fig. 1 diagrammatically shows in the form of a block circuit diagram a personal computer 1 in which a speech recognition device 2 is arranged.
  • the speech recognition device 2 can be supplied with speech information SI by a user and the speech recognition device 2 is provided for recognizing phoneme information PI contained in the speech information SI and for producing word information WI of a recognized text.
  • the speech recognition device 2 has an input terminal 3 to which a microphone 4 can be connected.
  • the microphone 4 can deliver speech information SI as an electric input signal to the input terminal 3 of the speech recognition device 2.
  • the microphone 4 has a control key 5 by which a control information signal ST can be delivered to the speech recognition device 2.
  • the speech recognition device 2 includes speech recognition means 6 which are arranged for recognizing phoneme information PI of a spoken text contained in the speech information SI of the input signal and for producing word information WI of a recognized text.
  • the speech recognition means 6 include an A/D converter stage 7, a memory stage 8, calculation means 9, a word memory 10, a sub-word memory 11, a speech model word memory 12 and a speech model sub-word memory 13.
  • Speech information SI delivered as an electric input signal to the input terminal 3 can be applied to the A/D converter stage 7. Digitized speech information SI can be applied to the memory stage 8 by the A/D converter stage 7. Digitized speech information SI applied to the memory stage 8 can also be stored in this memory stage 8.
  • digitized speech information SI stored in the memory stage 8 can be applied to a D/A converter stage 14.
  • the D/A converter stage 14 can apply analog speech information SI as an electric output signal to a loudspeaker 15 for the acoustic reproduction of a text spoken into the microphone 4 by a user of the speech recognition device 2.
  • the calculation means 9 are formed by a microprocessor and connected by an address/data bus to the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13. Digitized speech information SI stored in the memory stage 8 and the control information ST delivered by the microphone 4 can be applied to the calculation means 9.
  • the calculation means 9 can determine expression sequences AF containing word information WI and/or sub-word information SWI, by evaluating information stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, which expression sequences AF will be further explained hereinafter.
  • the speech recognition means 6 further include word determining means 16 and a background information memory 17.
  • the phoneme information PI of the speech information SI applied to the calculation means 9, and an expression sequence AF recognized for this phoneme information PI by the calculation means 9 when the speech recognition method was executed, can be applied to the word determining means 16.
  • the word determining means 16 can determine, by evaluating background information stored in the background information memory 17, a probable spelling of at least one sub-word sequence contained in the recognized expression sequence AF, which will be further discussed hereinafter.
  • the word determining means 16 can apply to an output terminal 18 a recognized expression sequence AF, in which at least one sub-word sequence contained in the expression sequence AF has been replaced by word information NWI of at least one new word, as word information WI of a recognized text.
  • a monitor 19 which forms display means and by which word information WI of a recognized text delivered by the output terminal 18 can be displayed.
  • a keyboard 20 which forms input means.
  • the spelling of a new word displayed on the monitor 19 can be changed by a user of the speech recognition device 2, and changed word information NWI of the new word can be displayed by the monitor 19.
  • the changed word information NWI of the new word can be stored as word information WI by the actuation of a key of the keyboard 20.
  • in the word memory 10, word information WI of up to 64,000 individual words forming the vocabulary of the speech recognition device 2 can be stored.
  • the speech recognition device 2 recognizes as words only those words contained in speech information SI of a spoken text that are also stored in the word memory 10.
  • the word memory 10 stores word information WI of words of a certain so-called "context", which context corresponds to the vocabulary of a lawyer, as a result of which speech information SI of a spoken text from this "context" can be recognized very well. It may be observed that word information WI of another "context", such as, for example, the context of a doctor or a salesman, could also be stored.
  • information in the German language is stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, so that the speech recognition device 2 is arranged for recognizing speech information SI of texts spoken in German.
  • the execution of the speech recognition method of the speech recognition device 2 is explained while taking typical formulations of the German language into account.
  • a speech recognition device according to the invention may be arranged for recognizing texts from speech information SI spoken in any language.
  • a phoneme sequence featuring the word can be stored in the word memory 10 as phoneme information PI(WI).
  • Phonemes of a phoneme sequence are the smallest distinguishable acoustic units into which spoken speech information SI can be subdivided.
  • in a first table 21 of Fig. 2, word information WI and the assigned phoneme information PI(WI) stored in the word memory 10 are shown.
  • the word information WI has substituting letters A, B, C to G in the first table 21.
  • the word information WI entered in the first table 21 substitutes further word information WI stored in the word memory 10.
  • the vocabulary of the speech recognition device 2 thus also includes the seven words indicated in first table 21 as word information WI.
  • in the sub-word memory 11, sub-words forming parts of words can be stored as sub-word information SWI, together with assigned phoneme information PI(SWI).
  • Sub-words here are individual letters, syllables or parts of words which can be joined together to form a word.
  • a second table 22 of Fig. 3 contains sub-word information SWI and the assigned phoneme information PI(SWI) stored in the sub-word memory 11. To simplify the explanation, letters a, b, c to g have been entered in the second table 22 for the sub-word information SWI.
  • the sub-word information SWI b for a sub-word "gen"
  • the sub-word information SWI c for a sub-word "f"
  • the sub-word information SWI d for a sub-word "r"
  • the sub-word information SWI e for a sub-word "i"
  • the sub-word information SWI f for a sub-word "sch"
  • the seven sub-word information signals SWI entered in the second table 22 substitute a plurality of further sub-word information signals SWI stored in the sub-word memory 11.
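The sub-word memory described above can be pictured as a small lookup table from sub-word spellings to phoneme sequences. The following is a minimal sketch only: the sub-words "gen", "f", "r", "i" and "sch" come from the second table, but the phoneme symbols and the names `SUB_WORD_MEMORY` and `sub_words_for` are hypothetical, since the phoneme values of Fig. 3 are not reproduced in this text.

```python
# Sketch of a sub-word memory: sub-word information SWI -> phoneme information PI(SWI).
# The phoneme symbols below are placeholders, not the values from Fig. 3.
SUB_WORD_MEMORY = {
    "gen": ("g", "@", "n"),
    "f": ("f",),
    "r": ("r",),
    "i": ("I",),
    "sch": ("S",),
}


def sub_words_for(phonemes):
    """Return all stored sub-words whose phoneme sequence equals `phonemes`."""
    target = tuple(phonemes)
    return [sw for sw, pi in SUB_WORD_MEMORY.items() if pi == target]
```

A word memory could be sketched the same way, with whole words as keys.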
  • in the speech model word memory 12, a probability of occurrence of a second word stored in the word memory 10 after a first word stored in the word memory 10, in a word sequence formed by these words, can be stored as transition probability information UWI(WFI).
  • word sequences having two words each, which word sequences are also known as bigrams, are stored as word sequence information WFI.
  • Fig. 4 shows a third table 23 which contains word sequence information WFI of word sequences and assigned transition probability information UWI(WFI) which is stored in the speech model word memory 12.
  • a probability of occurrence of a second sub-word stored in the sub-word memory 11 after a first sub-word stored in the sub-word memory 11 in a sub-word sequence formed by these sub-words can be stored as transition probability information UWI(SWFI).
  • SWFI transition probability information
  • in the speech model sub-word memory 13, sub-word sequences having two sub-words each can be stored as sub-word sequence information SWFI; these sub-word sequences also form so-called bigrams.
  • Fig. 5 shows a fourth table 24 which contains sub-word sequence information SWFI of sub-word sequences and assigned transition probability information UWI(SWFI) which is stored in the speech model sub-word memory 13.
  • Small values of the transition probability information UWI(SWFI) express a high transition probability.
  • the sub-word sequence "feu" is contained, for example, in the word "feuchten" but also in the word "feurigen".
  • the seven sub-word sequence information elements SWFI entered in the fourth table 24 substitute a plurality of further sub-word sequence information elements SWFI stored in the speech model sub-word memory 13.
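The text states that small values of the transition probability information express a high transition probability, but it does not say how those values are derived. One common convention that has this property is the negative logarithm of the probability; the sketch below assumes that convention purely for illustration, and the function name `probability_to_cost` is hypothetical.

```python
import math


def probability_to_cost(p):
    """Map a bigram probability in (0, 1] to a cost; likelier bigrams get smaller costs.

    Assumption: negative natural logarithm. The source only says that small
    values express a high transition probability.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p)
```

With this convention, adding the costs along a sequence corresponds to multiplying the underlying probabilities.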
  • the word information WI in the word sequence information WFI is not stored again in the speech model word memory 12; instead, in order to save memory capacity in the speech model word memory 12, address pointers to memory locations of the respective word information WI in the word memory 10 are stored as word sequence information WFI in the speech model word memory 12.
  • an address pointer to the third row of the first table 21 and an address pointer to the fourth row of the first table for the word sequence information WFI B+C.
  • sub-word information SWI is stored only in the sub-word memory 11 and address pointers to memory locations in the sub-word memory 11 are stored as sub-word sequence information SWFI in the speech model sub-word memory 13.
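The pointer scheme described above, in which the speech model memories hold references into the word memory rather than copies of the words, can be sketched as follows. This is an illustration under stated assumptions: list indices stand in for address pointers, the letters are the placeholders used in the first table, and the transition values and the names `WORD_MEMORY`, `SPEECH_MODEL_WORD_MEMORY` and `transition_value` are hypothetical.

```python
# Word memory: each entry is stored exactly once (placeholder letters as in the first table).
WORD_MEMORY = ["A", "B", "C", "D"]

# Speech model word memory: bigrams stored as pairs of indices ("address pointers")
# into WORD_MEMORY, mapped to transition probability information UWI(WFI).
# The numeric values are invented for illustration.
SPEECH_MODEL_WORD_MEMORY = {
    (1, 2): 3,  # B followed by C
    (2, 3): 7,  # C followed by D
}


def transition_value(first, second, default=None):
    """Look up the stored transition value for the bigram (first, second)."""
    try:
        key = (WORD_MEMORY.index(first), WORD_MEMORY.index(second))
    except ValueError:  # one of the words is not in the vocabulary
        return default
    return SPEECH_MODEL_WORD_MEMORY.get(key, default)
```

Storing two small indices per bigram instead of two strings is what saves memory capacity in this arrangement.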
  • a section of the digitized speech information SI stored in the memory stage 8 is read out each time by the calculation means 9, and recognized words and word sequences contained in the section of the speech information SI are processed in accordance with the so-called "Hidden Markov Model".
  • Fig. 6 shows the fifth table 25 in which are entered possible expression sequences AF determined during the execution of the speech recognition method.
  • the calculation means 9 determine the phoneme information PI contained in the section of the speech information SI, as this has already been known for a long time. Determined phoneme information PI is then compared with phoneme information PI(WI) stored in the word memory 10. When during this comparison phoneme information PI corresponding to the determined phoneme information PI is found in the word memory 10, stored word information WI assigned to this found phoneme information PI is inserted as recognized word information WI into a possible expression sequence AF of the fifth table 25.
  • the speech recognition means 6 are arranged for determining a corresponding phoneme value PUW for recognized word information WI that was inserted into possible expression sequences AF.
  • a corresponding phoneme value PUW here indicates the extent of correspondence or match between the stored phoneme information PI of the recognized word information WI and the phoneme information PI contained in that section of the speech information SI for which the word information WI was recognized by the speech recognition means 6.
  • a small magnitude of a corresponding phoneme value PUW characterizes a great correspondence or match of compared phoneme information PI and a high probability that a word was recognized correctly.
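The source does not specify how a corresponding phoneme value PUW is computed, only that a small magnitude indicates a close match. As a stand-in with the same property, the sketch below uses the Levenshtein edit distance between the stored and the heard phoneme sequence. This is an assumed substitute, not the patent's measure, and the name `correspondence_value` is hypothetical.

```python
def correspondence_value(stored, heard):
    """Edit distance between two phoneme sequences; 0 means an exact match.

    Assumption: Levenshtein distance stands in for the corresponding
    phoneme value PUW, which the source does not define numerically.
    """
    prev = list(range(len(heard) + 1))  # distances from the empty prefix of `stored`
    for i, s in enumerate(stored, 1):
        cur = [i]
        for j, h in enumerate(heard, 1):
            cur.append(min(
                prev[j] + 1,             # delete a stored phoneme
                cur[j - 1] + 1,          # insert a heard phoneme
                prev[j - 1] + (s != h),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

A real recognizer would typically derive such a score from acoustic model likelihoods rather than from symbol-level edits.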
  • the word "feurigen" contained in the section of the speech information SI of the spoken text does not belong to the vocabulary of the speech recognition device 2 and is therefore not stored in the word memory 10.
  • a corresponding phoneme value PUW1 determined for the word "RRen” has the value "35", because the compared phoneme information PI of the section of the speech information SI and of the stored phoneme information PI(C) have only a moderate correspondence. The probability that the word "LERen” was recognized correctly is therefore not very high.
  • the corresponding phoneme value PUW1 determined during this operation for the word "Grüßen" has the value "20" because the compared phoneme information PI of the section of the speech information SI and of the stored phoneme information PI(F) of the word "Grüßen" has only a moderate correspondence.
  • the speech recognition means 6 and, in addition, the calculation means 9 are not only arranged for determining words of possible expression sequences AF by comparing phoneme information PI contained in the section of the speech information SI with phoneme information PI(WI) stored in the word memory 10. Additionally, transition probability information UWI(WFI) of word sequences contained in possible expression sequences AF are determined, which transition probability information is fetched from the speech model word memory 12, and entered in the fifth table 25.
  • the transition probability information UWI1 has a small value.
  • the speech recognition means 6 and thus the calculation means 9 are arranged for determining overall probability information GWI1 and entering this overall probability information GWI1 in the seventh column of the fifth table 25 based on the corresponding phoneme value PUW1 of the first possible expression sequence AF1.
  • a low value of the overall probability information GWI indicates a high probability of the possible expression sequence AF corresponding to a spoken word sequence contained in the section of the speech information SI.
  • a weight factor can be multiplied by the corresponding phoneme values PUW or the transition probability information UWI in order to lend more weight to corresponding phoneme values PUW or transition probability information UWI.
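How the corresponding phoneme values and the transition probability information are combined into overall probability information GWI is not spelled out in this text. The sketch below assumes a weighted sum, which is consistent with the statements that low values mean high probability and that weight factors can be applied to either kind of value. The function name and default weights are hypothetical; the phoneme values 35 and 20 in the usage example are the ones quoted above, while the transition value 10 is invented.

```python
def overall_value(phoneme_values, transition_values, puw_weight=1.0, uwi_weight=1.0):
    """Combine PUW and UWI values of one expression sequence into a single GWI value.

    Assumption: a weighted sum; smaller results indicate a more probable sequence.
    """
    return puw_weight * sum(phoneme_values) + uwi_weight * sum(transition_values)


# Usage: two phoneme values from the text, one invented transition value.
gwi = overall_value([35, 20], [10])
```

Raising `uwi_weight` makes the language model dominate the decision; raising `puw_weight` favours the acoustic match.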
  • the calculation means 9 determine a second possible expression sequence AF2 having the word sequence "mit fremden Füßen" and put this sequence on the sixth row of the fifth table 25.
  • the calculation means 9 determine, as indicated above, corresponding phoneme values PUW2 for the word information WI of the second possible expression sequence AF2, which phoneme values PUW2 are put on the seventh row of the fifth table 25.
  • transition probability information UWI(WFI) stored in the speech model word memory 12
  • the calculation means 9 determine transition probability information UWI2 of the second possible expression sequence AF2 and enter it on the fifth row of the fifth table 25. Since the word sequence "mit fremden Füßen" hardly ever occurs, the transition probability information UWI2 has a relatively high value.
  • the calculation means 9 determine a third possible expression sequence AF3 having the word sequence "mit feuchten Küssen" and write this sequence on the ninth row of the fifth table 25.
  • the speech recognition means 6 and also the calculation means 9 are arranged for determining phoneme information PI(SWI) stored in the sub-word memory 11 and corresponding to the phoneme information PI contained in the section of the speech information SI, and for producing as recognized sub-word information SWI the sub-word information SWI stored in the sub-word memory 11 and assigned to this stored phoneme information PI(SWI).
  • the calculation means 9 determine a fourth possible expression sequence AF4 which is written on the twelfth row of the fifth table 25.
  • This sub-word sequence is developed from a concatenation of the sub-words "f", "eu", "r", "i", and "gen".
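Building a sub-word sequence whose concatenated phoneme information covers the phonemes of an unknown word can be sketched as a simple left-to-right search. This is an illustrative sketch only: the sub-words are those named above, the phoneme symbols are hypothetical, and the function returns the first exact cover it finds rather than the best-scoring one, so it ignores the sub-word bigram values that the described device also evaluates.

```python
# Hypothetical phoneme symbols for the sub-words named in the text.
SUB_WORDS = {
    "f": ("f",),
    "eu": ("OY",),
    "r": ("r",),
    "i": ("I",),
    "gen": ("g", "@", "n"),
    "sch": ("S",),
}


def segment(phonemes, sub_words=SUB_WORDS):
    """Return a list of sub-words whose concatenated phonemes equal `phonemes`, or None."""
    phonemes = tuple(phonemes)
    # covered[pos] is a sub-word list that exactly covers phonemes[:pos].
    covered = {0: []}
    for pos in range(len(phonemes)):
        if pos not in covered:
            continue  # no sub-word sequence ends at this position
        for sub_word, pi in sub_words.items():
            end = pos + len(pi)
            if phonemes[pos:end] == pi and end not in covered:
                covered[end] = covered[pos] + [sub_word]
    return covered.get(len(phonemes))
```

For the hypothetical phoneme sequence of "feurigen" used here, the search yields the five sub-words in the order given in the text.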
  • no transition probability information UWI is determined for transitions from words to sub-words or from sub-words to words, as a result of which no values are written on the eleventh row in the third column and in the fifth column for the transition probability information UWI4 of the fourth possible expression sequence AF4.
  • determining such transition probability information UWI may also be advantageous.
  • the speech recognition means 6 are arranged for recognizing as recognized text that expression sequence AF, containing recognized word information WI and/or recognized sub-word information SWI, which has the highest overall probability, that is, the smallest overall probability information GWI.
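Selecting the recognized text then amounts to picking the candidate with the smallest GWI value. A minimal sketch follows; the labels AF1 to AF4 come from the description, whereas the numeric GWI values and the function name `recognize` are hypothetical, chosen only so that the sequence containing the sub-word sequence wins as in the described example.

```python
def recognize(candidates):
    """Return the label of the expression sequence with the smallest GWI value.

    `candidates` maps a label (e.g. "AF1") to its overall probability
    information GWI; smaller values mean a higher overall probability.
    """
    if not candidates:
        raise ValueError("no expression sequences to choose from")
    return min(candidates, key=candidates.get)


# Usage with invented GWI values.
best = recognize({"AF1": 65, "AF2": 80, "AF3": 70, "AF4": 50})
```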
  • Fig. 7 shows a sixth table 26 which contains word information WI and assignedly stored phoneme information PI(WI), which is stored in the background information memory 17 as background information.
  • Word information WI of a very large vocabulary common in the German language and not limited to a certain "context" is stored in the background information memory 17.
  • the phoneme information PI of the section of the speech information SI for which the sub-word sequence (formed from the sub-words "f", "eu", "r", "i" and "gen") was determined is compared with phoneme information PI(WI) stored in the background information memory 17. If the word "feurigen" is stored in the background information memory 17, the word determining means 16 determine the new word with this spelling.
  • When the word determining means 16 for determining a probably correct spelling of a new word do not find any corresponding phoneme information PI in the background information memory 17 in the first step of the speech recognition method, the word determining means 16 carry out the second step indicated hereinafter. The word determining means 16 then compare parts of the phoneme information PI of the sub-word sequence ("f", "eu", "r", "i", "gen") with phoneme information PI(WI) stored in the background information memory 17 and determine what spelling the parts of word information WI assigned to this stored phoneme information PI(WI) have.
  • the part of the phoneme information PI ("eur") of the phoneme information PI of the respective section of the speech information SI is also found, inter alia, in the phoneme information PI(WI) of the word "Heurigen" stored in the background information memory 17.
  • the spelling common to a plurality of words found is also used for the new word by the word determining means 16.
  • the word determining means 16 determine that the sub-words can simply be combined to obtain the probably correct spelling of the new word.
  • the recognized fourth expression sequence AF4, in which the sub-word sequence ("f~eu~r~i~gen") is replaced by the word information NWI of the new word ("feurigen") determined by the word determining means 16, is delivered as recognized text to the output terminal 18 and from there to the monitor 19. Consequently, after the speech recognition method has been executed in the speech recognition device 2, the monitor 19 displays, for all the sections of the speech information SI stored in the memory stage 8, the recognized text "Hans verabschiedete sich von Anna mit feurigen Küssen und ging nach Hause".
  • a user of the speech recognition device 2 then has the option, by actuating one of the keys of the keyboard 20, to change the text shown on the monitor 19 and specifically correct the spelling of a new word. Such a new spelling of a new word would then be again delivered to the monitor 19 by the word determining means 16 via the output terminal 18 and displayed by means of the monitor 19.
  • the word information NWI of the new word is stored in the word memory 10 as word information WI and assigned phoneme information PI(WI) together with the phoneme information PI(NWI) of the new word contained in the section of the speech information SI.
  • English language information is stored in a word memory 10, a sub-word memory 11, a speech model word memory 12, a speech model sub-word memory 13 and a background information memory 17 of a speech recognition device 2 whose structure corresponds to that of the speech recognition device 2 shown in Fig. 1.
  • speech information SI of a text spoken in the English language can be processed.
  • a user pronounces the text "The Toscana is a friendly and kind region of Italy".
  • the calculation means 9 of the speech recognition device 2 determine, amongst other possible expression sequences, a fifth possible expression sequence AF5 "and kind regards" and a sixth possible expression sequence AF6 "and kind r~i~gion".
  • the fifth possible expression sequence AF5 contains a formulation that is typical in the English language, as a result of which transition probability information UWI1 fetched from the speech model word memory 12 of the fifth expression sequence AF5 has small values.
  • the sixth possible expression sequence AF6 contains the sub-word sequence "r~i~gion", because the word "region" is not stored in the word memory 10.
  • a new word contained in the speech information SI has already been replaced by a sub-word sequence during the execution of the speech recognition method, so that a wrong recognition of a possible expression sequence which has a typical formulation and, therefore, a high overall probability is avoided.
  • the word determining means 16 are arranged for determining the correct spelling ("region") of the sub-word sequence "r~i~gion" as word information NWI of the new word by evaluation of background information stored in a background information memory 17.
  • this causes a word new to the speech recognition device 2 to be shown in a probably correct spelling on a monitor 19.
  • the new word information NWI can then be stored in the word memory 10 with a spelling modified, as required, by a user, so that, advantageously, the vocabulary of the speech recognition device 2 is enlarged. It may be observed that a recognized sub-word sequence may be formed, for example, by the sub-word sequence "k~o~m~p~j~u~t~a".
  • the word determining means are then arranged for determining the correct spelling of the new word "computer" by evaluation of the background information stored in the background information memory, in that a comparison is made with the spelling customary in the German or English language. It may be observed that the background information memory may also store other background information containing statistical information about a language.
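
The two-step spelling determination described above for the word determining means 16 can be pictured with a minimal Python sketch. It is an illustration only, not the method claimed here. It assumes that the background information memory holds each word as pre-aligned (phonemes, letters) chunks, that "the spelling common to a plurality of words" can be approximated by a majority vote, and that the phoneme symbols (SAMPA-like), the names `LEXICON`, `SUB_WORDS` and `determine_spelling`, and all lexicon entries are hypothetical.

```python
from collections import Counter

# Hypothetical background lexicon: spelling -> aligned (phonemes, letters) chunks.
# The phoneme symbols are illustrative and not taken from the patent.
LEXICON = {
    "Heurigen": [("h", "H"), ("OY", "eu"), ("r", "r"), ("I", "i"), ("g@n", "gen")],
    "Häuser": [("h", "H"), ("OY", "äu"), ("z6", "ser")],
    "teuer": [("t", "t"), ("OY", "eu"), ("6", "er")],
}

# Recognized sub-word sequence: (phonemes, spelling of the sub-word) per sub-word.
SUB_WORDS = [("f", "f"), ("OY", "eu"), ("r", "r"), ("I", "i"), ("g@n", "gen")]


def determine_spelling(sub_words, lexicon):
    """Return a probable spelling for a new word given as a sub-word sequence."""
    target = "".join(phonemes for phonemes, _ in sub_words)

    # Step 1: look for a stored word whose complete phoneme string matches.
    for spelling, chunks in lexicon.items():
        if "".join(phonemes for phonemes, _ in chunks) == target:
            return spelling

    # Step 2: compare parts of the phoneme information and take, for each
    # part, the spelling used most often among the stored words (assumption).
    parts = []
    for phonemes, own_spelling in sub_words:
        votes = Counter(
            letters.lower()
            for chunks in lexicon.values()
            for chunk_phonemes, letters in chunks
            if chunk_phonemes == phonemes
        )
        # Without any evidence, simply combine the sub-word spellings.
        parts.append(votes.most_common(1)[0][0] if votes else own_spelling)
    return "".join(parts)


print(determine_spelling(SUB_WORDS, LEXICON))  # feurigen
```

In this sketch the part "eu" is confirmed by two stored words, while parts without any match fall back to the sub-word's own spelling, which mirrors the case in which the sub-words are simply combined.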

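The choice between the expression sequences AF5 and AF6 in the English example can likewise be sketched in Python. The sketch assumes that the overall probability information GWI is an additive score in which smaller values stand for higher probability, and that each unit contributes an acoustic score in addition to its transition probability information. The acoustic scores, all numbers, and the names `overall_gwi`, `best_sequence` and `CANDIDATES` are hypothetical.

```python
# Each unit: (text, acoustic score, transition score); smaller means more probable.
# All values are invented for illustration.
CANDIDATES = {
    # Typical formulation: small transition values, but "regards" fits the
    # spoken word "region" poorly, so its assumed acoustic score is large.
    "AF5": [("and", 2.0, 1.0), ("kind", 2.5, 1.0), ("regards", 9.0, 0.5)],
    # Sub-word sequence: larger transition values, good acoustic fit.
    "AF6": [
        ("and", 2.0, 1.0),
        ("kind", 2.5, 1.0),
        ("r", 1.0, 1.5),
        ("i", 1.0, 1.5),
        ("gion", 1.5, 1.5),
    ],
}


def overall_gwi(units):
    """Sum the per-unit scores into one overall score (assumed additive)."""
    return sum(acoustic + transition for _, acoustic, transition in units)


def best_sequence(candidates):
    """Pick the expression sequence with the smallest overall score."""
    return min(candidates, key=lambda name: overall_gwi(candidates[name]))


print(best_sequence(CANDIDATES))  # AF6
```

With these invented numbers AF5 totals 16.0 and AF6 totals 14.5, so the sequence containing the sub-word sequence is selected. If AF6 were not available as a candidate, the typical but wrong formulation AF5 would be recognized instead.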
Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
PCT/EP1999/010302 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory WO2000043990A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP99965533A EP1060471A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory
JP2000595336A JP2002535728A (ja) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory
KR1020007009795A KR20010085219A (ko) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP99890001 1999-01-05
EP99890001.3 1999-01-05

Publications (1)

Publication Number Publication Date
WO2000043990A1 (en) 2000-07-27

Family

ID=8243954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1999/010302 WO2000043990A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory

Country Status (5)

Country Link
EP (1) EP1060471A1 (en)
JP (1) JP2002535728A (ja)
KR (1) KR20010085219A (ko)
CN (1) CN1299504A (zh)
WO (1) WO2000043990A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003457B2 (en) * 2002-10-29 2006-02-21 Nokia Corporation Method and system for text editing in hand-held electronic device
CN1308908C (zh) * 2003-09-29 2007-04-04 摩托罗拉公司 Method for text-to-speech synthesis
KR100679042B1 (ko) * 2004-10-27 2007-02-06 삼성전자주식회사 Speech recognition method and apparatus, and navigation system using the same
US9787819B2 (en) * 2015-09-18 2017-10-10 Microsoft Technology Licensing, Llc Transcription of spoken communications

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0564166A2 (en) * 1992-04-02 1993-10-06 AT&T Corp. Automatic speech recognizer
DE19639844A1 (de) * 1996-09-27 1998-04-02 Philips Patentverwaltung Verfahren zum Ableiten wenigstens einer Folge von Wörtern aus einem Sprachsignal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Growing Phonetic Baseforms From Multiple Utterances in Speech Recognition", IBM TECHNICAL DISCLOSURE BULLETIN, vol. 30, no. 4, September 1987 (1987-09-01), New York, US, pages 1467 - 1468, XP002103941 *
KENJI KITA ET AL: "PROCESSING UNKNOWN WORDS IN CONTINUOUS SPEECH RECOGNITION", IEICE TRANSACTIONS, vol. E74, no. 7, 1 July 1991 (1991-07-01), pages 1811 - 1815, XP000263044 *
MENG H ET AL: "Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology", SPEECH COMMUNICATION, vol. 18, no. 1, 1 January 1996 (1996-01-01), pages 47-63, XP004008922 *

Also Published As

Publication number Publication date
EP1060471A1 (en) 2000-12-20
CN1299504A (zh) 2001-06-13
KR20010085219A (ko) 2001-09-07
JP2002535728A (ja) 2002-10-22

Similar Documents

Publication Publication Date Title
US7983912B2 (en) Apparatus, method, and computer program product for correcting a misrecognized utterance using a whole or a partial re-utterance
US6094633A (en) Grapheme to phoneme module for synthesizing speech alternately using pairs of four related data bases
US6778962B1 (en) Speech synthesis with prosodic model data and accent type
US7260533B2 (en) Text-to-speech conversion system
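KR19990008459A Word recognition method and word recognizer with improved reliability
JP5198046B2 Speech processing device and program therefor
JP2001312296A Speech recognition system, speech recognition method, and computer-readable recording medium
WO2011064829A1 Information processing device
WO2004066271A1 Speech synthesis device, speech synthesis method, and speech synthesis system
JPH0713594A Method for evaluating speech quality in speech synthesis
JP3723518B2 Character processing device and method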
US7844459B2 (en) Method for creating a speech database for a target vocabulary in order to train a speech recognition system
US4455615A (en) Intonation-varying audio output device in electronic translator
RU2320026C2 Letter-to-sound conversion for synthesized pronunciation of a text segment
JP5160594B2 Speech recognition device and speech recognition method
EP1060471A1 (en) Speech recognition device including a sub-word memory
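JP2005049655A Character data correction device, character data correction method, and character data correction program
JP4684583B2 Dialogue device
JP4318188B2 Telop display device
JP2006031725A Character processing device
JP3284976B2 Speech synthesis device and computer-readable recording medium
JPH09244677A Speech synthesis system
JPH08286697A Japanese language processing device
CN112988955B Multilingual speech recognition and topic semantic analysis method and device
JP2002189490A Method of pinyin speech input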

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 99805784.3; Country of ref document: CN
AK Designated states. Kind code of ref document: A1; Designated state(s): CN JP KR
AL Designated countries for regional patents. Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
WWE WIPO information: entry into national phase. Ref document number: 1999965533; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 1020007009795; Country of ref document: KR
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWP WIPO information: published in national office. Ref document number: 1999965533; Country of ref document: EP
WWP WIPO information: published in national office. Ref document number: 1020007009795; Country of ref document: KR
WWW WIPO information: withdrawn in national office. Ref document number: 1999965533; Country of ref document: EP
WWW WIPO information: withdrawn in national office. Ref document number: 1020007009795; Country of ref document: KR