EP1060471A1 - Speech recognition device including a sub-word memory

Speech recognition device including a sub-word memory

Info

Publication number
EP1060471A1
Authority
EP
European Patent Office
Prior art keywords
word
information
sub
speech recognition
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99965533A
Other languages
German (de)
French (fr)
Inventor
Heinrich Bartosik
Dietrich G. Klakow
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Priority to EP99965533A priority Critical patent/EP1060471A1/en
Publication of EP1060471A1 publication Critical patent/EP1060471A1/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/197Probabilistic grammars, e.g. word n-grams
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/183Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

Definitions

  • Speech recognition device including a sub-word memory.
  • the invention relates to a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device can be stored, and including speech recognition means to which speech information containing phoneme information can be applied and which are arranged for determining phoneme information stored in the word memory and corresponding to the applied phoneme information, and for producing as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which sub-words forming parts of words can be stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
  • the invention relates to a speech recognition method for recognizing spoken texts containing new words by means of a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device is stored, and including speech recognition means to which speech information containing phoneme information of a spoken text is applied and which determine phoneme information stored in the word memory and corresponding to the applied phoneme information, and which produce as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which parts of words forming sub-words are stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
  • Speech information of a text spoken by a user of the speech recognition device and containing phoneme information can be applied to the known speech recognition device by a microphone.
  • the phoneme information can be applied to speech recognition means of the speech recognition device by which word information recognized by the speech recognition means can be applied as recognized text to a monitor that can be connected to the speech recognition device.
  • the word information of the recognized text can be shown by the monitor.
  • the speech recognition means include a word memory for recognizing word information contained in the speech information.
  • the word memory stores as word information all the words recognized by the speech recognition device, which words form the vocabulary of the speech recognition device. For each word information signal is stored phoneme information forming a phoneme sequence featuring the associated stored word.
  • the speech recognition means determine the word as a recognized word whose phoneme sequence stored in the word memory corresponds the most to the part of the phoneme information of the speech information that represents the new word. After the speech recognition method has been executed, the recognized text with the word erroneously recognized for a new word in the recognized text is shown on the monitor. A user of the known speech recognition device can then correct the spelling of the erroneously recognized word into the correct spelling of the actually spoken new word.
  • the known speech recognition device includes a sub-word memory in which parts of words forming sub-words can be stored as sub-word information and assigned phoneme information.
  • the known speech recognition device is arranged for determining the phoneme sequence of the new word and for the associated storing of the word information and phoneme information of the new word in the word memory by comparing sub-words contained in the new word with sub-words stored in the sub-word memory. As a result, the vocabulary of the known speech recognition device is enlarged by the new word.
  • One of the possible word sequences would be determined as the word sequence having the highest overall probability, calculated on the basis of the transition probabilities of the words of the word sequence, and would be produced as the recognized word sequence by the speech recognition device.
  • in one of the possible word sequences, however, a word erroneously recognized in place of a new word has a high transition probability with respect to its neighboring words, whereas the new word itself would have a low transition probability in these sequences.
  • as a result, the possible word sequence having the highest overall probability could be one in which also the words neighboring the new, but erroneously recognized, word were recognized erroneously. Therefore, complete word sequences would be recognized erroneously, which is a considerable disadvantage.
  • the speech recognition means are provided for determining phoneme information stored in the sub-word memory and corresponding to applied phoneme information, and for producing as recognized sub-word information the sub-word information stored in the sub-word memory and assigned to this stored phoneme information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words can be stored as transition probability information, and in that the speech recognition means are arranged for forming at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means, by evaluating transition probability information stored in the speech model word memory, are arranged for recognizing as recognized text with the highest overall probability one expression sequence from the at least two expression sequences.
  • a speech recognition method is carried out with a spoken text that contains a word not stored in the word memory, a sub- word sequence for this new word is inserted into the word sequence formed by recognized words of the spoken text, so that one expression sequence is maintained.
  • those sub-words stored in the sub-word memory are inserted into the sub-word sequence whose concatenated phoneme information corresponds to the part of the phoneme information of the spoken text, which part is to be assigned to the new word.
  • transition probabilities stored in the speech model word memory are evaluated and the expression sequence having the highest overall probability is recognized, for the sub-word sequence of the new word no transition probabilities for other words are stored in the speech model word memory, so that, advantageously, the words surrounding a new word in a spoken text are not recognized erroneously.
  • a speech recognition device is obtained having a considerably better recognition rate because, on the one hand, advantages are enjoyed of a speech model word memory that has been provided and, on the other, the disadvantage is avoided that occurs with new words when a speech model word memory is used.
  • a speech recognition device as claimed in claim 1 it has proved to be advantageous to provide the measures as claimed in claim 2.
  • a speech recognition device as claimed in claim 4 it has proved to be advantageous to provide the measures as claimed in claim 5.
  • the advantage is obtained that, after the speech recognition method has been executed in the speech recognition device, a user can correct the spelling of a sub-word sequence representing a new word into the correct spelling, after which the new word is stored in the word memory and, in consequence, the vocabulary of the speech recognition device is enlarged.
  • a speech recognition device as claimed in claim 5 there could be provided that a user is to speak a new word stored in the word memory a number of times into the microphone, in order to determine an associated phoneme sequence of the new word.
  • a further object of the invention is to eliminate the problems stated above and provide an improved speech recognition method in accordance with the type defined in the introduction in the second paragraph.
  • This object is achieved with a speech recognition method of this type in that the speech recognition means determine phoneme information stored in the sub-word memory and corresponding to applied phoneme information and produce sub-word information stored in the sub-word memory and assigned to this stored phoneme information as recognized sub-word information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed from these words is stored as transition probability information, and in that the speech recognition means form at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means evaluate transition probability information stored in the speech model word memory in order to recognize the one expression sequence of the at least two expression sequences as recognized text that has the highest overall probability.
  • Fig. 1 diagrammatically shows a block circuit diagram of a speech recognition device comprising a sub-word memory and a speech model sub-word memory
  • Fig. 2 shows a first table containing word information and phoneme information stored in a word memory of the speech recognition device according to Fig. 1,
  • Fig. 3 shows a second table containing sub-word information and phoneme information stored in the sub-word memory of the speech recognition device shown in Fig. 1,
  • Fig. 4 shows a third table containing word sequence information and transition probability information stored in a speech model word memory of the speech recognition device shown in Fig. 1,
  • Fig. 5 shows a fourth table containing sub-word sequence information and transition probability information stored in the speech model sub-word memory of the speech recognition device shown in Fig. 1,
  • Fig. 6 shows a fifth table containing transition probability information and corresponding phoneme values of four possible expression sequences which are evaluated when the overall probability of each of the possible expression sequences is determined in speech recognition means of the speech recognition device, and
  • Fig. 7 shows a sixth table containing word information and phoneme information stored as background information in a background information memory of the speech recognition device shown in Fig. 1.
  • Fig. 1 diagrammatically shows in the form of a block circuit diagram a personal computer 1 in which a speech recognition device 2 is arranged.
  • the speech recognition device 2 can be supplied with speech information SI by a user and the speech recognition device 2 is provided for recognizing phoneme information PI contained in the speech information SI and for producing word information WI of a recognized text.
  • the speech recognition device 2 has an input terminal 3 to which a microphone 4 can be connected.
  • the microphone 4 can deliver speech information SI as an electric input signal to the input terminal 3 of the speech recognition device 2.
  • the microphone 4 has a control key 5 by which a control information signal ST can be delivered to the speech recognition device 2.
  • the speech recognition device 2 includes speech recognition means 6 which are arranged for recognizing phoneme information PI of a spoken text contained in the speech information SI of the input signal and for producing word information WI of a recognized text.
  • the speech recognition means 6 include an A/D converter stage 7, a memory stage 8, calculation means 9, a word memory 10, a sub-word memory 11, a speech model word memory 12 and a speech model sub-word memory 13.
  • Speech information SI delivered as an electric input signal to the input terminal 3 can be applied to the A/D converter stage 7. Digitized speech information SI can be applied to the memory stage 8 by the A/D converter stage 7. Digitized speech information SI applied to the memory stage 8 can also be stored in this memory stage 8.
  • digitized speech information SI stored in the memory stage 8 can be applied to a D/A converter stage 14.
  • the D/A converter stage 14 can apply analog speech information SI as an electric output signal to a loudspeaker 15 for the acoustic reproduction of a text spoken into the microphone 4 by a user of the speech recognition device 2.
  • the calculation means 9 are formed by a microprocessor and connected by an address/data bus to the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13. Digitized speech information SI stored in the memory stage 8 and control information ST delivered by the microphone 4 can be applied to the calculation means 9.
  • the calculation means 9 can determine expression sequences AF containing word information WI and/or sub-word information SWI, by evaluating information stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, which expression sequences AF will be further explained hereinafter.
  • the speech recognition means 6 further include word determining means 16 and a background information memory 17.
  • the phoneme information PI of the speech information SI applied to the calculation means 9, and an expression sequence AF recognized for this phoneme information PI by the calculation means 9 when the speech recognition method was executed, can be applied to the word determining means 16.
  • the word determining means 16 can determine, by evaluating background information stored in the background information memory 17, a probable spelling of at least one sub-word sequence contained in the recognized expression sequence AF, which will be further discussed hereinafter.
  • the word determining means 16 can apply to an output terminal 18 a recognized expression sequence AF, in which at least one sub-word sequence contained in the expression sequence AF has been replaced by word information NWI of at least one new word, as word information WI of a recognized text.
  • a monitor 19 which forms display means and by which word information WI of a recognized text delivered by the output terminal 18 can be displayed.
  • a keyboard 20 which forms input means.
  • the spelling of a new word displayed on the monitor 19 can be changed by a user of the speech recognition device 2, and changed word information NWI of the new word can be displayed by the monitor 19.
  • the changed word information NWI of the new word can be stored as word information WI by the actuation of a key of the keyboard 20.
  • In the word memory 10 can be stored word information WI of up to 64,000 individual words forming the vocabulary of the speech recognition device 2.
  • the speech recognition device 2 recognizes as words only those words contained in speech information SI of a spoken text that are also stored in the word memory 10.
  • the word memory 10 stores word information WI of words of a certain so- called “context”, which context corresponds to the vocabulary of a lawyer, as a result of which speech information SI of a spoken text can be recognized very well from this "context”. It may be observed that also word information WI of another "context", such as, for example, the context of a doctor or a salesman could be stored.
  • information in the German language is stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, so that the speech recognition device 2 is arranged for recognizing speech information SI of texts spoken in German.
  • the execution of the speech recognition method of the speech recognition device 2 is explained while taking typical formulations of the German language into account.
  • a speech recognition device according to the invention may be arranged for recognizing texts from speech information SI spoken in any language.
  • a phoneme sequence featuring the word can be stored in the word memory 10 as phoneme information PI(WI).
  • Phonemes of a phoneme sequence are the smallest distinguishable acoustic units into which spoken speech information SI can be subdivided.
  • a first table 21 of Fig. 2 is shown word information WI and also phoneme information PI(WI) assignedly stored in the word memory 10.
  • the word information WI has substituting letters A, B, C to G in the first table 21.
  • the word information WI entered in the first table 21 substitutes further word information WI stored in the word memory 10.
  • the vocabulary of the speech recognition device 2 thus also includes the seven words indicated in first table 21 as word information WI.
  • sub-word memory 11 can be stored as sub-word information SWI sub- words forming parts of words, and assigned phoneme information PI(SWI).
  • Sub-words here are individual letters, syllables or other parts of words which can be joined together to form a word.
  • a second table 22 of Fig. 3 contains sub- word information SWI and phoneme information PI(SWI) assignedly stored in the sub-word memory 11. To simplify the explanation, letters a, b, c to g have been entered in the second table 22 for the sub-word information SWI.
  • the sub-word information SWI b for a sub-word "gen”
  • the sub-word information SWI c for a sub-word "f"
  • the sub-word information SWI d for a sub-word "r”
  • the sub-word information SWI e for a sub-word "i”
  • the sub-word information SWI f for a sub-word "sch”
  • the seven sub-word information signals SWI entered in the second table 22 substitute a plurality of further sub-word information signals SWI stored in the sub-word memory 11.
  • In the speech model word memory 12 can be stored, as transition probability information UWI(WFI), a probability of occurrence of a second word stored in the word memory 10 after a first word stored in the word memory 10 in a word sequence formed by these words.
  • In the speech model word memory 12 can also be stored, as word sequence information WFI, word sequences having two words each, which word sequences are also known as bigrams.
  • Fig. 4 shows a third table 23 which contains word sequence information WFI of word sequences and assigned transition probability information UWI(WFI) which is stored in the speech model word memory 12.
  • In the speech model sub-word memory 13, a probability of occurrence of a second sub-word stored in the sub-word memory 11 after a first sub-word stored in the sub-word memory 11 in a sub-word sequence formed by these sub-words can be stored as transition probability information UWI(SWFI).
  • In the speech model sub-word memory 13 can also be stored, as sub-word sequence information SWFI, sub-word sequences having two sub-words each, which sub-word sequences also form so-called bigrams.
  • Fig. 5 shows a fourth table 24 which contains sub-word sequence information SWFI of sub-word sequences and assigned transition probability information UWI(SWFI) which is stored in the speech model sub-word memory 13.
  • Small values of the transition probability information UWI(SWFI) express a high transition probability.
  • the sub-word sequence "feu” is contained, for example, in the word “feuchten” but also in the word "feurigen”.
  • the seven sub-word sequence information elements SWFI entered in the fourth table 24 substitute a plurality of further sub-word sequence information elements SWFI stored in the speech model sub- word memory 13.
  • It may be observed that the word information WI contained in the word sequence information WFI is not stored again in the speech model word memory 12; instead, in order to save memory capacity in the speech model word memory 12, address pointers to the memory locations of the respective word information WI in the word memory 10 are stored as word sequence information WFI in the speech model word memory 12.
  • For example, for the word sequence information WFI B+C, an address pointer to the third row of the first table 21 and an address pointer to the fourth row of the first table 21 are stored.
  • sub-word information SWI is stored only in the sub-word memory 11 and address pointers to memory locations in the sub- word memory 11 are stored as sub-word sequence information SWFI in the speech model sub- word memory 13.
  • a section of the digitized speech information SI stored in the memory stage 8 is read out each time by the calculation means 9, and recognized words and word sequences contained in the section of the speech information SI are processed in accordance with the so-called Hidden Markov Model.
  • Fig. 6 shows the fifth table 25 in which are entered possible expression sequences AF determined during the execution of the speech recognition method.
  • the calculation means 9 determine the phoneme information PI contained in the section of the speech information SI, as this has already been known for a long time. Determined phoneme information PI is then compared with phoneme information PI(WI) stored in the word memory 10. When during this comparison phoneme information PI corresponding to the determined phoneme information PI is found in the word memory 10, stored word information WI assigned to this found phoneme information PI is inserted as recognized word information WI into a possible expression sequence AF of the fifth table 25.
  • the speech recognition means 6 are arranged for determining a corresponding phoneme value PUW for recognized word information WI that was inserted into possible expression sequences AF.
  • a corresponding phoneme value PUW here indicates the extent of correspondence or match between the stored phoneme information PI(WI) of the recognized word information WI and the phoneme information PI contained in the section of the speech information SI for which this word information WI was recognized by the speech recognition means 6.
  • a small magnitude of a corresponding phoneme value PUW characterizes a great correspondence or match of compared phoneme information PI and a high probability that a word was recognized correctly.
  • the word "feurigen" contained in the section of the speech information SI of the spoken text does not belong to the vocabulary of the speech recognition device 2 and is therefore not stored in the word memory 10.
  • a corresponding phoneme value PUW1 determined for the word "freundlichen" has the value "35", because the compared phoneme information PI of the section of the speech information SI and the stored phoneme information PI(C) have only a moderate correspondence. The probability that the word "freundlichen" was recognized correctly is therefore not very high.
  • the corresponding phoneme value PUW1 determined during this operation for the word "Grüßen" has the value "20" because the compared phoneme information PI of the section of the speech information SI and the stored phoneme information PI(F) of the word "Grüßen" have only a moderate correspondence.
  • the speech recognition means 6 and, in addition, the calculation means 9 are not only arranged for determining words of possible expression sequences AF by comparing phoneme information PI contained in the section of the speech information SI with phoneme information PI(WI) stored in the word memory 10. Additionally, transition probability information UWI(WFI) of word sequences contained in possible expression sequences AF are determined, which transition probability information is fetched from the speech model word memory 12, and entered in the fifth table 25.
  • the transition probability information UWI1 has a small value.
  • the speech recognition means 6 and thus the calculation means 9 are arranged for determining overall probability information GWI1, based on the corresponding phoneme values PUW1 and the transition probability information UWI1 of the first possible expression sequence AF1, and for entering this overall probability information GWI1 in the seventh column of the fifth table 25.
  • a low value of the overall probability information GWI indicates a high probability of the possible expression sequence AF corresponding to a spoken word sequence contained in the section of the speech information SI.
  • a weight factor can be multiplied by the corresponding phoneme values PUW or the transition probability information UWI in order to lend more weight to corresponding phoneme values PUW or transition probability information UWI.
  • the calculation means 9 determine a second possible expression sequence AF2 having the word sequence "mit fremden Füßen" and put this sequence on the sixth row of the fifth table 25.
  • the calculation means 9 determine, as indicated above, corresponding phoneme values PUW2 for the word information WI of the second possible expression sequence AF2, which phoneme values PUW2 are put on the seventh row of the fifth table 25.
  • By evaluating transition probability information UWI(WFI) stored in the speech model word memory 12, the calculation means 9 determine transition probability information UWI2 of the second possible expression sequence AF2 and enter it on the fifth row of the fifth table 25.
  • Since the word sequence "mit fremden Füßen" hardly ever occurs in the German language, the transition probability information UWI2 has a relatively high value.
  • the calculation means 9 determine a third possible expression sequence AF3 having the word sequence "mit feuchten Küssen" and write this sequence on the ninth row of the fifth table 25.
  • the speech recognition means 6 and also the calculation means 9 are arranged for determining phoneme information PI(SWI) stored in the sub-word memory 11 and corresponding to the phoneme information PI contained in the section of the speech information SI, and for producing as recognized sub-word information SWI the sub-word information SWI stored in the sub-word memory 11 and assigned to this stored phoneme information PI(SWI).
  • the calculation means 9 determine a fourth possible expression sequence AF4 which is written on the twelfth row of the fifth table 25.
  • This sub-word sequence is developed from a concatenation of the sub-words "f", “eu”, "r”, "i", and "gen”.
  • no transition probability information UWI is stored in the speech model word memory 12 for transitions from a word to a sub-word or from a sub-word to a word, as a result of which no values are written on the eleventh row in the third column and in the fifth column for the transition probability information UWI4 of the fourth possible expression sequence AF4.
  • determining such transition probability information UWI may also be advantageous.
  • the speech recognition means 6 are arranged for recognizing an expression sequence AF containing recognized word information WI and/or recognized sub- word information SWI as recognized text that has the largest overall probability with the smallest overall probability information GWI.
  • Fig. 7 shows a sixth table 26 which contains word information WI and assignedly stored phoneme information PI(WI), which is stored in the background information memory 17 as background information.
  • Word information WI of a very large vocabulary common in the German language and not limited to a certain "context" is stored in the background information memory 17.
  • the phoneme information PI of the section of the speech information SI for which the sub-word sequence ("f·eu·r·i·gen") was determined is compared with phoneme information PI(WI) stored in the background information memory 17. If the word "feurigen" is stored in the background information memory 17, the word determining means 16 determine the new word with this spelling.
  • when the word determining means 16, in determining a probably correct spelling of a new word, do not find any corresponding phoneme information PI in the background information memory 17 in the first step of the speech recognition method, the word determining means 16 carry out the second step indicated hereinafter. The word determining means 16 then compare parts of the phoneme information PI of the sub-word sequence ("f·eu·r·i·gen") with phoneme information PI(WI) stored in the background information memory 17 and determine what spelling the parts of word information WI assigned to this stored phoneme information PI(WI) have; a sketch of this spelling determination is given after this list.
  • the part of the phoneme information PI ("eur") of the phoneme information PI of the respective section of the speech information SI is also found, inter alia, in the phoneme information PI(WI) of the word "Heurigen" stored in the background information memory 17.
  • the spelling common to a plurality of words found is also used for the new word by the word determining means 16.
  • the word determining means 16 determine that the sub-words can simply be combined to obtain the probably correct spelling of the new word.
  • the recognized fourth expression sequence AF4, in which the sub-word sequence ("f·eu·r·i·gen") is replaced by the word information NWI of the new word ("feurigen") determined by the word determining means 16, is delivered as recognized text to the output terminal 18 and from there to the monitor 19. Consequently, after the speech recognition method has been executed in the speech recognition device 2, the monitor displays for all the sections of the speech information SI stored in the memory stage 8 the recognized text "Hans verabschiedete sich von Anna mit feurigen Küssen und ging nach Hause".
  • a user of the speech recognition device 2 then has the option, by actuating one of the keys of the keyboard 20, to change the text shown on the monitor 19 and specifically correct the spelling of a new word. Such a new spelling of a new word would then be again delivered to the monitor 19 by the word determining means 16 via the output terminal 18 and displayed by means of the monitor 19.
  • the word information NWI of the new word is stored in the word memory 10 as word information WI, together with the phoneme information PI(NWI) of the new word contained in the section of the speech information SI as assigned phoneme information PI(WI).
  • English language information is stored in a word memory 10, a sub-word memory 11, a speech model word memory 12, a speech model sub-word memory 13 and a background information memory 17 of a speech recognition device 2 whose structure corresponds to that of the speech recognition device 2 shown in Fig. 1.
  • speech information SI of a text spoken in the English language can be processed.
  • a user pronounces the text "The Toscana is a friendly and kind region of Italy".
  • the calculation means 9 of the speech recognition device 2 determine, amongst other possible expression sequences, also a fifth possible expression sequence AF5 "and kind regards" and a sixth possible expression sequence AF6 "and kind r·i·gion".
  • the fifth possible expression sequence AF5 contains a formulation that is typical in the English language, as a result of which the transition probability information UWI fetched from the speech model word memory 12 for the fifth expression sequence AF5 has small values.
  • the sixth possible expression sequence AF6 contains the sub-word sequence "r·i·gion", because the word "region" is not stored in the word memory 10.
  • a new word contained in the speech information SI has already been replaced by a sub-word sequence during the execution of the speech recognition method, so that a wrong recognition of a possible expression sequence which has a typical formulation and, therefore, a high overall probability is avoided.
  • the word determining means 16 are arranged for determining the correct spelling ("region") of the sub-word sequence "r·i·gion" as word information NWI of the new word by evaluation of background information stored in a background information memory 17.
  • this causes a word new to the speech recognition device 2 to be shown in a probably correct spelling on a monitor 19.
  • the new word information NWI can then be stored in the word memory 10 with a spelling modified, as required, by a user, so that, advantageously, the vocabulary of the speech recognition device 2 is enlarged. It may be observed that a recognized sub-word sequence may be formed, for example, by the sub-word sequence "k·o·m·p·j·u·t·a".
  • the word determining means are then arranged for determining the correct spelling of the new word "computer" by evaluation of the background information stored in the background information memory 17, in that a comparison is made with the spelling customary in the German or English language. It may be observed that the background information memory may also store other background information containing statistical information about a language.
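The two-step spelling determination carried out by the word determining means 16, referred to in the list above, might be sketched as follows. The background vocabulary, the phoneme spellings and the fallback are assumptions made only for illustration; in particular, the device would also look for a spelling common to several background words whose phoneme information contains the compared parts before simply concatenating the sub-word spellings.

```python
# Hedged sketch of the spelling determination by the word determining means 16
# (hypothetical data; the comparison of phoneme information is simplified).

# Background information memory 17: word -> phoneme sequence, drawn from a
# large general vocabulary that is not limited to a particular "context".
BACKGROUND_MEMORY = {
    "feurigen": "f oy r i g en",
    "Heurigen": "h oy r i g en",
}

def determine_spelling(subword_spellings, subword_phonemes):
    """Return a probably correct spelling for a recognized sub-word sequence."""
    phonemes = " ".join(subword_phonemes)

    # Step 1: look for a background word whose phoneme information matches the
    # phoneme information of the whole sub-word sequence.
    for word, pi in BACKGROUND_MEMORY.items():
        if pi == phonemes:
            return word

    # Step 2 (simplified): the device would now compare parts of the phoneme
    # information with background words and adopt a spelling common to several
    # of them; here we simply concatenate the recognized sub-word spellings.
    return "".join(subword_spellings)

print(determine_spelling(["f", "eu", "r", "i", "gen"],
                         ["f", "oy", "r", "i", "g en"]))  # -> 'feurigen'
```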

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

In a speech recognition device (2) comprising a word memory (10) in which word information (WI) and assigned phoneme information (PI(WI)) of at least a first and a second word forming the vocabulary can be stored, and comprising a sub-word memory (11) in which sub-words forming parts of words can be stored as sub-word information (SWI) together with assigned phoneme information (PI(SWI)) of at least a first and a second sub-word, a speech model word memory (12) is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words can be stored as transition probability information (UWI(WFI)).

Description

Speech recognition device including a sub-word memory.
The invention relates to a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device can be stored, and including speech recognition means to which speech information containing phoneme information can be applied and which are arranged for determining phoneme information stored in the word memory and corresponding to the applied phoneme information, and for producing as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which sub-words forming parts of words can be stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
The invention relates to a speech recognition method for recognizing spoken texts containing new words by means of a speech recognition device including a word memory in which word information and assigned phoneme information of at least a first and a second word forming the vocabulary of the speech recognition device is stored, and including speech recognition means to which speech information containing phoneme information of a spoken text is applied and which determine phoneme information stored in the word memory and corresponding to the applied phoneme information, and which produce as recognized word information the word information stored in the word memory and assigned to this stored phoneme information, and including a sub-word memory in which parts of words forming sub-words are stored as sub-word information and assigned phoneme information of at least a first and a second sub-word.
Such a speech recognition device of the type set out in the first paragraph and such a speech recognition method of the type defined in the second paragraph are known from document EP 0 590 173 A1. Speech information of a text spoken by a user of the speech recognition device and containing phoneme information can be applied to the known speech recognition device by a microphone. The phoneme information can be applied to speech recognition means of the speech recognition device by which word information recognized by the speech recognition means can be applied as recognized text to a monitor that can be connected to the speech recognition device. The word information of the recognized text can be shown by the monitor.
The speech recognition means include a word memory for recognizing word information contained in the speech information. The word memory stores as word information all the words recognized by the speech recognition device, which words form the vocabulary of the speech recognition device. For each word information signal is stored phoneme information forming a phoneme sequence featuring the associated stored word.
When the known speech recognition method in the known speech recognition device is implemented,- phoneme sequences contained in speech information of a spoken text are determined by the speech recognition means and compared with phoneme sequences stored in the word memory. When a match between a determined and a stored phoneme sequence is found during this comparison, stored word information assigned to this stored phoneme sequence is then taken from the word memory as the recognized word.
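The matching step can be pictured with a short sketch. The following Python fragment is only an illustration with hypothetical words, phoneme spellings and a simple similarity measure; the patent does not prescribe any particular data layout or distance function. It models the word memory as a mapping from word information to a phoneme sequence and returns the stored word whose phoneme sequence best matches a determined phoneme sequence.

```python
# Minimal sketch of the word-memory lookup described above. The words,
# phoneme spellings and the similarity measure are hypothetical.
from difflib import SequenceMatcher

# Word memory: word information WI -> phoneme sequence PI(WI),
# written here as space-separated phoneme symbols for simplicity.
WORD_MEMORY = {
    "mit":          "m i t",
    "freundlichen": "f r oy n t l i c en",
    "Kuessen":      "k y s en",
    "feuchten":     "f oy c t en",
}

def phoneme_distance(a: str, b: str) -> float:
    """Smaller value = better match; 0.0 means identical phoneme sequences."""
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()

def recognize_word(determined_phonemes: str):
    """Return the stored word whose phoneme sequence matches best, with its distance."""
    return min(
        ((word, phoneme_distance(determined_phonemes, pi))
         for word, pi in WORD_MEMORY.items()),
        key=lambda item: item[1],
    )

print(recognize_word("m i t"))          # exact match -> ('mit', 0.0)
print(recognize_word("f oy r i g en"))  # closest stored word, not necessarily correct
```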
If the speech information of a spoken text contains a new word whose word information and phoneme information is not stored in the word memory, the speech recognition means determine the word as a recognized word whose phoneme sequence stored in the word memory corresponds the most to the part of the phoneme information of the speech information that represents the new word. After the speech recognition method has been executed, the recognized text with the word erroneously recognized for a new word in the recognized text is shown on the monitor. A user of the known speech recognition device can then correct the spelling of the erroneously recognized word into the correct spelling of the actually spoken new word.
The known speech recognition device includes a sub-word memory in which parts of words forming sub-words can be stored as sub-word information and assigned phoneme information. When a user has corrected the spelling of a new, erroneously recognized word into the spelling of the actually spoken new word, the known speech recognition device is arranged for determining the phoneme sequence of the new word and for the associated storing of the word information and phoneme information of the new word in the word memory by comparing sub-words contained in the new word with sub-words stored in the sub-word memory. As a result, the vocabulary of the known speech recognition device is enlarged by the new word.
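How the phoneme sequence of a corrected new word could then be assembled from stored sub-words may be pictured with the following sketch. The sub-word spellings, the phoneme symbols and the greedy longest-match covering are assumptions made only for illustration; they are not taken from the patent.

```python
# Hedged sketch: derive a phoneme sequence for a corrected new word by covering
# its spelling with stored sub-words (hypothetical spellings and phonemes).
SUBWORDS = {  # sub-word spelling -> phoneme information
    "f": "f", "eu": "oy", "r": "r", "i": "i", "gen": "g en",
}

def phonemes_for(spelling: str):
    """Cover the spelling with stored sub-words and concatenate their phonemes."""
    if not spelling:
        return ""
    for sub in sorted(SUBWORDS, key=len, reverse=True):  # try longer sub-words first
        if spelling.startswith(sub):
            rest = phonemes_for(spelling[len(sub):])
            if rest is not None:
                return (SUBWORDS[sub] + " " + rest).strip()
    return None  # spelling cannot be covered by the stored sub-words

# The corrected word can now be stored together with its derived phoneme sequence:
print(phonemes_for("feurigen"))  # -> 'f oy r i g en'
```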
In the known speech recognition device it has proved to be disadvantageous that there is no speech model word memory in which transition probabilities of words stored in the word memory can be stored, because when a speech recognition method is executed by evaluating transition probabilities stored in a speech model word memory, a considerably better recognition rate can be achieved. The fact that a speech model word memory can be provided in a speech recognition device and that this leads to a better attainable recognition rate has been known for a long time. The arrangement of a speech model word memory in the known speech recognition device, however, is the reason for a considerable drawback which the measures defined in claim 1 giving substance to the invention recognize and avoid. When such a speech recognition method of such a speech recognition device is executed with a speech model word memory, several possible word sequences for the phoneme information of the spoken text would be determined by the speech recognition means. As explained above, for a new word showing up in a spoken text, a word stored in the word memory, thus an erroneous word, would be determined as a recognized word and be inserted into the possible word sequences.
One of the possible word sequences would be determined as the word sequence having the highest overall probability, calculated on the basis of the transition probabilities of the words of the word sequence, and would be produced as the recognized word sequence by the speech recognition device. However, in one of the possible word sequences a word erroneously recognized in place of a new word may have a high transition probability with respect to its neighboring words, whereas the new word itself would have a low transition probability in these sequences. As a result, the possible word sequence having the highest overall probability could be one in which also the words neighboring the new, but erroneously recognized, word were recognized erroneously. Therefore, complete word sequences would be recognized erroneously, which is a considerable disadvantage.
It is an object of the invention to eliminate the problems stated above and provide an improved speech recognition device in accordance with the type set out in the first paragraph. This object is achieved with such a speech recognition device in that the speech recognition means are provided for determining phoneme information stored in the sub-word memory and corresponding to applied phoneme information, and for producing as recognized sub-word information the sub-word information stored in the sub-word memory and assigned to this stored phoneme information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words can be stored as transition probability information, and in that the speech recognition means are arranged for forming at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means, by evaluating transition probability information stored in the speech model word memory, are arranged for recognizing as recognized text with the highest overall probability one expression sequence from the at least two expression sequences. As a result, when in the speech recognition device a speech recognition method is carried out with a spoken text that contains a word not stored in the word memory, a sub- word sequence for this new word is inserted into the word sequence formed by recognized words of the spoken text, so that one expression sequence is maintained. During this operation, those sub-words stored in the sub-word memory are inserted into the sub-word sequence whose concatenated phoneme information corresponds to the part of the phoneme information of the spoken text, which part is to be assigned to the new word. When, subsequently, transition probabilities stored in the speech model word memory are evaluated and the expression sequence having the highest overall probability is recognized, for the sub-word sequence of the new word no transition probabilities for other words are stored in the speech model word memory, so that, advantageously, the words surrounding a new word in a spoken text are not recognized erroneously. In this way a speech recognition device is obtained having a considerably better recognition rate because, on the one hand, advantages are enjoyed of a speech model word memory that has been provided and, on the other, the disadvantage is avoided that occurs with new words when a speech model word memory is used. With a speech recognition device as claimed in claim 1, it has proved to be advantageous to provide the measures as claimed in claim 2. This enables that when sub- words stored in a sub-word memory are inserted into a sub-word sequence that represents a new word when the speech recognition method is executed, also probabilities of transitions from one sub-word to another sub-word are taken into account. This affords the advantage that the sub-word sequence is very well adapted to the new word and the recognition rate of the speech recognition device is additionally improved.
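The evaluation of the expression sequences can be pictured with a small numerical sketch. All penalty values below are invented, the additive combination of phoneme match values and bigram transition penalties is an assumption (the description only states that both kinds of values, optionally weighted, determine the overall probability, small values meaning high probability), and the bigram table is keyed here by word pairs, whereas the device stores address pointers into the word memory to save space.

```python
# Hedged sketch of selecting the expression sequence with the smallest overall
# probability information GWI (hypothetical values, simplified combination).

# Bigram transition penalties UWI(WFI): smaller value = higher transition probability.
TRANSITION_PENALTY = {
    ("mit", "freundlichen"): 1, ("freundlichen", "Gruessen"): 1,
    ("mit", "fremden"): 4,      ("fremden", "Fuessen"): 9,
    ("mit", "feuchten"): 5,     ("feuchten", "Kuessen"): 6,
}
# Used when no bigram is stored, e.g. next to a sub-word sequence standing in
# for a new word (a simplification; the device leaves these entries empty).
DEFAULT_PENALTY = 10

def overall_score(expression_sequence, phoneme_values, w_phoneme=1.0, w_transition=1.0):
    """Overall probability information GWI: smaller = more probable."""
    score = w_phoneme * sum(phoneme_values)
    for first, second in zip(expression_sequence, expression_sequence[1:]):
        score += w_transition * TRANSITION_PENALTY.get((first, second), DEFAULT_PENALTY)
    return score

# Candidate expression sequences for a spoken "mit feurigen Kuessen":
candidates = [
    (["mit", "freundlichen", "Gruessen"], [5, 35, 20]),  # words only
    (["mit", "feuchten", "Kuessen"],      [5, 30, 5]),   # words only
    (["mit", "f-eu-r-i-gen", "Kuessen"],  [5, 8, 5]),    # sub-word sequence for the new word
]
best = min(candidates, key=lambda c: overall_score(*c))
print(best[0])  # the sequence with the smallest GWI is produced as the recognized text
```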
In a speech recognition device as claimed in claim 1 it has proved to be advantageous to provide the measures as claimed in claim 3. This provides an even better determination of the overall probability of the possible expression sequences, so that, advantageously, a better recognition rate is attained.
With a speech recognition device as claimed in claim 1, it has proved to be advantageous to provide the measures as claimed in claim 4. This affords the advantage that with the word determining means based on statistical background information the spelling of a sub-word sequence recognized by the speech recognition means can be adapted to the spelling normally used in the language and an apparently correct spelling of the new word can be determined.
In a speech recognition device as claimed in claim 4 it has proved to be advantageous to provide the measures as claimed in claim 5. As a result, the advantage is obtained that, after the speech recognition method has been executed in the speech recognition device, a user can correct the spelling of a sub-word sequence representing a new word into the correct spelling, after which the new word is stored in the word memory and, in consequence, the vocabulary of the speech recognition device is enlarged.
In a speech recognition device as claimed in claim 5 there could be provided that a user is to speak a new word stored in the word memory a number of times into the microphone, in order to determine an associated phoneme sequence of the new word. However, it has proved to be highly advantageous to provide the measures as claimed in claim 6. This affords the advantage that a user need not train a new word.
A further object of the invention is to eliminate the problems stated above and provide an improved speech recognition method in accordance with the type defined in the introduction in the second paragraph. This object is achieved with a speech recognition method of this type in that the speech recognition means determine phoneme information stored in the sub-word memory and corresponding to applied phoneme information and produce sub-word information stored in the sub-word memory and assigned to this stored phoneme information as recognized sub-word information, and in that a speech model word memory is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed from these words is stored as transition probability information, and in that the speech recognition means form at least two expression sequences which contain recognized word information and/or recognized sub-word information, and in that the speech recognition means evaluate transition probability information stored in the speech model word memory in order to recognize the one expression sequence of the at least two expression sequences as recognized text that has the highest overall probability.
The advantages of a speech recognition method according to the invention are a consequence of the above-defined advantages of a speech recognition device according to the invention.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter. In the drawings:
Fig. 1 diagrammatically shows a block circuit diagram of a speech recognition device comprising a sub-word memory and a speech model sub-word memory,
Fig. 2 shows a first table containing word information and phoneme information stored in a word memory of the speech recognition device according to Fig. 1,
Fig. 3 shows a second table containing sub-word information and phoneme information stored in the sub-word memory of the speech recognition device shown in Fig. 1,
Fig. 4 shows a third table containing word sequence information and transition probability information stored in a speech model word memory of the speech recognition device shown in Fig. 1,
Fig. 5 shows a fourth table containing sub-word sequence information and transition probability information stored in the speech model sub-word memory of the speech recognition device shown in Fig. 1,
Fig. 6 shows a fifth table containing transition probability information and corresponding phoneme values of four possible expression sequences which are evaluated when the overall probability of each of the possible expression sequences is determined in speech recognition means of the speech recognition device, and
Fig. 7 shows a sixth table containing word information and phoneme information stored as background information in a background information memory of the speech recognition device shown in Fig. 1.
Fig. 1 diagrammatically shows in the form of a block circuit diagram a personal computer 1 in which a speech recognition device 2 is arranged. The speech recognition device 2 can be supplied with speech information SI by a user and the speech recognition device 2 is provided for recognizing phoneme information PI contained in the speech information SI and for producing word information WI of a recognized text.
The speech recognition device 2 has an input terminal 3 to which a microphone 4 can be connected. The microphone 4 can deliver speech information SI as an electric input signal to the input terminal 3 of the speech recognition device 2. The microphone 4 has a control key 5 by which a control information signal ST can be delivered to the speech recognition device 2.
If a user of the speech recognition device 2 wishes to speak a text into the microphone 4 as speech information SI to be recognized, the user is to actuate the control key 5. Then speech information SI contained in the spoken text can be delivered to the input terminal 3 and control information ST to the speech recognition device 2. The speech recognition device 2 includes speech recognition means 6 which are arranged for recognizing phoneme information PI of a spoken text contained in the speech information SI of the input signal and for producing word information WI of a recognized text. For this purpose, the speech recognition means 6 include an A/D converter stage 7, a memory stage 8, calculation means 9, a word memory 10, a sub-word memory 11, a speech model word memory 12 and a speech model sub-word memory 13.
Speech information SI delivered as an electric input signal to the input terminal 3 can be applied to the A/D converter stage 7. Digitized speech information SI can be applied to the memory stage 8 by the A/D converter stage 7. Digitized speech information SI applied to the memory stage 8 can also be stored in this memory stage 8.
In an audio playback mode that can be activated (not further shown in Fig. 1) of the speech recognition device 2, digitized speech information SI stored in the memory stage 8 can be applied to a D/A converter stage 14. In the audio playback mode the D/A converter stage 14 can apply analog speech information SI as an electric output signal to a loudspeaker 15 for the acoustic reproduction of a text spoken into the microphone 4 by a user of the speech recognition device 2.
The calculation means 9 are formed by a microprocessor and connected by an address/data bus to the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13. Digitized speech information SI stored in the memory stage 8 and control information ST delivered by the microphone 4 can be applied to the calculation means 9. When a speech recognition method of the speech recognition device is executed, the calculation means 9 can determine expression sequences AF containing word information WI and/or sub-word information SWI, by evaluating information stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, which expression sequences AF will be further explained hereinafter.
The speech recognition means 6 further include word determining means 16 and a background information memory 17. The phoneme information PI of the speech information SI applied to the calculation means 9, and an expression sequence AF recognized for this phoneme information PI by the calculation means 9 when the speech recognition method was executed, can be applied to the word determining means 16. When the speech recognition method is executed further, the word determining means 16 can determine, by evaluating background information stored in the background information memory 17, a probable spelling of at least one sub-word sequence contained in the recognized expression sequence AF, which will be further discussed hereinafter.
The word determining means 16 can apply to an output terminal 18 a recognized expression sequence AF, in which at least one sub-word sequence contained in the expression sequence AF has been replaced by word information NWI of at least one new word, as word information WI of a recognized text. To the output terminal 18 is connected a monitor 19 which forms display means and by which word information WI of a recognized text delivered by the output terminal 18 can be displayed.
To the speech recognition device 2 is further connected a keyboard 20 which forms input means. The spelling of a new word displayed on the monitor 19 can be changed by a user of the speech recognition device 2, and changed word information NWI of the new word can be displayed by the monitor 19. When a user changes the spelling of a new word determined by the word determining means 16 and would like to store this now correct spelling of a new word in the word memory 10, the changed word information NWI of the new word can be stored as word information WI by the actuation of a key of the keyboard 20. This affords the advantage that a user, after the speech recognition method in the speech recognition device 2 has been executed, can change the spelling of a sub-word sequence that represents a new word into the correct spelling of the new word, after which the new word can be stored in the word memory 10 and the vocabulary of the speech recognition device 2 can be enlarged. Enlarging the vocabulary is possible in a very simple manner, because the spelling of a sub-word sequence already often corresponds with the correct spelling of a new word.
In the word memory 10, word information WI of up to 64,000 individual words forming the vocabulary of the speech recognition device 2 can be stored. The speech recognition device 2 recognizes as words only those words contained in speech information SI of a spoken text that are also stored in the word memory 10.
The word memory 10 stores word information WI of words of a certain so-called "context", which context here corresponds to the vocabulary of a lawyer, as a result of which speech information SI of a spoken text from this "context" can be recognized very well. It may be observed that word information WI of another "context", such as, for example, that of a doctor or a salesman, could also be stored.
In accordance with the example of embodiment, information in the German language is stored in the word memory 10, the sub-word memory 11, the speech model word memory 12 and the speech model sub-word memory 13, so that the speech recognition device 2 is arranged for recognizing speech information SI of texts spoken in German. In a first example described hereinafter, the execution of the speech recognition method of the speech recognition device 2 is explained while taking typical formulations of the German language into account. However, it may be observed that a speech recognition device according to the invention may be arranged for recognizing texts from speech information SI spoken in any language.
For each word information element WI of a word, a phoneme sequence featuring the word can be stored in the word memory 10 as phoneme information PI(WI).
Phonemes of a phoneme sequence are the smallest distinguishable acoustic units into which spoken speech information SI can be subdivided.
In a first table 21 of Fig. 2 is shown word information WI and also phoneme information PI(WI) assignedly stored in the word memory 10. To simplify the explanation, the word information WI is represented by the substituting letters A, B, C to G in the first table 21. According to the first example the first table 21 contains the word information WI = A for a word "Füßen", the word information WI = B for a word "mit", the word information WI = C for a word "freundlichen", the word information WI = D for a word "Küssen", the word information WI = E for a word "fremden", the word information WI = F for a word "Grüßen" and the word information WI = G for a word "feuchten". The word information WI entered in the first table 21 substitutes further word information WI stored in the word memory 10. The vocabulary of the speech recognition device 2 thus also includes the seven words indicated in the first table 21 as word information WI.
In the sub-word memory 11 can be stored, as sub-word information SWI, sub-words forming parts of words, and assigned phoneme information PI(SWI). Sub-words here are individual letters, syllables or other parts of words which can be combined to form a word.
A second table 22 of Fig. 3 contains sub-word information SWI and phoneme information PI(SWI) assignedly stored in the sub-word memory 11. To simplify the explanation, letters a, b, c to g have been entered in the second table 22 for the sub-word information SWI. According to the first example, the sub-word information SWI = a is entered in the second table 22 for a sub-word "eu", the sub-word information SWI = b for a sub-word "gen", the sub-word information SWI = c for a sub-word "f", the sub-word information SWI = d for a sub-word "r", the sub-word information SWI = e for a sub-word "i", the sub-word information SWI = f for a sub-word "sch" and the sub-word information SWI = g for a sub-word "st". The seven sub-word information elements SWI entered in the second table 22 substitute a plurality of further sub-word information elements SWI stored in the sub-word memory 11.
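Purely as an illustration of the data layout described by the first and second tables, and not as part of the patented device, the two memories can be pictured as lookup tables from word or sub-word information to an assigned phoneme sequence. The phoneme transcriptions below are invented placeholders.

```python
# Illustrative sketch only: the word memory 10 and the sub-word memory 11 viewed
# as lookup tables from word / sub-word information to an assigned phoneme
# sequence. The phoneme strings are invented placeholders, not the patent's data.

word_memory = {            # WI -> PI(WI)
    "mit":          ("m", "I", "t"),
    "freundlichen": ("f", "R", "OY", "n", "t", "l", "I", "C", "@", "n"),
    "Küssen":       ("k", "Y", "s", "@", "n"),
    "Grüßen":       ("g", "R", "y:", "s", "@", "n"),
}

sub_word_memory = {        # SWI -> PI(SWI)
    "f":   ("f",),
    "eu":  ("OY",),
    "r":   ("R",),
    "i":   ("I",),
    "gen": ("g", "@", "n"),
}
```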
In the speech model word memory 12 of the speech recognition device 2 can be stored as transition probability information UWI(WFI) a probability of occurrence of a second word stored in the word memory 10 after a first word stored in the word memory 10 in a word sequence formed by these words. In the speech model word memory 12 can be stored as word sequence information WFI word sequences having two words each, which word sequences are also known as bigrams.
Fig. 4 shows a third table 23 which contains word sequence information WFI of word sequences and assigned transition probability information UWI(WFI), which is stored in the speech model word memory 12. For example, the second row of the third table 23 contains the information that the word "freundlichen" having the word information WI = C follows the word "mit" having the word information WI = B in a word sequence formed by these words in speech information SI of a spoken text, and has transition probability information UWI(WFI) = 5. Small values of the transition probability information UWI(WFI) indicate a high transition probability.
When the word "mit" is recognized when the speech recognition method is executed, it may consequently be assumed with high probability expressed in a very low value UWI(WFI) = 5 that the next word contained in the spoken text is the word "freundlichen". The six word sequence information elements WFI entered in the third table 23 substitute a plurality of further word sequence information elements WFI stored in the speech model word memory 12.
In the speech model sub-word memory 13 of the speech recognition device 2, a probability of occurrence of a second sub-word stored in the sub-word memory 11 after a first sub-word stored in the sub-word memory 11 in a sub-word sequence formed by these sub-words can be stored as transition probability information UWI(SWFI). In the speech model sub-word memory 13 can be stored, as sub-word sequence information SWFI, sub-word sequences having two sub-words each, which also form so-called bigrams.
Fig. 5 shows a fourth table 24 which contains sub-word sequence information SWFI of sub-word sequences and assigned transition probability information UWI(SWFI) which is stored in the speech model sub-word memory 13. For example, the third row of the fourth table 24 contains the information that a sub-word sequence formed by the sub-word "f" having the sub-word information SWI = c and the sub-word "eu" having the sub-word information SWI = a in words of a spoken text has transition probability information UWI(SWFI) = 2. Small values of the transition probability information UWI(SWFI) express a high transition probability. The sub-word sequence "feu" is contained, for example, in the word "feuchten" but also in the word "feurigen". The seven sub-word sequence information elements SWFI entered in the fourth table 24 substitute a plurality of further sub-word sequence information elements SWFI stored in the speech model sub-word memory 13.
It may be observed that the word information WI contained in the word sequence information WFI is not stored again in the speech model word memory 12; instead, in order to save memory capacity in the speech model word memory 12, address pointers to the memory locations of the respective word information WI in the word memory 10 are stored as word sequence information WFI in the speech model word memory 12. For example, on the second row of the third table 23 are stored an address pointer to the third row of the first table 21 and an address pointer to the fourth row of the first table 21 for the word sequence information WFI = B+C. In corresponding fashion, sub-word information SWI is stored only in the sub-word memory 11, and address pointers to memory locations in the sub-word memory 11 are stored as sub-word sequence information SWFI in the speech model sub-word memory 13.
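A minimal sketch of the memory-saving idea just described: instead of repeating the word strings, the bigram table can store indices (address pointers) into the word memory. The index values and table excerpt are illustrative assumptions.

```python
# Illustrative sketch: storing bigrams as index pairs that point into the word
# memory instead of repeating the word information itself.

word_table = ["Füßen", "mit", "freundlichen", "Küssen"]   # word memory 10 (excerpt)

# Speech model word memory 12: (index of first word, index of second word) -> UWI
bigram_cost_by_index = {
    (1, 2): 5,   # "mit" -> "freundlichen"
}

for (i, j), cost in bigram_cost_by_index.items():
    print(word_table[i], word_table[j], cost)   # mit freundlichen 5
```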
In the following, the execution of the speech recognition method in the speech recognition means 6 will be explained by means of a first example. According to the first example, it is assumed that a user presses the control key 5 and speaks a text "Hans verabschiedete sich von Anna mit feurigen Küssen und ging nach Hause" into the microphone 4. Speech information SI of the spoken text is then delivered by the microphone 4 to the A/D converter stage 7 and by the latter, as digitized speech information SI, to the memory stage 8 and stored there. The control information ST delivered by the microphone 4 can be applied to the calculation means 9 and activates there the execution of the speech recognition method. When the speech recognition method is executed, a section of the digitized speech information SI stored in the memory stage 8 is read out each time by the calculation means 9, and recognized words and word sequences contained in the section of the speech information SI are processed in accordance with the so-called Hidden Markov Model.
In the following will be explained the execution of the speech recognition method with the aid of the section of the speech information SI stored in the memory stage 8 which section corresponds to the part of the spoken text "mit feurigen Küssen".
Fig. 6 shows a fifth table 25 in which are entered possible expression sequences AF determined during the execution of the speech recognition method. In order to simplify the explanation of the execution of the speech recognition method, the first row of the fifth table 25 additionally contains the word sequence WF = "mit feurigen Küssen", which is not available during the actual execution of the speech recognition method.
When the speech recognition method is executed, the calculation means 9 determine the phoneme information PI contained in the section of the speech information SI, as this has already been known for a long time. Determined phoneme information PI is then compared with phoneme information PI(WI) stored in the word memory 10. When during this comparison phoneme information PI corresponding to the determined phoneme information PI is found in the word memory 10, stored word information WI assigned to this found phoneme information PI is inserted as recognized word information WI into a possible expression sequence AF of the fifth table 25.
The speech recognition means 6 are arranged for determining a corresponding phoneme value PUW for recognized word information WI that was inserted into possible expression sequences AF. A corresponding phoneme value PUW here indicates the extent of correspondence or match of the stored phoneme information PI(WI) of the recognized word information WI with that part of the phoneme information PI contained in the speech information SI applied to the speech recognition means 6 for which the word information WI was recognized by the speech recognition means 6. A small magnitude of a corresponding phoneme value PUW characterizes a great correspondence or match of the compared phoneme information PI and a high probability that a word was recognized correctly.
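The document does not spell out how the corresponding phoneme value PUW is computed. One plausible stand-in, shown purely as an assumption, is an edit distance between the stored phoneme sequence and the phoneme sequence observed in the section of the speech information, so that a small value again means a good match.

```python
# Hedged sketch: a corresponding phoneme value PUW approximated as the edit
# (Levenshtein) distance between the stored and the observed phoneme sequence.
# This is an illustrative assumption; the patent leaves the exact scoring open.

def phoneme_value(stored: tuple, observed: tuple) -> int:
    """Smaller result = closer match between stored PI(WI) and observed PI."""
    m, n = len(stored), len(observed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if stored[i - 1] == observed[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

print(phoneme_value(("m", "I", "t"), ("m", "I", "t")))   # 0: perfect match
```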
On the third row of the fifth table 25 is stated a first possible expression sequence AF1, which contains the recognized word information WI = B ("mit"), WI = C ("freundlichen") and WI = F ("Grüßen"). When the phoneme information PI contained in the section of the speech information SI is compared with the phoneme information PI(B) stored in the word memory 10 for the word information WI = B of the word "mit", a very large correspondence of the phoneme information PI is determined, as a result of which the corresponding phoneme value PUW1 of the word information WI = B of the word "mit" has the value "4", which corresponding phoneme value PUW1 is indicated on the fourth row and in the second column of the fifth table 25. The probability that the word contained in the spoken text is really the recognized word "mit" is therefore very great.
The word "feurigen" contained in the section of the speech information SI of the spoken text does not belong to the vocabulary of the speech recognition device 2 and is therefore not stored in the word memory 10. When the first possible expression sequence AF1 is determined, the word "freundlichen" is recognized which is similar to the unknown word "feurigen" and has the word information WI = C stored in the word memory 10. A corresponding phoneme value PUW1 determined for the word "freundlichen" has the value "35", because the compared phoneme information PI of the section of the speech information SI and of the stored phoneme information PI(C) have only a moderate correspondence. The probability that the word "freundlichen" was recognized correctly is therefore not very high. For the third word "Kϋssen" of the section of the speech information SI to be recognized by the speech recognition device 2, the calculation means 9 determine the word "Grϋβen" from the word memory 10 when the speech recognition method is executed. This is because the user has pronounced the word "Kϋssen" slightly differently than otherwise and, therefore, the phoneme information PI(D) assignedly stored in the word memory 10 with the word information WI = D does not unambiguously correspond to the phoneme information PI contained in the section of the speech information SI. The corresponding phoneme value PUW1 determined during this operation for the word "Griiβen" has the value "20" because the compared phoneme information PI of the section of the speech information SI and of the stored phoneme information PI(F) of the word "Grϋβen" has only a moderate correspondence. When the speech recognition method is executed, the speech recognition means 6 and, in addition, the calculation means 9, are not only arranged for determining words of possible expression sequences AF by comparing phoneme information PI contained in the section of the speech information SI with phoneme information PI(WI) stored in the word memory 10. Additionally, transition probability information UWI(WFI) of word sequences contained in possible expression sequences AF are determined, which transition probability information is fetched from the speech model word memory 12, and entered in the fifth table 25.
The calculation means 9 determine from the second row of the third table 23 stored in the speech model word memory 12 the transition probability information UWI(B+C) = 5 of the word sequence "mit freundlichen" having the word sequence information WFI = B+C, which transition probability information is entered on the second row and in the third column of the fifth table 25. The transition probability information UWI(C+F) = 2 of the word sequence "freundlichen Grüßen" having the word sequence information WFI = C+F is also determined and entered on the second row and in the fifth column of the fifth table 25.
Since the word sequence "mit freundlichen Grüßen" represents a typical formulation in the German language and occurs in many letters, the transition probability information UWI1 has small values. When the speech recognition method is executed, the speech recognition means 6 and thus the calculation means 9 are arranged for determining overall probability information GWI1 from the corresponding phoneme values PUW1 and the transition probability information UWI1 of the first possible expression sequence AF1 and for entering this overall probability information GWI1 in the seventh column of the fifth table 25. During this operation, the corresponding phoneme values PUW1 and the transition probability information elements UWI1 are added together and produce the overall probability information GWI1 = 66. A low value of the overall probability information GWI indicates a high probability of the possible expression sequence AF corresponding to a spoken word sequence contained in the section of the speech information SI.
It may be noted that, for obtaining the overall probability information GWI, a weight factor can also be multiplied by the corresponding phoneme values PUW or the transition probability information UWI prior to adding them together, in order to lend more weight to either the corresponding phoneme values PUW or the transition probability information UWI.
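As a sketch of the combination rule just described, the overall probability information GWI of a candidate expression sequence can be obtained by adding its phoneme values and transition costs, optionally after multiplying one of them by a weight factor. The weight parameters are assumptions; the input values restate the example.

```python
# Illustrative sketch: combining corresponding phoneme values PUW and transition
# probability information UWI into overall probability information GWI.
# A smaller GWI means a more probable expression sequence.

def overall_score(phoneme_values, transition_costs,
                  phoneme_weight=1.0, transition_weight=1.0):
    """GWI = weighted sum of PUW and UWI values (the weights are assumptions)."""
    return (phoneme_weight * sum(phoneme_values)
            + transition_weight * sum(transition_costs))

# First expression sequence AF1 from the example: PUW1 = 4, 35, 20 and UWI1 = 5, 2
print(overall_score([4, 35, 20], [5, 2]))   # 66.0, as in the fifth table
```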
When the speech recognition method is executed further, the calculation means 9 determine a second possible expression sequence AF2 having the word sequence "mit fremden Füßen" and put this sequence on the sixth row of the fifth table 25. The calculation means 9 determine, as indicated above, corresponding phoneme values PUW2 for the word information WI of the second possible expression sequence AF2, which phoneme values PUW2 are put on the seventh row of the fifth table 25. By evaluating transition probability information UWI(WFI) stored in the speech model word memory 12, the calculation means 9 determine transition probability information UWI2 of the second possible expression sequence AF2 and enter it on the fifth row of the fifth table 25.

Since the word sequence "mit fremden Füßen" hardly ever occurs in the German language, the transition probability information UWI2 has a relatively high value. When the speech recognition method is executed, the calculation means 9 determine the overall probability information GWI2 = 139 of the second possible expression sequence AF2 as the sum of the transition probability information UWI2 and the corresponding phoneme values PUW2 and write this in the seventh column of the fifth table 25.
When the speech recognition method is executed, the calculation means 9 determine a third possible expression sequence AF3 having the word sequence "mit feuchten Küssen" and write this sequence on the ninth row of the fifth table 25. The corresponding phoneme value PUW3 = 9 determined for the word "Küssen" is not a particularly low value because, as explained before, the word "Küssen" was pronounced slightly differently and, therefore, the phoneme information PI contained in the section of the speech information SI of the spoken text slightly deviates from the phoneme information PI(D) stored in the word memory 10. The corresponding phoneme values PUW3 determined by the calculation means 9 and the determined transition probability information UWI3 of the third possible expression sequence AF3 yield the overall probability information GWI3 = 78 of the third possible expression sequence AF3, which is written in the seventh column of the fifth table 25.
The speech recognition means 6 and also the calculation means 9 are arranged for determining phoneme information PI(SWI) stored in the sub-word memory 11 and corresponding to the phoneme information PI contained in the section of the speech information SI, and for producing as recognized sub-word information SWI the sub-word information SWI stored in the sub-word memory 11 and assigned to this stored phoneme information PI(SWI).
As a result, in lieu of a word probably recognized wrongly in the possible expression sequences AF, a sub-word sequence is advantageously written into further possible expression sequences when the speech recognition method is executed, which sub-word sequence largely corresponds to a new word contained in the spoken text and represents it.
According to the first example of the execution of the speech recognition method, the calculation means 9 determine a fourth possible expression sequence AF4 which is written on the twelfth row of the fifth table 25. In this fourth expression sequence AF4, a sub-word sequence having the sub-word sequence information SWFI = c+a+d+e+b was recognized for the phoneme information PI of the section of the speech information SI for which the words "freundlichen", "fremden" and "feuchten" were determined in the other possible expression sequences AF1, AF2 and AF3, which words all have relatively high corresponding phoneme values PUW. This sub-word sequence is developed from a concatenation of the sub-words "f", "eu", "r", "i" and "gen". Since the phoneme information PI(SWI) of these sub-words stored in the sub-word memory 11 very accurately copies the phoneme information PI contained in this section of the speech information SI, the sub-word information SWI has a very low corresponding phoneme value PUW4 = 1. The corresponding phoneme values PUW4 = 1 of the sub-word information SWI are written on the thirteenth row and in the fourth column of the fifth table 25.
The calculation means 9 are arranged for determining transition probability information UWI(SWFI) stored in the speech model sub-word memory 13 and assigned to the sub-word sequence information SWFI = c+a+d+e+b of the sub-word sequence. The transition probability information UWI(SWFI) thus determined is written on the eleventh row and in the fourth column of the fifth table 25.
It may be observed that the value of "32", which evolves from adding together the transition probability information UWI4 of the sub-words of the sub-word sequence and the corresponding phoneme values PUW4 of the sub-words of the sub-word sequence, is smaller than the corresponding phoneme values PUW of the words determined in the other possible expression sequences AF1, AF2 and AF3 for the phoneme information PI of this section of the speech information SI. This makes clear that new words can very well be copied by sub-word sequences.

According to the example of embodiment, no transition probability information UWI is split up from words into sub-words or built up from sub-words to words, as a result of which no values are written on the eleventh row and in the third column and in the fifth column for the transition probability information UWI4 of the fourth possible expression sequence AF4. However, it may be stated that determining such transition probability information UWI may also be advantageous.
For the overall probability information GWI4 of the fourth possible expression sequence AF4, the determined corresponding phoneme values PUW4 and transition probability information UWI4 of the fourth expression sequence AF4 are added together, so that overall probability information GWI4 = 45 of the fourth expression sequence AF4 is determined and written in the seventh column of the fifth table 25.
As a result, by evaluating transition probability information UWI stored in the speech model word memory 12 and the speech model sub-word memory 13, and by evaluating the corresponding phoneme values PUW determined for the possible expression sequences AF formed of recognized word information WI and recognized sub-word information SWI, the speech recognition means 6 are arranged for recognizing as recognized text the expression sequence AF containing recognized word information WI and/or recognized sub-word information SWI that has the highest overall probability, that is to say the smallest overall probability information GWI. The calculation means 9 then determine the fourth expression sequence AF4 having the overall probability information GWI4 = 45 as the text recognized for the respective section of the speech information SI.
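A minimal sketch of this final selection step: among all scored candidate expression sequences, the one with the smallest overall probability information GWI is taken as the recognized text. The candidate list simply restates the example values.

```python
# Illustrative sketch: picking the expression sequence with the smallest GWI.

candidates = [
    ("mit freundlichen Grüßen",     66),   # AF1
    ("mit fremden Füßen",           139),  # AF2
    ("mit feuchten Küssen",         78),   # AF3
    ("mit f~eu~r~i~gen Küssen",     45),   # AF4 (contains a sub-word sequence)
]

recognized, gwi = min(candidates, key=lambda item: item[1])
print(recognized, gwi)   # the sub-word variant wins with GWI = 45
```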
This affords the advantage that, in the word sequence to be recognized "mit feurigen Küssen", the words "mit" and "Küssen" surrounding the new word "feurigen", which is not contained in the word memory 10, were not recognized erroneously, as would have been the case, for example, in the first expression sequence AF1 having the next higher overall probability information GWI1 = 66. In addition, by evaluating both the transition probability information UWI and the corresponding phoneme values PUW, the overall probabilities of possible expression sequences AF are determined extremely accurately, so that an extremely good recognition rate of the speech recognition device 2 is attained. When the speech recognition method is executed further, the recognized expression sequence AF4 and the phoneme information PI contained in the respective section of the speech information SI are delivered to the word determining means 16. The word determining means 16 determine a probable spelling of the new word, which is represented by the sub-word sequence ("f~eu~r~i~gen") contained in the fourth expression sequence AF4, by evaluating background information stored in the background information memory 17.
Fig. 7 shows a sixth table 26 which contains word information WI and assignedly stored phoneme information PI(WI), which is stored in the background information memory 17 as background information. Word information WI of a very large vocabulary common in the German language and not limited to a certain "context" is stored in the background information memory 17.
In a first step of the speech recognition method for determining a probably correct spelling of a new word, the phoneme information PI of the section of the speech information SI for which the sub-word sequence ("f~eu~r~i~gen") was determined is compared with phoneme information PI(WI) stored in the background information memory 17. If the word "feurigen" is stored in the background information memory 17, the word determining means 16 determine the new word with this spelling.
When the word determining means 16, in the first step of the speech recognition method for determining a probably correct spelling of a new word, do not find any corresponding phoneme information PI in the background information memory 17, the word determining means 16 carry out the second step indicated hereinafter. The word determining means 16 then compare parts of the phoneme information PI of the sub-word sequence ("f~eu~r~i~gen") with phoneme information PI(WI) stored in the background information memory 17 and determine what spelling the parts of word information WI assigned to this stored phoneme information PI(WI) have. For example, the part of the phoneme information PI ("eur") of the phoneme information PI of the respective section of the speech information SI is also found, inter alia, in the phoneme information PI(WI) of the word "Heurigen" stored in the background information memory 17. The spelling common to a plurality of words found in this way is then also used for the new word by the word determining means 16. With the sub-word sequence of the recognized expression sequence AF4, the word determining means 16 determine that the sub-words can simply be combined to obtain the probably correct spelling of the new word.
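The spelling search can be pictured roughly as follows. The sketch matches letter strings rather than phoneme information, which is a simplification, and the fallback of simply concatenating the sub-words mirrors the behaviour described for "feurigen". Everything in it, including the dictionary excerpt and the similarity cutoff, is an assumption, not the patented procedure.

```python
# Hedged sketch: deriving a probable spelling for a recognized sub-word sequence.
# Step 1: prefer a close match from the background dictionary (here on letters,
# not phonemes). Step 2: otherwise fall back to concatenating the sub-words.
import difflib

background_dictionary = ["feurigen", "Heurigen", "freundlichen"]   # invented excerpt

def probable_spelling(sub_words):
    """Return a close background-dictionary word, else the concatenated sub-words."""
    candidate = "".join(sub_words)
    close = difflib.get_close_matches(candidate, background_dictionary,
                                      n=1, cutoff=0.9)
    return close[0] if close else candidate

print(probable_spelling(["f", "eu", "r", "i", "gen"]))   # "feurigen"
```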
After this, when the speech recognition method is executed further, the recognized fourth expression sequence AF4, in which the sub-word sequence ("f~eu~r~i~gen") is replaced by the word information NWI of the new word ("feurigen") determined by the word determining means 16, is delivered as recognized text to the output terminal 18 and from there to the monitor 19. Consequently, after the speech recognition method has been executed in the speech recognition device 2, the monitor displays for all the sections of the speech information SI stored in the memory stage 8 the recognized text "Hans verabschiedete sich von Anna mit feurigen Küssen und ging nach Hause".
A user of the speech recognition device 2 then has the option, by actuating one of the keys of the keyboard 20, to change the text shown on the monitor 19 and specifically correct the spelling of a new word. Such a new spelling of a new word would then be again delivered to the monitor 19 by the word determining means 16 via the output terminal 18 and displayed by means of the monitor 19.
Since the spelling of the new word "feurigen" has already been determined correctly by the word determining means 16, a modification of the spelling of the new word is not necessary, and the user of the speech recognition device 2 is enabled, in a manner not further shown in Fig. 1, to confirm and store the new word in the word memory 10 by actuating a key of the keyboard 20. Subsequently, the word information NWI of the new word is stored in the word memory 10 as word information WI, together with the phoneme information PI(NWI) of the new word contained in the section of the speech information SI, which is stored as the assigned phoneme information PI(WI).
This affords the advantage that the vocabulary of the speech recognition device 2 is enlarged by the new word "feurigen", so that, when the speech recognition method is executed next, the word "feurigen" contained in speech information SI of a spoken text is immediately recognized as a word having the correct spelling. Furthermore, since the phoneme information PI contained in the section of the speech information SI spoken by the user is stored immediately, the advantage is obtained that the user need not train the pronunciation of a new word.
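The vocabulary enlargement step can be sketched as a single insertion: the confirmed spelling NWI becomes new word information WI in the word memory, and the phoneme information observed in the spoken section is stored with it, so no separate pronunciation training is needed. The function name and the phoneme placeholder are assumptions.

```python
# Illustrative sketch: enlarging the vocabulary after the user confirms a new word.
# The phoneme information observed in the spoken section is stored alongside the
# confirmed spelling, so the pronunciation does not have to be trained separately.

word_memory = {"mit": ("m", "I", "t")}            # word memory 10 (excerpt)

def store_new_word(memory, spelling, observed_phonemes):
    """Store NWI as WI together with PI(NWI) as the assigned PI(WI)."""
    memory[spelling] = observed_phonemes

store_new_word(word_memory, "feurigen", ("f", "OY", "R", "I", "g", "@", "n"))
print("feurigen" in word_memory)                  # True: now part of the vocabulary
```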
According to a second example of embodiment of the invention, English language information is stored in a word memory 10, a sub-word memory 11, a speech model word memory 12, a speech model sub-word memory 13 and a background information memory 17 of a speech recognition device 2 whose structure corresponds to that of the speech recognition device 2 shown in Fig. 1. With this speech recognition device 2 according to the second example of embodiment, speech information SI of a text spoken in the English language can be processed. According to a second example of embodiment, a user pronounces the text "The
Toscana is a friendly and kind region of Italy". During the execution of the speech recognition method for the section of the speech information SI of the spoken text "and kind region", the calculation means 9 of the speech recognition device 2 according to the second example of embodiment determine amongst other possible expression sequences also a fifth possible expression sequence AF5 "and kind regards" and a sixth possible expression sequence AF6 "and kind r~i~gion".
The fifth possible expression sequence AF5 contains a formulation that is typical in the English language, as a result of which the transition probability information UWI fetched from the speech model word memory 12 for the fifth expression sequence AF5 has small values. The sixth possible expression sequence AF6 contains the sub-word sequence "r~i~gion", because the word "region" is not stored in the word memory 10.
Advantageously, a new word contained in the speech information SI, but not yet stored in the word memory 10, has already been replaced by a sub-word sequence during the execution of the speech recognition method, so that a wrong recognition of a possible expression sequence which has a typical formulation and, therefore, a high overall probability is avoided.
During the further execution of the speech recognition method, the word determining means 16 are arranged for determining the correct spelling ("region") of the sub-word sequence "r~i~gion" as word information NWI of the new word by evaluating background information stored in a background information memory 17. Advantageously, this causes a word new to the speech recognition device 2 to be shown in a probably correct spelling on a monitor 19. The new word information NWI can then be stored in the word memory 10 with a spelling modified, as required, by a user, so that, advantageously, the vocabulary of the speech recognition device 2 is enlarged. It may be observed that a recognized sub-word sequence may be formed, for example, by the sub-word sequence "k~o~m~p~j~u~t~a". The word determining means are then arranged for determining the correct spelling of the new word "computer" by evaluating the background information stored in the background information memory 17, in that a comparison is made with the spelling customary in the German or English language. It may be observed that the background information memory may also store other background information containing statistical information about a language.
It may be observed that in the speech model word memory and in the speech model sub-word memory not only word sequences comprising two words (bigrams), but also word sequences comprising three or more words may be stored with assignedly stored transition probability information UWI.
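As a final sketch of this remark, extending the bigram tables to longer word sequences only changes the key length: a trigram memory maps triples of words to a transition cost. The stored values are invented for illustration.

```python
# Illustrative sketch: a speech model memory holding trigrams instead of bigrams.
# Keys are word triples; values are assumed transition costs (smaller = likelier).

word_trigram_cost = {
    ("mit", "freundlichen", "Grüßen"): 1,
    ("mit", "fremden", "Füßen"): 40,
}

def trigram_cost(w1, w2, w3, default=100):
    """Return the stored trigram cost, or an assumed penalty if unseen."""
    return word_trigram_cost.get((w1, w2, w3), default)

print(trigram_cost("mit", "freundlichen", "Grüßen"))   # 1
```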

CLAIMS:
1. A speech recognition device (2) including a word memory (10) in which word information (WI) and assigned phoneme information (PI(WI)) of at least a first and a second word forming the vocabulary of the speech recognition device (2) can be stored, and including speech recognition means (6) to which speech information (SI) containing phoneme information (PI) can be applied and which are arranged for determining phoneme information (PI(WI)) stored in the word memory (10) and corresponding to the applied phoneme information (PI), and for producing as recognized word information (WI) the word information (WI) stored in the word memory (10) and assigned to this stored phoneme information (PI(WI)), and including a sub-word memory (11) in which sub-words forming parts of words can be stored as sub-word information (SWI) and assigned phoneme information (PI(SWI)) of at least a first and a second sub-word, characterized in that the speech recognition means (6) are provided for determining phoneme information (PI(SWI)) stored in the sub-word memory (11) and corresponding to the applied phoneme information (PI), and for producing as recognized sub-word information (SWI) the sub-word information (SWI) stored in the sub-word memory (11) and assigned to this stored phoneme information (PI(SWI)), and in that a speech model word memory (12) is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words can be stored as transition probability information (UWI(WFI)), and in that the speech recognition means (6) are provided for forming at least two expression sequences (AF) which contain recognized word information (WI) and/or recognized sub-word information (SWI), and in that the speech recognition means (6), by evaluating transition probability information (UWI(WFI)) stored in the speech model word memory (12), are provided for recognizing one expression sequence (AF) from the at least two expression sequences (AF) as the recognized text that has the highest overall probability (GWI).
2. A speech recognition device (2) as claimed in claim 1, characterized in that a speech model sub-word memory (13) is provided in which at least a probability of occurrence of the second sub-word after the first sub-word in a sub-word sequence formed by these sub-words can be stored as transition probability information (UWI(SWFI)) and in that the speech recognition means (6), by evaluating transition probability information (UWI(WFI), UWI(SWFI)) stored in the speech model word memory (12) and the speech model sub-word memory (13), are arranged for recognizing an expression sequence (AF) of the at least two expression sequences (AF) as recognized text that has the highest overall probability (GWI).
3. A speech recognition device (2) as claimed in claim 1, characterized in that the speech recognition means (6) are arranged for determining a corresponding phoneme value (PUW) for recognized word information (WI) and recognized sub-word information (SWI) of the at least two expression sequences (AF), while a corresponding phoneme value (PUW) features the extent of correspondence of the stored phoneme information (PI(WI), PI(SWI)) of the recognized word information (WI) or recognized sub-word information (SWI) to that part of the phoneme information (PI) applied to the speech recognition means (6) and contained in speech information (SI), for which the word information (WI) or sub-word information (SWI) was recognized by the speech recognition means (6), and in that, by evaluating the corresponding phoneme values (PUW), the speech recognition means (6) are additionally arranged for recognizing an expression sequence (AF) of the at least two expression sequences (AF) as recognized text that has the highest overall probability (GWI).
4. A speech recognition device (2) as claimed in claim 1, characterized in that word determining means (16) are provided to which can be applied phoneme information (PI) contained in the applied speech information (SI) and for this phoneme information (PI) expression sequences (AF) of the recognized text recognized by the speech recognition means (6) and in that a background information memory (17) is provided in which background information (WI, PI(WI)) relating to the spelling of words can be stored in dependence on its phoneme information (PI), and in that the word determining means (16), by evaluating the stored background information (WI, PI(WI)) are arranged for determining a probable spelling of at least one sub-word sequence contained in a recognized expression sequence (AF) and for producing this sub-word sequence in the determined spelling as word information (NWI) of a new word.
5. A speech recognition device (2) as claimed in claim 4, characterized in that display means (19) can be connected to the speech recognition device (2), by which display means at least a new word produced by the word determining means (16) can be displayed, and in that input means (20) can be connected to the speech recognition device (2) by which means the spelling of a displayed new word can be changed, and in that the word determining means (16) are arranged for storing a changed new word as word information (WI) in the word memory (10).
6. A speech recognition device (2) as claimed in claim 5, characterized in that the word determining means (16) are arranged for determining the part of the phoneme information (PI) applied to the word determining means (16) for which part the sub-word sequence of the new word was recognized by the speech recognition means (6), and in that the word determining means (16) are arranged for storing this determined part of the phoneme information (PI) assigned to the word information (WI) of the new word in the word memory (10).
7. A speech recognition method for recognizing spoken texts containing new words by means of a speech recognition device (2) which includes a word memory (10) in which word information (WI) and assigned phoneme information (PI(WI)) of at least a first and a second word forming the vocabulary of the speech recognition device (2) is stored, and includes speech recognition means (6) to which speech information (SI) containing phoneme information (PI) of a spoken text is applied and which determine phoneme information (PI(WI)) stored in the word memory (10) and corresponding to the applied phoneme information (PI), and which produce as recognized word information (WI) the word information (WI) stored in the word memory (10) and assigned to this phoneme information (PI(WI)), and which includes a sub-word memory (11) in which sub-words forming parts of words are stored as sub-word information (SWI) and assigned phoneme information (PI(SWI)) of at least a first and a second sub-word, characterized in that the speech recognition means (6) determine phoneme information (PI(SWI)) stored in the sub-word memory (11) and corresponding to the applied phoneme information (PI) and produce as recognized sub-word information (SWI) the sub-word information (SWI) stored in the sub-word memory (11) and assigned to this stored phoneme information (PI(SWI)), and in that a speech model word memory (12) is provided in which at least a probability of occurrence of the second word after the first word in a word sequence formed by these words is stored as transition probability information (UWI(WFI)), and in that the speech recognition means (6) form at least two expression sequences (AF) which contain recognized word information (WI) and/or recognized sub-word information (SWI), and in that the speech recognition means (6) evaluate transition probability information (UWI(WFI)) stored in the speech model word memory (12) to recognize one expression sequence (AF) from the at least two expression sequences (AF) as the recognized text that has the highest overall probability (GWI).
8. A speech recognition method as claimed in claim 7, characterized in that in a speech model sub-word memory (13) of the speech recognition device (2) at least a probability of occurrence of the second sub-word after the first sub-word in a sub-word sequence formed by these sub-words is stored as transition probability information (UWI(SWFI)) and in that the speech recognition means (6) evaluate transition probability information (UWI(WFI), UWI(SWFI)) stored in the speech model word memory (12) and the speech model sub-word memory (13), and recognize an expression sequence (AF) of the at least two expression sequences (AF) as recognized text that has the highest overall probability (GWI).
9. A speech recognition method as claimed in claim 7, characterized in that the speech recognition means (6) determine a corresponding phoneme value (PUW) for recognized word information (WI) and recognized sub-word information (SWI) of the at least two expression sequences (AF), while a corresponding phoneme value (PUW) features the extent of correspondence of the stored phoneme information (PI(WI), PI(SWI)) of the recognized word information (WI) or recognized sub-word information (SWI) to that part of the phoneme information (PI) applied to the speech recognition means (6) and contained in speech information (SI) for which the word information (WI) or sub-word information (SWI) was recognized by the speech recognition means, and in that, in addition, the speech recognition means (6) evaluate the corresponding phoneme values (PUW) in order to recognize an expression sequence (AF) of the at least two expression sequences (AF) as recognized text that has the highest overall probability (GWI).
10. A speech recognition method as claimed in claim 7, characterized in that word determining means (16) are provided to which is applied phoneme information (PI) contained in the applied speech information (SI) and for this phoneme information (PI) expression sequences (AF) of the recognized text recognized by the speech recognition means (6) and in that a background information memory (17) is provided in which background information relating to the spelling of words is stored in dependence on its phoneme information (PI(WI)), and in that the word determining means (16) evaluate stored background information (WI, PI(WI)) and determine a probable spelling of at least one sub-word sequence contained in a recognized expression sequence (AF) and produce this sub-word sequence in the determined spelling as word information (NWI) of a new word.
11. A speech recognition method as claimed in claim 10, characterized in that display means (19) are connected to the speech recognition device (2), by which display means at least a new word produced by the word determining means (16) is displayed, in that input means (20) are connected to the speech recognition device (2) by which input means the spelling of a displayed new word can be changed, and in that the word determining means (16) store a changed new word as word information (WI) in the word memory (10).
12. A speech recognition method as claimed in claim 11, characterized in that the word determining means (16) determine the part of the phoneme information (PI) applied to the word determining means (16) for which part the sub-word sequence of the new word was recognized by the speech recognition means (6), and store this part of the phoneme information (PI) assigned to the word information (WI) of the new word in the word memory (10).
EP99965533A 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory Withdrawn EP1060471A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP99965533A EP1060471A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP99890001 1999-01-05
EP99890001 1999-01-05
EP99965533A EP1060471A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory
PCT/EP1999/010302 WO2000043990A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory

Publications (1)

Publication Number Publication Date
EP1060471A1 true EP1060471A1 (en) 2000-12-20

Family

ID=8243954

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99965533A Withdrawn EP1060471A1 (en) 1999-01-05 1999-12-20 Speech recognition device including a sub-word memory

Country Status (5)

Country Link
EP (1) EP1060471A1 (en)
JP (1) JP2002535728A (en)
KR (1) KR20010085219A (en)
CN (1) CN1299504A (en)
WO (1) WO2000043990A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003457B2 (en) * 2002-10-29 2006-02-21 Nokia Corporation Method and system for text editing in hand-held electronic device
CN1308908C (en) * 2003-09-29 2007-04-04 摩托罗拉公司 Transformation from characters to sound for synthesizing text paragraph pronunciation
KR100679042B1 (en) * 2004-10-27 2007-02-06 삼성전자주식회사 Method and apparatus for speech recognition, and navigation system using for the same
US9787819B2 (en) * 2015-09-18 2017-10-10 Microsoft Technology Licensing, Llc Transcription of spoken communications

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2088080C (en) * 1992-04-02 1997-10-07 Enrico Luigi Bocchieri Automatic speech recognizer
DE19639844A1 (en) * 1996-09-27 1998-04-02 Philips Patentverwaltung Method for deriving at least one sequence of words from a speech signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0043990A1 *

Also Published As

Publication number Publication date
CN1299504A (en) 2001-06-13
WO2000043990A1 (en) 2000-07-27
JP2002535728A (en) 2002-10-22
KR20010085219A (en) 2001-09-07


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17P Request for examination filed

Effective date: 20010129

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20031229

RBV Designated contracting states (corrected)

Designated state(s): AT DE FR GB