US20160005400A1 - Speech-recognition device and speech-recognition method - Google Patents
Speech-recognition device and speech-recognition method
- Publication number
- US20160005400A1 (application US 14/655,141, US201314655141A)
- Authority
- US
- United States
- Prior art keywords
- recognition
- result
- reading
- acoustic
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G06F17/2735
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- The transmitter 3 A/D-converts the input speech 2 into speech data 4 and outputs the data to the analyzer 5 and the external recognizer 19 (Step ST1).
- The analyzer 5 and the internal recognizer 7 perform the same operations as in Embodiment 1 (Steps ST2 and ST3) to output the internal recognition result 10.
- In Embodiment 1, the internal recognition result 10 is outputted from the internal recognizer 7 to the result-determination processor 17; in Step ST3 of Embodiment 2, however, it is outputted from the internal recognizer 7 to the re-collation processor 15.
- In Step ST11, the re-collation processor 15 takes as its inputs the feature vector 6 and the internal recognition result 10 and performs, using the second acoustic model 20, pattern collation between the feature vector 6 and the reading in the internal recognition result 10, to thereby calculate an acoustic likelihood for the internal recognition result 10.
- Although the pattern collation method at this time need not be the same as the method used by the internal recognizer 7, the Viterbi algorithm is used in Embodiment 2.
- The re-collation processor 15 outputs the re-collation result 16a, composed of the internal recognition result 10 and the calculated acoustic likelihood, to the result-determination processor 17.
- Since the second acoustic model 20 has more model variations than the acoustic model 9, the calculation amount required for the pattern collation increases; however, the recognition objects of the re-collation processor 15 are limited to the words included in the internal recognition result 10, so the increase in processing load is kept small.
- The reading-addition processor 12 performs the same operations as in Embodiment 1 (Steps ST4 and ST5) to obtain the reading-added result 14 for the external recognition result 11 and output it to the re-collation processor 15.
- In Step ST12, when the reading-added result 14 is inputted, the re-collation processor 15 obtains, through operations similar to those in Embodiment 1, the re-collation result 16 composed of the reading-added result 14 and its acoustic likelihood, and outputs it to the result-determination processor 17.
- Note that the second acoustic model 20 is used for this pattern collation as well.
- In Step ST13, the result-determination processor 17 takes as its inputs the re-collation result 16a for the internal recognition result 10 and the re-collation result 16 for the external recognition result 11, sorts the recognition results in descending order of acoustic likelihood, and outputs them as the final recognition result 18.
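Embodiment 2 is, in effect, a two-pass rescoring scheme: the cheaper acoustic model 9 picks the internal candidate, and the more precise second acoustic model 20 rescores both that candidate and the server's reading-added result before they are compared. The sketch below assumes two scoring callables, `coarse_score` (acoustic model 9) and `precise_score` (second acoustic model 20); neither name comes from the patent text.

```python
# Sketch of Embodiment 2 (Steps ST3, ST11-ST13): rescore both candidates with the more
# precise second acoustic model 20, then sort by the rescored likelihoods.
from typing import Callable, Dict, List, Tuple
import numpy as np

Scorer = Callable[[np.ndarray, str], float]

def recognize_embodiment2(
    features: np.ndarray,
    language_model: Dict[str, str],
    reading_added_result: Tuple[str, str],  # (notation, reading) from the external result
    coarse_score: Scorer,                   # collation with the acoustic model 9
    precise_score: Scorer,                  # collation with the second acoustic model 20
) -> List[Tuple[str, str, float]]:
    # First pass: pick the best internal candidate with the coarse model.
    internal = max(
        ((n, r, coarse_score(features, r)) for n, r in language_model.items()),
        key=lambda item: item[2],
    )
    # Second pass: rescore only the surviving candidates with the precise model,
    # which keeps the added collation cost small.
    rescored = [
        (internal[0], internal[1], precise_score(features, internal[1])),
        (reading_added_result[0], reading_added_result[1],
         precise_score(features, reading_added_result[1])),
    ]
    return sorted(rescored, key=lambda item: item[2], reverse=True)
```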
- Consequently, according to Embodiment 2, the speech-recognition device 1 is configured to include the second acoustic model 20, different from the acoustic model 9, wherein, using the second acoustic model 20, the re-collation processor 15 performs pattern collation between the feature vector 6 calculated by the analyzer 5 and the internal recognition result 10 outputted by the internal recognizer 7, to thereby calculate an acoustic likelihood (re-collation result 16a) for the internal recognition result 10, and performs pattern collation between the feature vector 6 and the reading-added result 14 outputted by the reading-addition processor 12, to thereby calculate an acoustic likelihood (re-collation result 16) for the external recognition result 11; and wherein the result-determination processor 17 determines the final recognition result by comparing the acoustic likelihood of the internal recognition result 10 and the acoustic likelihood of the external recognition result 11, both calculated by the re-collation processor 15.
- Thus, the re-collation is performed using the second acoustic model 20, which is more precise and higher in recognition accuracy than the acoustic model 9, so that the comparison between the acoustic likelihood of the external recognition result 11 and that of the internal recognition result 10 becomes more exact, providing the effect of improved recognition accuracy.
- The reason for not using the second acoustic model 20 in the internal recognizer 7 is that, because it has more model variations than the acoustic model 9, using it for the pattern collation by the internal recognizer 7 would increase the calculation amount of that collation.
- When different kinds of models are used for the acoustic model 9 and the second acoustic model 20 as in Embodiment 2, the recognition accuracy is enhanced while the increase in calculation amount is kept small.
- A speech-recognition device according to Embodiment 3 has a configuration that is, in terms of the drawings, similar to that of the speech-recognition device 1 shown in FIG. 1 or FIG. 6; in the following, the description therefore refers to FIG. 1.
- However, the details of the reading dictionary 13 and the operations of the reading-addition processor 12 and the re-collation processor 15 are modified as described below.
- FIG. 8 is a diagram showing an example of details of a reading dictionary 13 of the speech-recognition device according to Embodiment 3.
- In Embodiment 3, the reading dictionary 13 stores, in addition to the dictionary of words and facility names shown in FIG. 3, a dictionary of words in units of roughly one character, shown in FIG. 8. Having such small word units makes it possible to add a reading to a wide variety of notations in the external recognition result 11.
- The transmitter 3 A/D-converts the input speech 2 into speech data 4 and outputs the data to the analyzer 5 and the external recognizer 19.
- The analyzer 5 and the internal recognizer 7 perform the same operations as in Embodiment 1 to output the internal recognition result 10.
- For example, when the input speech 2 is "Suzuka Slope (suzukasaka)" and "Suzuka Slope" is absent from the language model 8, pattern collation is performed between that speech and each of the words written in the language model 8, and the word whose acoustic likelihood is highest is outputted. In Embodiment 3, it is assumed that the acoustic likelihood of "Suzuki Liquor Store (suzukisaketen)" is highest; accordingly, the internal recognizer 7 outputs the notation, reading and acoustic likelihood of that word as the internal recognition result 10.
- The reading-addition processor 12 waits for an external recognition result 11 sent back from the external recognizer 19 and, upon receiving it, refers to the reading dictionary 13 shown in FIG. 8 to extract therefrom a reading of a notation matched to the notation of the word (for example, "Suzuka Slope") included in the external recognition result 11.
- If the reading dictionary 13 contains a plurality of readings corresponding to the notation in the external recognition result 11, the reading-addition processor 12 outputs all of those readings.
- If no single notation matches, the reading-addition processor 12 extracts notations in the reading dictionary 13 that, when coupled together, constitute the notation of the external recognition result 11.
- This extraction can be performed, for example, by subjecting the notation of the external recognition result 11 to continuous DP (Dynamic Programming) matching on a minimum-division-number basis, using all of the notations in the reading dictionary 13 as recognition objects.
- In this example, the reading dictionary 13 contains no notation matched to "Suzuka Slope" of the external recognition result 11, so the reading-addition processor 12 extracts the notations "Bell", "Deer" and "Slope" (each a single Kanji character constituting "Suzuka Slope") existing in the reading dictionary 13. If there is a plurality of readings for a thus-extracted notation, all reading combinations are extracted.
- The re-collation processor 15 takes as its inputs the feature vector 6 and the reading-added result 14 and performs, using the same acoustic model 9 as used in the pattern collation by the internal recognizer 7, pattern collation between the feature vector 6 and each of the plurality of readings in the reading-added result 14; the highest of the resulting acoustic likelihoods is taken as the acoustic likelihood for the reading-added result 14.
- The re-collation processor 15 outputs the re-collation result 16 composed of the reading-added result 14 and the calculated acoustic likelihood.
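When no single dictionary entry matches the server's notation, the notation is pieced together from smaller units, each of which may have several readings, and the re-collation keeps the acoustically best combination. The sketch below enumerates the combinations with itertools.product and scores each one; the segmentation step itself (continuous DP matching with a minimum number of divisions) is assumed to have been done already and is not shown.

```python
# Sketch of Embodiment 3: build candidate readings for a notation from smaller
# dictionary units and keep the reading whose acoustic likelihood is highest.
from itertools import product
from typing import Callable, Dict, List, Tuple
import numpy as np

def candidate_readings(units: List[str], unit_dict: Dict[str, List[str]]) -> List[str]:
    """All reading combinations for a notation already split into dictionary units
    (the split itself would come from continuous DP matching, not shown here)."""
    per_unit = [unit_dict[u] for u in units]  # each unit may have several readings
    return ["".join(combo) for combo in product(*per_unit)]

def best_reading(
    features: np.ndarray,
    readings: List[str],
    acoustic_score: Callable[[np.ndarray, str], float],
) -> Tuple[str, float]:
    """Score every candidate reading and return the best (reading, likelihood) pair."""
    scored = [(r, acoustic_score(features, r)) for r in readings]
    return max(scored, key=lambda item: item[1])
```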
- The result-determination processor 17 takes as its inputs the internal recognition result 10 and the re-collation result 16, performs the same operation as in Embodiment 1 to sort the recognition results in descending order of acoustic likelihood, and outputs them as the final recognition result 18.
- Thus, in Embodiment 3, when the reading dictionary 13 contains a plurality of candidate readings for the external recognition result 11, a reading-added result 14 to which said plurality of readings is added is outputted, and the re-collation processor 15 performs pattern collation for each of the readings included in the reading-added result 14 to calculate their respective acoustic likelihoods, selects the reading whose acoustic likelihood is highest, and outputs it to the result-determination processor 17.
- In Embodiment 3, description has been made of the case where the operations of the reading-addition processor 12 and the re-collation processor 15 of the speech-recognition device 1 of Embodiment 1 are modified; however, the reading-addition processor 12 and the re-collation processor 15 of the speech-recognition device 1 of Embodiment 2 may be modified in the same way, which provides the same effect when the reading cannot be determined uniquely from the notation in the external recognition result 11 alone.
- FIG. 9 is a block diagram showing a configuration of a speech-recognition device 1 according to Embodiment 4.
- In FIG. 9, the same reference numerals are given to parts that are the same as or equivalent to those in FIG. 1 and FIG. 6, and their description is omitted here.
- In Embodiment 4, a result-determination language model 21 is added and the operation of the result-determination processor 17 is modified as described below.
- As the result-determination language model 21, any model may be used so long as it gives a likelihood for a word or for a sequence of a plurality of words.
- Here, description will be made using, as an example, a case where a word unigram language model is used as the result-determination language model 21.
- An example of details of the result-determination language model 21 is shown in FIG. 10 . Shown at the first column are notations of words, and at the second column are language likelihoods thereof.
- The result-determination language model 21 has been prepared beforehand using a database of a large number of words.
- Specifically, the probability of occurrence of each word has been calculated from that database, and the logarithm of each probability of occurrence has been recorded as the word's language likelihood in the result-determination language model 21.
- When a user makes a speech, using that speech as the input, the transmitter 3, the analyzer 5, the internal recognizer 7, the reading-addition processor 12 and the re-collation processor 15 perform the same operations as in Embodiment 1, whereby the internal recognition result 10 is outputted from the internal recognizer 7 and the re-collation result 16 is outputted from the re-collation processor 15 to the result-determination processor 17.
- The result-determination processor 17 refers to the result-determination language model 21 to calculate a language likelihood Sl for each of the internal recognition result 10 and the re-collation result 16.
- The result-determination processor 17 then calculates a total likelihood S according to formula (1) for each of the internal recognition result 10 and the re-collation result 16, where Sa denotes the acoustic likelihood and Sl the language likelihood.
- Finally, the result-determination processor 17 sorts the recognition results of the internal recognition result 10 and the re-collation result 16 in descending order of the total likelihood S and outputs them as the final recognition result 18.
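Formula (1) itself is not reproduced in this text, so the sketch below combines the acoustic likelihood Sa and the language likelihood Sl as a weighted sum of log values, which is a common choice and should be read as an assumption rather than as the patented formula.

```python
# Sketch of Embodiment 4: add a unigram language likelihood (log probability) to each
# candidate's acoustic likelihood and sort by the total. The weight and the exact
# combination rule stand in for formula (1) and are assumptions.
import math
from typing import Dict, List, Tuple

def total_likelihoods(
    candidates: List[Tuple[str, float]],   # (notation, acoustic likelihood Sa)
    unigram_probs: Dict[str, float],       # word -> probability of occurrence
    weight: float = 1.0,                   # relative weight of Sl (assumption)
    floor: float = 1e-9,                   # probability used for unseen words (assumption)
) -> List[Tuple[str, float]]:
    results = []
    for notation, sa in candidates:
        sl = math.log(unigram_probs.get(notation, floor))  # language likelihood Sl
        results.append((notation, sa + weight * sl))        # total likelihood S
    return sorted(results, key=lambda item: item[1], reverse=True)
```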
- Consequently, according to Embodiment 4, the speech-recognition device 1 is configured to include the result-determination language model 21, in which pairs of words and their language likelihoods are stored, wherein the result-determination processor 17 calculates, using the result-determination language model 21, the language likelihood of the internal recognition result 10 and the language likelihood of the re-collation result 16 (namely, of the external recognition result 11), and compares the acoustic likelihood and the language likelihood of the internal recognition result 10 with the acoustic likelihood and the language likelihood of the re-collation result 16, to thereby determine the final recognition result.
- Since the language likelihood Sl is calculated for both the internal recognition result 10 and the re-collation result 16 using the same result-determination language model 21, a comparison that takes the language likelihood Sl into account can be made between them, providing the effect of improved recognition accuracy.
- In Embodiment 4, an example using a word unigram as the result-determination language model 21 has been described; however, this is not limitative, and any static n-gram language model, including bigram, trigram and the like, may be used.
- In Embodiment 4, description has been made of the case where the result-determination language model 21 is added to the speech-recognition device 1 of Embodiment 1 and the operation of the result-determination processor 17 is modified; the result-determination language model 21 may similarly be added, and the operation of the result-determination processor 17 modified, in the speech-recognition device 1 of Embodiment 2 or 3.
- In the embodiments above, the external recognition result 11 received from a single external recognizer 19 is used; however, a plurality of external recognition results 11 received from a plurality of external recognizers 19 may be used instead.
- Further, the result-determination processor 17 is configured to output the recognition results sorted in descending order of the acoustic likelihood or the like as the final recognition result 18; however, this is not limitative, and it may, for example, output only a predetermined number of results in descending order of the acoustic likelihood as the final recognition result 18.
- As described above, the speech-recognition device according to the invention calculates, using the same acoustic model, the acoustic likelihood of the internal recognition result and the acoustic likelihood of the external recognition result, and compares them with each other.
- It is therefore well suited for use in client-side car-navigation devices, smartphones and the like that constitute client-server speech-recognition systems.
- 1 speech-recognition device
- 2 input speech
- 3 transmitter
- 4 speech data
- 5 analyzer
- 6 feature vector
- 7 internal recognizer
- 8 language model
- 9 acoustic model
- 10 internal recognition result
- 11 external recognition result
- 12 reading-addition processor
- 13 reading dictionary
- 14 reading-added result
- 15 re-collation processor
- 16, 16a re-collation results
- 17 result-determination processor
- 18 final recognition result
- 19 external recognizer
- 20 second acoustic model
- 21 result-determination language model.
Description
- The present invention relates to a speech-recognition device and a speech-recognition method for acquiring recognition results respectively from an external speech recognizer and an internal speech-recognizer to thereby determine a final recognition result.
- When speech recognition is to be performed using an instrument such as a car-navigation device or a smartphone, the instrument does not necessarily have sufficient hardware resources. For that reason, there are client-server speech-recognition systems in which, instead of internally executing all of the speech recognition itself, the instrument transmits its speech data to an external server and receives the result of the speech recognition performed by the server. Note that the "client" herein means the instrument at the user's hand, such as a car-navigation device or a smartphone. In this way, even the client can make use of large-vocabulary speech recognition. However, recognition words specific to the client, proper nouns found only in the user's address book, and the like are not necessarily recognizable by the server.
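As a rough sketch of the client-server flow just described, the client below sends its recorded speech data to an external recognition server while keeping the data for its own local processing. The endpoint URL and the JSON field name are illustrative assumptions, not something specified by the patent.

```python
# Minimal sketch of the client side of a client-server speech-recognition system.
# The endpoint URL and the response field "notation" are illustrative assumptions.
import json
import urllib.request

def recognize_on_server(speech_data: bytes, url: str = "http://asr.example.com/recognize") -> str:
    """Send raw speech data to an external recognition server and return its notation."""
    request = urllib.request.Request(
        url,
        data=speech_data,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    return result["notation"]  # assumed response format

# The same speech data would also be processed locally by the client's own recognizer,
# so that both an internal and an external recognition result become available.
```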
- As a measure therefor, Patent Document 1 (Japanese Patent Application Laid-open No. 2010-85536) discloses a technique of not merely performing speech recognition at the server and receiving the result at the client, but also, depending on the speech, performing speech recognition both at the client and at the server and then either presenting both recognition results separately or selecting one of them. Specifically, Patent Document 1 describes that, when either the client's or the server's recognition result is to be selected, the one with the higher acoustic likelihood is selected.
- In the case of this conventional client-server speech-recognition method, selecting between the client's and the server's recognition results requires comparing their recognition scores, likelihoods, or other values indicating the certainty of each result; however, such information cannot always be obtained from the server side. Further, even when it can be obtained, there is no assurance that it has been calculated on the same basis as in the speech recognition on the client side. Thus, there is a problem that, when either the client's or the server's recognition result is to be selected, an exact comparison between them cannot always be made, so that sufficient speech-recognition accuracy is not achieved.
- This invention has been made to solve the problem as described above, and an object thereof is to provide a speech-recognition device and a speech-recognition method by which the recognition result by the client and the recognition result by the server are compared with each other under same conditions to thereby enhance a final recognition accuracy.
- A speech-recognition device of the invention comprises: an acoustic model in which feature quantities of speeches are modelized; a language model in which notations and readings of more than one recognition-object words of the speech-recognition device are stored; a reading dictionary in which pairs of notations and readings of the recognition-object words and other words than the recognition-object words are stored; an analyzer that analyzes input speech data to calculate a feature vector; an internal recognizer that performs, using the acoustic model, pattern collation between the feature vector calculated by the analyzer and each of words stored in the language model to thereby calculate their respective acoustic likelihoods, followed by outputting, as an internal recognition result, the notations, the readings and the acoustic likelihoods of top one or more high-ranking words in the acoustic likelihoods; a reading-addition processor that acquires an external recognition result from recognition processing of the input speech data by an external recognizer, adds a reading for said external recognition result by use of the reading dictionary, and outputs a reading-added result composed of said external recognition result and the reading therefor; a re-collation processor that performs, using the acoustic model, pattern collation between the feature vector calculated by the analyzer and the reading-added result outputted by the reading-addition processor, to thereby calculate an acoustic likelihood for the external recognition result; and a result-determination processor that compares the acoustic likelihoods of the internal recognition result with the acoustic likelihood of the external recognition result, to thereby determine a final recognition result.
- A speech-recognition method of the invention comprises: a transmission step of transmitting input speech data to an external recognizer; an analysis step of analyzing the input speech data to calculate a feature vector; an internal recognition step of performing, using an acoustic model in which feature quantities of speeches are modelized, pattern collation between the feature vector calculated in the analysis step and each of words stored in a language model in which notations and readings of more than one recognition-object words of the speech-recognition device are stored, to thereby calculate their respective acoustic likelihoods, followed by outputting, as an internal recognition result, the notations, the readings and the acoustic likelihoods of top one or more high-ranking words in the acoustic likelihoods; a reading-addition step of acquiring an external recognition result from recognition processing of the input speech data by the external recognizer, adding a reading for said external recognition result by use of a reading dictionary in which pairs of notations and readings of the recognition-object words and other words than the recognition-object words are stored, and outputting a reading-added result composed of said external recognition result and the reading therefor; a re-collation step of performing, using the acoustic model, pattern collation between the feature vector calculated in the analysis step and the reading-added result outputted in the reading-addition step, to thereby calculate the acoustic likelihood for the external recognition result; and a result-determination step of comparing the acoustic likelihood of the internal recognition result with the acoustic likelihood of the external recognition result, to thereby determine a final recognition result.
- According to the invention, it is possible to provide a speech-recognition device and a speech-recognition method by which the acoustic likelihood of the internal recognition result and the acoustic likelihood of the external recognition result are calculated using the same acoustic model and compared with each other, so that the final recognition accuracy is enhanced.
- FIG. 1 is a block diagram showing a configuration of a speech-recognition device according to Embodiment 1 of the invention.
- FIG. 2 is a diagram illustrating an example of details of a language model included in the speech-recognition device according to Embodiment 1.
- FIG. 3 is a diagram illustrating an example of details of a reading dictionary included in the speech-recognition device according to Embodiment 1.
- FIG. 4 is a flowchart showing operations of the speech-recognition device according to Embodiment 1.
- FIG. 5 is a diagram illustrating, as a modified example, an example of details of a reading dictionary in English of the speech-recognition device according to Embodiment 1.
- FIG. 6 is a block diagram showing a configuration of a speech-recognition device according to Embodiment 2 of the invention.
- FIG. 7 is a flowchart showing operations of the speech-recognition device according to Embodiment 2.
- FIG. 8 is a diagram illustrating an example of details of a reading dictionary included in the speech-recognition device according to Embodiment 3.
- FIG. 9 is a block diagram showing a configuration of a speech-recognition device according to Embodiment 4 of the invention.
- FIG. 10 is a diagram illustrating an example of details of a result-determination language model included in the speech-recognition device according to Embodiment 4.
- Hereinafter, for illustrating the invention in more detail, embodiments for carrying out the invention will be described according to the accompanying drawings.
- As shown in FIG. 1, a speech-recognition device 1 according to Embodiment 1 includes a transmitter 3, an analyzer 5, an internal recognizer 7, a language model 8, an acoustic model 9, a reading-addition processor 12, a reading dictionary 13, a re-collation processor 15 and a re-collation result 16. The speech-recognition device 1 corresponds to the client of a client-server speech-recognition system; it may be mounted in or installed in an existing instrument, for example a smartphone or similar portable instrument carried by a user, or a navigation device mounted on or brought into a vehicle or other moving object, or it may be used as a separate stand-alone unit.
- An external recognizer 19 is assumed to be a speech-recognition server connected to the speech-recognition device 1 through a network. It may instead be connected directly, by wire or wirelessly, without going through a network.
- In the speech-recognition device 1, the acoustic model 9 stores acoustic models obtained from modelization of feature vectors of speeches. In Embodiment 1, the acoustic models are assumed to be obtained from modelization of phonemes, and the acoustic model 9 stores acoustic models for all phonemes. With acoustic models for all phonemes, the feature vector of a speech of any word can be modelized by accessing the acoustic models of its phonemes.
- Note that the feature vector to be modelized by the acoustic model 9 (namely, a feature vector 6 in FIG. 1) is assumed, for example, to be an MFCC (Mel Frequency Cepstral Coefficient). Further, the acoustic model is assumed, for example, to be an HMM (Hidden Markov Model).
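As a hedged sketch of the analysis assumed above, the following computes a sequence of MFCC feature vectors from raw speech samples with the librosa package; the frame and coefficient settings are illustrative assumptions, and the HMM acoustic model itself is not shown.

```python
# Sketch of the analyzer: one MFCC vector per frame (the feature vector 6 of FIG. 1).
# Window, hop and coefficient counts are illustrative assumptions.
import numpy as np
import librosa

def analyze(speech_samples: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Return an array of shape (num_frames, 13) of MFCC feature vectors."""
    mfcc = librosa.feature.mfcc(
        y=speech_samples.astype(np.float32),
        sr=sample_rate,
        n_mfcc=13,        # cepstral coefficients per frame
        n_fft=400,        # 25 ms analysis window at 16 kHz
        hop_length=160,   # 10 ms frame shift
    )
    return mfcc.T  # rows are time frames, columns are coefficients
```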
- The language model 8 stores notations and readings of the recognition-object words of the internal recognizer 7. Note that "reading" referred to herein means a symbol sequence that can be associated with the acoustic model 9. For example, if the acoustic model 9 is one in which phonemes are modelized, the readings in the language model 8 are phoneme sequences or the like. In Embodiment 1, the recognition objects of the speech-recognition device 1 are assumed to be facility names in Kanagawa Prefecture. An example of the details of the language model 8 in this case is shown in FIG. 2, where phoneme sequences are used as the readings.
- The reading dictionary 13 stores pairs of notations and readings of a large number of words, including words not subject to recognition by the internal recognizer 7. Note that "reading" is, as for the language model 8, assumed to be a symbol sequence that can be associated with the acoustic model 9. In Embodiment 1, the readings in the reading dictionary 13 are phoneme sequences. An example of the details of the reading dictionary 13 is shown in FIG. 3.
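The language model and the reading dictionary can both be pictured as tables pairing a notation with a reading (a phoneme-style symbol sequence). The sketch below uses the romanized readings from the examples in this text; the exact contents of FIG. 2 and FIG. 3 are not reproduced here, so the entries are assumptions.

```python
# Sketch of the two lookup tables: the language model 8 holds only the client's
# recognition-object words, while the reading dictionary 13 holds a much larger
# vocabulary, including words the internal recognizer cannot recognize by itself.
from typing import Optional

language_model = {
    # notation -> reading (phoneme sequence); recognition-object words only
    "Yokohama International Stadium": "yokohamakokusaikyoogizyoo",
    "Suzuki Liquor Store": "suzukisaketen",
}

reading_dictionary = {
    # notation -> reading; includes words outside the language model
    "Yokohama International Stadium": "yokohamakokusaikyoogizyoo",
    "Maihama International Stadium": "maihamakokusaikyoogizyoo",
    "Suzuki Liquor Store": "suzukisaketen",
}

def lookup_reading(notation: str) -> Optional[str]:
    """Return the reading for a notation, or None if the dictionary has no entry."""
    return reading_dictionary.get(notation)
```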
- The language model 8, the acoustic model 9 and the reading dictionary 13 may be stored in a common memory element, memory device or the like, or may each be stored in an independent memory element, memory device or the like.
- Further, the speech-recognition device 1 may be configured to have a memory storing a program and a CPU executing that program, so that when the CPU executes the program the functions (described in detail later) of the transmitter 3, the analyzer 5, the internal recognizer 7, the reading-addition processor 12, the re-collation processor 15 and the result-determination processor 17 are achieved by software. Alternatively, a part of the functions may be achieved by hardware.
- Next, operations at the time of speech recognition will be described with reference to the flowchart in FIG. 4.
- In Step ST1, when a user makes a speech, an input speech 2 corresponding to that speech is inputted to the transmitter 3. The transmitter 3 A/D-converts the input speech 2 into speech data 4 and outputs the data to the analyzer 5. In addition, the transmitter 3 transmits the same speech data 4 to the external recognizer 19.
- In Step ST2, the analyzer 5 converts the speech data 4 into a feature vector 6 and outputs it to the internal recognizer 7 and the re-collation processor 15. As mentioned above, the feature vector 6 is assumed to be an MFCC, for example.
- In Step ST3, using the language model 8 and the acoustic model 9, the internal recognizer 7 performs pattern collation (pattern matching), for example according to the Viterbi algorithm, between the feature vector 6 and each of the words written in the language model 8, to thereby calculate their respective acoustic likelihoods; it then selects the word whose acoustic likelihood is highest and outputs it to the result-determination processor 17 as an internal recognition result 10.
- Note that a case is described here where only the single top-ranking word in acoustic likelihood is included in the internal recognition result 10; however, this is not limitative, and the device may be configured so that, for example, the top one or more high-ranking words in acoustic likelihood are included in the internal recognition result 10.
- The internal recognition result 10 is composed of the notation (in Kanji), reading and acoustic likelihood of the word. For example, when the input speech 2 is "Maihama International Stadium (maihamakokusaikyoogizyoo)", although the same word is not present in the language model 8, the word whose acoustic likelihood is highest among the words in the language model 8 is outputted. In this example, assume that the acoustic likelihood of "Yokohama International Stadium (yokohamakokusaikyoogizyoo)" is highest. Accordingly, the internal recognizer 7 outputs the notation "Yokohama International Stadium", the reading "yokohamakokusaikyoogizyoo" and the acoustic likelihood of that word as the internal recognition result 10.
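A minimal sketch of the internal recognition step (Step ST3): every language-model entry is scored against the feature-vector sequence, and the best-scoring word becomes the internal recognition result. The `acoustic_score` callable stands in for Viterbi collation against the HMM acoustic model 9 and is an assumption, not the patented implementation.

```python
# Sketch of Step ST3: score each word in the language model 8 against the feature
# vectors and keep the best one as the internal recognition result 10.
from typing import Callable, Dict, Tuple
import numpy as np

Scorer = Callable[[np.ndarray, str], float]  # (feature vectors, reading) -> log likelihood

def internal_recognize(
    features: np.ndarray,
    language_model: Dict[str, str],
    acoustic_score: Scorer,
) -> Tuple[str, str, float]:
    """Return (notation, reading, acoustic likelihood) of the top-scoring word."""
    best = None
    for notation, reading in language_model.items():
        likelihood = acoustic_score(features, reading)  # stands in for Viterbi collation
        if best is None or likelihood > best[2]:
            best = (notation, reading, likelihood)
    return best
```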
- In Step ST4, the reading-addition processor 12 waits for an external recognition result 11 sent back from the external recognizer 19. Note that in Embodiment 1 it is presumed that the external recognition result 11 at least includes a notation of the word that is the recognition result of the speech data 4, but does not include a reading of that word.
- Upon receiving the external recognition result 11 (Step ST4 "YES"), the reading-addition processor 12 refers to the reading dictionary 13 to extract therefrom the reading of the notation matched to the notation of the word included in the external recognition result 11, and outputs it to the re-collation processor 15 as a reading-added result 14 (Step ST5). For example, when the external recognition result 11 is "Maihama International Stadium", the reading-addition processor 12 refers to the reading dictionary 13 to extract the matched notation "Maihama International Stadium" and its reading "maihamakokusaikyoogizyoo", and outputs them as the reading-added result 14.
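Steps ST4 and ST5 amount to a dictionary lookup that attaches a reading to the notation returned by the server; a small sketch, with hypothetical function names:

```python
# Sketch of Steps ST4-ST5: attach a reading to the external recognition result 11
# using the reading dictionary 13, producing the reading-added result 14.
from typing import Dict, Optional, Tuple

def add_reading(
    external_notation: str,
    reading_dictionary: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (notation, reading), or None if the dictionary has no matching notation."""
    reading = reading_dictionary.get(external_notation)
    if reading is None:
        return None  # fall back to a predetermined choice, as described later in the text
    return (external_notation, reading)
```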
- In Step ST6, the re-collation processor 15 takes as its inputs the feature vector 6 and the reading-added result 14 and performs, using the same acoustic model as used for pattern collation in the internal recognizer 7, namely the acoustic model 9, pattern collation between the feature vector 6 and the reading in the reading-added result 14, to thereby calculate an acoustic likelihood for the reading-added result 14. The pattern collation method of the re-collation processor 15 is assumed to be the same as that used in the internal recognizer 7; in Embodiment 1, the Viterbi algorithm is used.
- Because the re-collation processor 15 uses in this manner the same acoustic model and pattern collation method as the internal recognizer 7, the acoustic likelihood of the internal recognition result 10 calculated by the internal recognizer 7 and the acoustic likelihood calculated for the external recognition result 11 received from the external recognizer 19 become comparable with each other. The re-collation processor 15 outputs the re-collation result 16, composed of the reading-added result 14 and the calculated acoustic likelihood, to the result-determination processor 17.
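The re-collation step re-scores the server's result with the very same acoustic model and collation method used internally, which is what makes the two likelihoods directly comparable; a sketch, with `acoustic_score` again standing in for the Viterbi collation against the acoustic model 9:

```python
# Sketch of Step ST6: score the reading-added result 14 with the same acoustic model
# and collation method as the internal recognizer, yielding the re-collation result 16.
from typing import Callable, Tuple
import numpy as np

def re_collate(
    features: np.ndarray,
    reading_added_result: Tuple[str, str],               # (notation, reading)
    acoustic_score: Callable[[np.ndarray, str], float],  # same scorer as the internal pass
) -> Tuple[str, str, float]:
    notation, reading = reading_added_result
    likelihood = acoustic_score(features, reading)
    return (notation, reading, likelihood)
```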
determination processor 17 uses as its inputs, theinternal recognition result 10 and there-collation result 16, sorts the recognition results in descending order of the acoustic likelihood, and outputs them as afinal recognition result 18. In the example described above, since theinput speech 2 is “Maihama International Stadium”, theinternal recognition result 10 by theinternal recognizer 7 is “Yokohama International Stadium” and theexternal recognition result 11 by theexternal recognizer 19 is “Maihama International Stadium”, when pattern collation is performed using the sameacoustic model 9 to thereby calculate the respective acoustic likelihoods, it is expected that “Maihama International Stadium” by theexternal recognizer 19 becomes higher in acoustic likelihood than the other. This contributes to improving the accuracy in speech recognition. - Consequently, according to Embodiment 1, the speech-recognition device 1 is configured to include: the acoustic model 9 in which feature quantities of speeches are modelized; the language model 8 in which notations and readings of more than one recognition-object words of the speech-recognition device 1 are stored; the reading dictionary 13 in which pairs of notations and readings of a large number of words including not only the recognition-object words but also other words than the recognition-object words are stored; the analyzer 5 that analyzes the speech data 4 of the input speech 2 to calculate the feature vector 6; the internal recognizer 7 that performs, using the acoustic model 9, pattern collation between the feature vector 6 calculated by the analyzer 5 and each of words stored in the language model 8, to thereby calculate their respective acoustic likelihoods, followed by outputting, as the internal recognition result 10, the notations, the readings and the acoustic likelihoods of top one or more high-ranking words in the acoustic likelihoods; the reading-addition processor 12 that acquires the external recognition result 11 from recognition processing of the speech data 4 by the external recognizer 19, adds a reading for the external recognition result 11 by use of the reading dictionary 13, and outputs the reading-added result 14 composed of the external recognition result 11 and the reading therefor; the re-collation processor 15 that performs, using the acoustic model 9, pattern collation between the feature vector 6 calculated by the analyzer 5 and the reading-added result 14 outputted by the reading-addition processor 12, to thereby calculate an acoustic likelihood for the external recognition result 11; and the result-determination processor 17 that compares the acoustic likelihoods of the internal recognition result 10 with the acoustic likelihood of the re-collation recognition result 16, to thereby determine the final recognition result. Thus, the acoustic likelihood can be calculated for the
- Consequently, according to Embodiment 1, the speech-recognition device 1 is configured to include: the acoustic model 9, in which feature quantities of speech are modeled; the language model 8, in which the notations and readings of one or more recognition-object words of the speech-recognition device 1 are stored; the reading dictionary 13, in which pairs of notations and readings of a large number of words, including not only the recognition-object words but also words other than the recognition-object words, are stored; the analyzer 5, which analyzes the speech data 4 of the input speech 2 to calculate the feature vector 6; the internal recognizer 7, which performs, using the acoustic model 9, pattern collation between the feature vector 6 calculated by the analyzer 5 and each of the words stored in the language model 8 to calculate their respective acoustic likelihoods, and then outputs, as the internal recognition result 10, the notations, readings and acoustic likelihoods of the one or more top-ranking words in acoustic likelihood; the reading-addition processor 12, which acquires the external recognition result 11 obtained by recognition processing of the speech data 4 by the external recognizer 19, adds a reading to the external recognition result 11 by use of the reading dictionary 13, and outputs the reading-added result 14 composed of the external recognition result 11 and its reading; the re-collation processor 15, which performs, using the acoustic model 9, pattern collation between the feature vector 6 calculated by the analyzer 5 and the reading-added result 14 outputted by the reading-addition processor 12, to calculate an acoustic likelihood for the external recognition result 11; and the result-determination processor 17, which compares the acoustic likelihoods of the internal recognition result 10 with the acoustic likelihood of the re-collation result 16 to determine the final recognition result. Thus, the acoustic likelihood can be calculated for the external recognition result 11 using the same acoustic model and pattern collation method as used by the internal recognizer 7, so that an exact comparison can be made between the acoustic likelihood of the external recognition result 11 and the acoustic likelihood of the internal recognition result 10, making it possible to enhance the final recognition accuracy. Accordingly, even when, for example, the speech-recognition device 1 has insufficient hardware resources and the number of words in the language model 8 is small, it is possible to utilize the recognition result of the external recognizer 19, which has a large-scale language model, thus providing an effect that the recognition performance of the speech-recognition device 1 is improved.
- Note that the speech-recognition device 1 according to Embodiment 1 is also applicable to languages other than Japanese. For example, when the speech-recognition device 1 is to be applied to English, it suffices to change the language model 8, the acoustic model 9 and the reading dictionary 13 to the respective corresponding ones for English. In that case, it suffices to record notations and readings of a large number of English words in the reading dictionary 13. Note that the readings in the reading dictionary 13 are provided as indications that can be associated with the acoustic model 9. For example, if the acoustic model 9 models English phonemes, the readings in the reading dictionary 13 are provided as phoneme indications or as symbols convertible to phoneme indications. FIG. 5 shows an example of the English reading dictionary 13: the first column contains the notations, and the second column contains the phoneme indications serving as the readings of those notations.
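- As an illustration only (the entries below are invented and do not reproduce FIG. 5), such an English reading dictionary 13 could be represented as a simple mapping from notations to phoneme-string readings:

```python
# Invented entries: notations mapped to readings written with ARPAbet-style
# phoneme symbols assumed to be compatible with an English acoustic model 9.
english_reading_dictionary_13 = {
    "stadium":       ["s t ey d iy ax m"],
    "international": ["ih n t er n ae sh ax n ax l"],
    "maihama":       ["m ay hh aa m ax"],
}

def look_up_readings(notation):
    """Return every reading registered for a notation, or an empty list."""
    return english_reading_dictionary_13.get(notation.lower(), [])
```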
- Meanwhile, readings of a large number of words are stored in the reading dictionary 13 so that a word in the external recognition result 11 will rarely be left without a matching entry. For the case where a matching word is nevertheless not present in the reading dictionary 13, it suffices to determine beforehand which of the recognition results, that of the internal recognizer 7 or that of the external recognizer 19, is to be selected, so that the result-determination processor 17 outputs the thus-determined recognition result as the final result.
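- A minimal sketch of this fallback, assuming the internal recognition result 10 has been chosen beforehand as the default:

```python
def determine_final(internal_result, recollation_result):
    # If no reading could be added (no matching entry in the reading
    # dictionary 13), fall back to the side chosen beforehand; here the
    # internal recognition result 10 is assumed to be that choice.
    if recollation_result is None:
        return internal_result
    return max([internal_result, recollation_result],
               key=lambda r: r["likelihood"])
```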
- FIG. 6 is a block diagram showing a configuration of a speech-recognition device 1 according to Embodiment 2. In FIG. 6, the same reference numerals are given to parts that are the same as or equivalent to those in FIG. 1, so their description is omitted here. The speech-recognition device 1 according to Embodiment 2 is characterized by the addition of a second acoustic model 20.
- Similarly to the acoustic model 9 in Embodiment 1, the second acoustic model 20 stores acoustic models obtained by modeling feature vectors of speech. It should be noted that the second acoustic model 20 is assumed to be more precise and higher in recognition accuracy than the acoustic model 9. For example, when phonemes are modeled in this acoustic model, triphones are assumed, which take into account not only the target phoneme but also the phonemes immediately before and after it. With triphones, the second phoneme /s/ in "Morning/asa" and the second phoneme /s/ in "Stone/isi" are modeled as different acoustic models because their surrounding phonemes differ. It is known that this enhances recognition accuracy; however, the number of model variations increases, so the amount of calculation for pattern collation also increases.
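- The following sketch illustrates context-dependent (triphone) units using the common left-centre+right labelling; the labelling convention and the "sil" boundary symbol are assumptions for illustration, not taken from the patent.

```python
def to_triphones(phonemes):
    """Convert a phoneme sequence into triphone labels 'left-centre+right',
    using 'sil' at the utterance boundaries."""
    units = []
    for i, p in enumerate(phonemes):
        left = phonemes[i - 1] if i > 0 else "sil"
        right = phonemes[i + 1] if i + 1 < len(phonemes) else "sil"
        units.append(f"{left}-{p}+{right}")
    return units

# The /s/ of "asa" (morning) and the /s/ of "isi" (stone) become distinct units:
print(to_triphones(["a", "s", "a"]))  # ['sil-a+s', 'a-s+a', 's-a+sil']
print(to_triphones(["i", "s", "i"]))  # ['sil-i+s', 'i-s+i', 's-i+sil']
```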
- Next, operations at the time of speech recognition will be described with reference to the flowchart in FIG. 7.
- When a user makes a speech, the transmitter 3 A/D-converts the input speech 2 into speech data 4 and outputs the data to the analyzer 5 and the external recognizer 19 (Step ST1). The analyzer 5 and the internal recognizer 7 perform the same operations as in Embodiment 1 (Steps ST2 and ST3) to output the internal recognition result 10. It should be noted that in Step ST3 of Embodiment 1 the internal recognition result 10 is outputted from the internal recognizer 7 to the result-determination processor 17, whereas in Step ST3 of Embodiment 2 it is outputted from the internal recognizer 7 to the re-collation processor 15.
- In Step ST11, the re-collation processor 15 uses as its inputs the feature vector 6 and the internal recognition result 10, and performs, using the second acoustic model 20, pattern collation between the feature vector 6 and the reading in the internal recognition result 10, to thereby calculate an acoustic likelihood for the internal recognition result 10. Although the pattern collation method at this point need not be the same as the method used by the internal recognizer 7, the Viterbi algorithm is used in Embodiment 2. The re-collation processor 15 outputs the re-collation result 16 a, composed of the internal recognition result 10 and the calculated acoustic likelihood, to the result-determination processor 17.
- Note that, as mentioned above, since the second acoustic model 20 has more model variations than the acoustic model 9, the amount of calculation required for pattern collation is increased; however, the recognition objects of the re-collation processor 15 are limited to the words included in the internal recognition result 10, so the increase in processing load can be kept small.
- The reading-addition processor 12 performs the same operations as in Embodiment 1 (Steps ST4 and ST5) to obtain the reading-added result 14 for the external recognition result 11 and outputs it to the re-collation processor 15.
- In Step ST12, when the reading-added result 14 is inputted, the re-collation processor 15 obtains, through operations similar to those in Embodiment 1, the re-collation result 16 composed of the reading-added result 14 and its acoustic likelihood, and outputs it to the result-determination processor 17. Note that the second acoustic model 20 is used for this pattern collation.
- In Step ST13, the result-determination processor 17 uses as its inputs the re-collation result 16 a for the internal recognition result 10 and the re-collation result 16 for the external recognition result 11, sorts the recognition results in descending order of acoustic likelihood, and outputs them as the final recognition result 18.
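- A minimal sketch of Steps ST11 to ST13, assuming a hypothetical scorer score_with_second_model that wraps re-collation with the second acoustic model 20 (for example, the Viterbi sketch shown earlier run over triphone models):

```python
def determine_final_result(feature_vectors, internal_results, reading_added_results,
                           score_with_second_model):
    """Steps ST11-ST13: re-score every candidate with the second acoustic
    model 20, then sort by the now-comparable acoustic likelihoods."""
    rescored = []
    for result in internal_results + reading_added_results:
        likelihood = score_with_second_model(feature_vectors, result["reading"])
        rescored.append({**result, "likelihood": likelihood})
    return sorted(rescored, key=lambda r: r["likelihood"], reverse=True)
```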
- Consequently, according to Embodiment 2, the speech-recognition device 1 is configured to include the second acoustic model 20, different from the acoustic model 9, wherein, using the second acoustic model 20, the re-collation processor 15 performs pattern collation between the feature vector 6 calculated by the analyzer 5 and the internal recognition result 10 outputted by the internal recognizer 7, to thereby calculate an acoustic likelihood (re-collation result 16 a) for the internal recognition result 10, and performs pattern collation between the feature vector 6 and the reading-added result 14 outputted by the reading-addition processor 12, to thereby calculate an acoustic likelihood (re-collation result 16) for the external recognition result 11; and wherein the result-determination processor 17 determines the final recognition result by comparing the acoustic likelihood of the internal recognition result 10 and the acoustic likelihood of the external recognition result 11, both calculated by the re-collation processor 15. Accordingly, the re-collation is performed using the second acoustic model 20, which is more precise and higher in recognition accuracy than the acoustic model 9, so that the comparison between the acoustic likelihood of the external recognition result 11 and that of the internal recognition result 10 becomes more exact, thus providing an effect of improving the recognition accuracy.
- Note that the reason the second acoustic model 20 is not used in the internal recognizer 7 is that, if the second acoustic model 20 were used for pattern collation by the internal recognizer 7, the larger number of model variations compared with the acoustic model 9 would increase the amount of calculation for pattern collation. When different kinds of models are used for the acoustic model 9 and the second acoustic model 20 as in Embodiment 2, the recognition accuracy is enhanced while the increase in the amount of calculation is kept small.
- A speech-recognition device according to Embodiment 3 has a configuration that is, in terms of the block diagram, similar to that of the speech-recognition device 1 shown in FIG. 1 or FIG. 6; thus, in the following, the description reuses FIG. 1. In the speech-recognition device 1 according to Embodiment 3, the contents of the reading dictionary 13 and the operations of the reading-addition processor 12 and the re-collation processor 15 are modified as described below.
- FIG. 8 is a diagram showing an example of the contents of a reading dictionary 13 of the speech-recognition device according to Embodiment 3. In the speech-recognition device 1 according to Embodiment 3, the reading dictionary 13 stores, in addition to the dictionary of words and facility names shown in FIG. 3, a dictionary of word elements in units of roughly one character, as shown in FIG. 8. Because such small word elements in units of roughly one character are available, it becomes possible to add a reading to a wide variety of notations in the external recognition result 11. - Next, operations at the time of speech recognition will be described.
- When a user makes a speech, the
transmitter 3 A/D-converts the input speech 2 into speech data 4 and outputs the data to the analyzer 5 and the external recognizer 19. The analyzer 5 and the internal recognizer 7 perform the same operations as in Embodiment 1 to output the internal recognition result 10. Suppose, for example, that the input speech 2 is "Suzuka Slope (suzukasaka)"; although "Suzuka Slope" is absent from the language model 8, pattern collation is performed between that speech and each of the words written in the language model 8, and the word whose acoustic likelihood is highest is outputted. In Embodiment 3, it is assumed that the acoustic likelihood of "Suzuki Liquor Store (suzukisaketen)" is highest; accordingly, the internal recognizer 7 outputs the notation, reading and acoustic likelihood of that word as the internal recognition result 10.
- The reading-addition processor 12 waits for an external recognition result 11 sent back from the external recognizer 19 and, when the external recognition result 11 is received, refers to the reading dictionary 13 shown in FIG. 8 to extract from it a reading whose notation matches the notation of the word (for example, "Suzuka Slope") included in the external recognition result 11. If the reading dictionary 13 contains a plurality of readings corresponding to the notation in the external recognition result 11, the reading-addition processor outputs all of them. Further, if there is no reading corresponding to the notation, the reading-addition processor extracts notations in the reading dictionary 13 that, when coupled together, can constitute the notation of the external recognition result 11. This extraction can be performed, for example, by subjecting the notation of the external recognition result 11 to continuous DP (Dynamic Programming) matching on a minimum-division-number basis, using all of the notations in the reading dictionary 13 as recognition objects.
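- A minimal sketch of such a minimum-division segmentation (a simple dynamic-programming word break rather than the continuous DP matching itself); the character-unit dictionary below is invented, with the Kanji characters replaced by the English glosses used in the text:

```python
# Invented character-unit dictionary; the single Kanji characters of
# "Suzuka Slope" are replaced here by the English glosses used in the text.
char_unit_dictionary_13 = {
    "Bell":  ["suzu", "rei"],
    "Deer":  ["sika", "ka"],
    "Slope": ["saka"],
}

def split_by_minimum_divisions(notation, dictionary):
    """Split `notation` into the fewest dictionary entries whose concatenation
    reproduces it; return None if no such split exists."""
    best = {0: []}                  # best[i]: fewest-segment split of notation[:i]
    for i in range(1, len(notation) + 1):
        for j in range(i):
            if j in best and notation[j:i] in dictionary:
                candidate = best[j] + [notation[j:i]]
                if i not in best or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best.get(len(notation))

print(split_by_minimum_divisions("BellDeerSlope", char_unit_dictionary_13))
# ['Bell', 'Deer', 'Slope']
```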
- In the example of Embodiment 3, there is no notation in the reading dictionary 13 matching "Suzuka Slope" of the external recognition result 11, so the reading-addition processor 12 extracts the notations "Bell", "Deer" and "Slope" (each a single Kanji character constituting "Suzuka Slope") that exist in the reading dictionary 13. If an extracted notation has a plurality of readings, all reading combinations are extracted. In this case, there are two readings "suzu" and "rei" for the notation "Bell", two readings "sika" and "ka" for the notation "Deer", and one reading "saka" for the notation "Slope", so that four readings "suzushikasaka", "reishikasaka", "suzukasaka" and "reikasaka" are extracted as readings for "Suzuka Slope" of the external recognition result 11. The reading-addition processor 12 then outputs, as the reading-added result 14, the four extracted readings together with the notation "Suzuka Slope".
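- Continuing the sketch above (and reusing its hypothetical char_unit_dictionary_13), all reading combinations can be generated by taking the product of the per-segment readings; the romanisation of the output follows the dictionary entries rather than the spelling in the text:

```python
from itertools import product

def candidate_readings(segments, dictionary):
    """All readings obtained by choosing one reading per segment and joining them."""
    return ["".join(choice) for choice in product(*(dictionary[s] for s in segments))]

print(candidate_readings(["Bell", "Deer", "Slope"], char_unit_dictionary_13))
# ['suzusikasaka', 'suzukasaka', 'reisikasaka', 'reikasaka']
```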
- The re-collation processor 15 uses as its inputs the feature vector 6 and the reading-added result 14, and performs, using the same acoustic model 9 as used for pattern collation by the internal recognizer 7, pattern collation between the feature vector 6 and each of the plurality of readings in the reading-added result 14; the highest acoustic likelihood among those readings is taken as the acoustic likelihood for the reading-added result 14. The re-collation processor 15 outputs the re-collation result 16 composed of the reading-added result 14 and the calculated acoustic likelihood.
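- A minimal sketch of picking the best-scoring reading, with acoustic_likelihood as a hypothetical scorer such as the Viterbi sketch shown earlier:

```python
def rescore_reading_added_result(frames, notation, readings, acoustic_likelihood):
    """Score every candidate reading and keep the one with the highest
    acoustic likelihood (the re-collation result 16 for this notation)."""
    best_score, best_reading = max((acoustic_likelihood(frames, r), r) for r in readings)
    return {"notation": notation, "reading": best_reading, "likelihood": best_score}
```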
- In this manner, when there is a plurality of candidate readings for the notation of the word in the external recognition result 11, the reading can be determined and its acoustic likelihood calculated by performing pattern collation between the feature vector 6 and the plurality of readings in the re-collation processor 15. In the example described above, among the four readings for "Suzuka Slope" in the external recognition result 11, the reading "suzukasaka", whose acoustic likelihood is highest, is included in the re-collation result 16.
- The result-determination processor 17 uses as its inputs the internal recognition result 10 and the re-collation result 16, performs the same operation as in Embodiment 1 to sort the recognition results in descending order of acoustic likelihood, and outputs them as the final recognition result 18. In the example described above, the input speech 2 is "Suzuka Slope", the internal recognition result 10 from the internal recognizer 7 is "Suzuki Liquor Store", and the external recognition result 11 from the external recognizer 19 is "Suzuka Slope" (suzukasaka); when pattern collation is performed for both using the same acoustic model 9 to calculate their respective acoustic likelihoods, "Suzuka Slope" (suzukasaka) from the external recognizer 19 is expected to obtain the higher acoustic likelihood. This contributes to improving the accuracy of speech recognition.
- Consequently, according to Embodiment 3, the device is configured so that, when the reading dictionary 13 contains a plurality of candidate readings for the external recognition result 11, a reading-added result 14 to which this plurality of readings is added is outputted, and the re-collation processor 15 performs pattern collation for each of the readings included in the reading-added result 14 to calculate their respective acoustic likelihoods, selects the reading with the highest acoustic likelihood, and outputs it to the result-determination processor 17. Thus, even when the reading cannot be determined uniquely from the notation in the external recognition result 11 alone, it becomes possible to determine the reading and calculate its acoustic likelihood by performing pattern collation with the feature vector 6 at the re-collation processor 15, thus providing an effect that the accuracy of speech recognition is improved.
- Further, in the reading dictionary 13 of Embodiment 3, notations and readings are given in units smaller than words, which allows a large variety of words to be covered by their combinations, with the merit that the probability of finding a matching notation is higher. In contrast, in the reading dictionary 13 of Embodiment 1, notations and readings are given on a word basis, with the merit that the accuracy of reading addition is high.
- Note that in Embodiment 3 the description covers the case where the operations of the reading-addition processor 12 and the re-collation processor 15 of the speech-recognition device 1 of Embodiment 1 are modified; however, the operations of the reading-addition processor 12 and the re-collation processor 15 of the speech-recognition device 1 of Embodiment 2 may be modified in the same way, which provides the same effect when the reading cannot be determined uniquely from the notation in the external recognition result 11 alone.
- FIG. 9 is a block diagram showing a configuration of a speech-recognition device 1 according to Embodiment 4. In FIG. 9, the same reference numerals are given to parts that are the same as or equivalent to those in FIG. 1 and FIG. 6, so their description is omitted here. In the speech-recognition device 1 according to Embodiment 4, a result-determination language model 21 is added and the operation of the result-determination processor 17 is modified as described below.
- As the result-determination language model 21 shown in FIG. 9, any model may be used so long as it gives a likelihood for a word or for a sequence of words. In Embodiment 4, the description uses as an example the case where a word unigram language model is used as the result-determination language model 21. An example of the contents of the result-determination language model 21 is shown in FIG. 10: the first column contains notations of words, and the second column contains their language likelihoods. The result-determination language model 21 is prepared beforehand using a database of a large number of words. For example, for a unigram language model as in this example, the probabilities of occurrence of the respective words are calculated from the database of the large number of words, and the logarithmic values of these probabilities of occurrence are recorded as the likelihoods in the result-determination language model 21. - Next, operations at the time of speech recognition will be described.
- When a user makes a speech, using the speech as an input, the
transmitter 3, the analyzer 5, the internal recognizer 7, the reading-addition processor 12 and the re-collation processor 15 perform the same operations as in Embodiment 1, so that the internal recognition result 10 is outputted from the internal recognizer 7 and the re-collation result 16 is outputted from the re-collation processor 15, both to the result-determination processor 17.
- The result-determination processor 17 refers to the result-determination language model 21 to calculate a language likelihood Sl for each of the internal recognition result 10 and the re-collation result 16. For example, when the notation in the internal recognition result 10 is "Suzuki Liquor Store", its language likelihood is Sl = −0.32 according to the result-determination language model 21 of FIG. 10. Likewise, when the notation in the re-collation result 16 is "Suzuka Slope", its language likelihood is Sl = −0.30. The result-determination processor 17 then calculates a total likelihood S according to the following formula (1) for each of the internal recognition result 10 and the re-collation result 16, where Sa is an acoustic likelihood and w is a constant determined experimentally beforehand, for example w = 10.
S = Sa + w × Sl (1)
- The result-determination processor 17 sorts the recognition results in the internal recognition result 10 and the re-collation result 16 in descending order of the total likelihood S, and outputs them as the final recognition result 18.
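- A worked sketch of formula (1) and the sort by total likelihood; the acoustic likelihood values are invented, while the language likelihoods follow the example values quoted from FIG. 10:

```python
W = 10  # the experimentally determined constant w of formula (1)

# Language likelihoods follow the example quoted from FIG. 10; the acoustic
# likelihoods Sa are invented for illustration.
result_determination_language_model_21 = {
    "Suzuki Liquor Store": -0.32,
    "Suzuka Slope": -0.30,
}

candidates = [
    {"notation": "Suzuki Liquor Store", "Sa": -505.0},  # internal recognition result 10
    {"notation": "Suzuka Slope",        "Sa": -498.0},  # re-collation result 16
]

for c in candidates:
    Sl = result_determination_language_model_21[c["notation"]]
    c["S"] = c["Sa"] + W * Sl                            # formula (1): S = Sa + w * Sl

final_recognition_result_18 = sorted(candidates, key=lambda c: c["S"], reverse=True)
print([(c["notation"], round(c["S"], 1)) for c in final_recognition_result_18])
# [('Suzuka Slope', -501.0), ('Suzuki Liquor Store', -508.2)]
```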
- Consequently, according to Embodiment 4, the speech-recognition device 1 is configured to include the result-determination language model 21, in which pairs of words and their language likelihoods are stored, wherein the result-determination processor 17 calculates, using the result-determination language model 21, the language likelihood of the internal recognition result 10 and the language likelihood of the re-collation result 16 (namely, the external recognition result 11), and compares the acoustic likelihood and language likelihood of the internal recognition result 10 with the acoustic likelihood and language likelihood of the re-collation result 16, to thereby determine the final recognition result. Thus, the language likelihood Sl is calculated for each of the internal recognition result 10 and the re-collation result 16 using the same result-determination language model 21, so that a comparison taking the language likelihood Sl into account can be made between them, providing an effect that the recognition accuracy is improved.
- Note that in Embodiment 4 an example using a word unigram as the result-determination language model 21 has been described; however, this is not limitative, and any static (n-gram) language model, including a bigram, a trigram or the like, may be used.
- Note that in Embodiment 4 the description covers the case where the result-determination language model 21 is added to the speech-recognition device 1 of Embodiment 1 and the operation of the result-determination processor 17 is modified; however, the result-determination language model 21 may likewise be added to the speech-recognition device 1 of Embodiment 2 or Embodiment 3 and the operation of its result-determination processor 17 modified.
- Further, in Embodiments 1 to 4, the external recognition result 11 received from a single external recognizer 19 is used; however, a plurality of external recognition results 11 received from a plurality of external recognizers 19 may be used. Further, the result-determination processor 17 is configured to output the recognition results sorted in descending order of the acoustic likelihood or the like as the final recognition result 18; however, this is not limitative, and it may, for example, be configured so that only a predetermined number of results, in descending order of the acoustic likelihood, are outputted as the final recognition result 18. - Within the scope of the invention, the embodiments may be freely combined, and any element of the embodiments may be modified or omitted.
- As described above, the speech-recognition device according to the invention is configured to calculate, using the same acoustic model, the acoustic likelihood of the internal recognition result and the acoustic likelihood of the external recognition result and to compare them with each other. It is therefore well suited for use in client-side car-navigation devices, smartphones and the like that form part of client-server speech-recognition systems.
- 1: speech-recognition device, 2: input speech, 3: transmitter, 4: speech data, 5: analyzer, 6: feature vector, 7: internal recognizer, 8: language model, 9: acoustic model, 10: internal recognition result, 11: external recognition result, 12: reading-addition processor, 13: reading dictionary, 14: reading-added result, 15: re-collation processor, 16, 16 a: re-collation results, 17: result-determination processor, 18: final recognition result, 19: external recognizer, 20: second acoustic model, 21: result-determination language model.
Claims (6)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2013/056142 WO2014136222A1 (en) | 2013-03-06 | 2013-03-06 | Speech-recognition device and speech-recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160005400A1 true US20160005400A1 (en) | 2016-01-07 |
US9431010B2 US9431010B2 (en) | 2016-08-30 |
Family
ID=51490785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/655,141 Expired - Fee Related US9431010B2 (en) | 2013-03-06 | 2013-03-06 | Speech-recognition device and speech-recognition method |
Country Status (5)
Country | Link |
---|---|
US (1) | US9431010B2 (en) |
JP (1) | JP5868544B2 (en) |
CN (1) | CN105009206B (en) |
DE (1) | DE112013006770B4 (en) |
WO (1) | WO2014136222A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105161092B (en) * | 2015-09-17 | 2017-03-01 | 百度在线网络技术(北京)有限公司 | A kind of audio recognition method and device |
JP6585022B2 (en) * | 2016-11-11 | 2019-10-02 | 株式会社東芝 | Speech recognition apparatus, speech recognition method and program |
CN106782502A (en) * | 2016-12-29 | 2017-05-31 | 昆山库尔卡人工智能科技有限公司 | A kind of speech recognition equipment of children robot |
CN110447068A (en) * | 2017-03-24 | 2019-11-12 | 三菱电机株式会社 | Speech recognition equipment and audio recognition method |
CN110111778B (en) * | 2019-04-30 | 2021-11-12 | 北京大米科技有限公司 | Voice processing method and device, storage medium and electronic equipment |
JP7038919B2 (en) * | 2019-08-01 | 2022-03-18 | 三菱電機株式会社 | Multilingual speech recognition device and multilingual speech recognition method |
CN113345418B (en) * | 2021-06-09 | 2024-08-09 | 中国科学技术大学 | Multilingual model training method based on cross-language self-training |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3581648B2 (en) * | 2000-11-27 | 2004-10-27 | キヤノン株式会社 | Speech recognition system, information processing device, control method thereof, and program |
JP2003323196A (en) * | 2002-05-08 | 2003-11-14 | Nec Corp | Voice recognition system, voice recognition method, and voice recognition program |
US7228275B1 (en) * | 2002-10-21 | 2007-06-05 | Toyota Infotechnology Center Co., Ltd. | Speech recognition system having multiple speech recognizers |
JP2004198831A (en) | 2002-12-19 | 2004-07-15 | Sony Corp | Method, program, and recording medium for speech recognition |
JP2005037662A (en) * | 2003-07-14 | 2005-02-10 | Denso Corp | Voice dialog system |
JP5046589B2 (en) * | 2006-09-05 | 2012-10-10 | 日本電気通信システム株式会社 | Telephone system, call assistance method and program |
JP4902617B2 (en) * | 2008-09-30 | 2012-03-21 | 株式会社フュートレック | Speech recognition system, speech recognition method, speech recognition client, and program |
JP5274191B2 (en) * | 2008-10-06 | 2013-08-28 | 三菱電機株式会社 | Voice recognition device |
US9959870B2 (en) * | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
WO2011052412A1 (en) * | 2009-10-28 | 2011-05-05 | 日本電気株式会社 | Speech recognition system, speech recognition request device, speech recognition method, speech recognition program, and recording medium |
US8660847B2 (en) * | 2011-09-02 | 2014-02-25 | Microsoft Corporation | Integrated local and cloud based speech recognition |
US8972263B2 (en) * | 2011-11-18 | 2015-03-03 | Soundhound, Inc. | System and method for performing dual mode speech recognition |
2013
- 2013-03-06 US US14/655,141 patent/US9431010B2/en not_active Expired - Fee Related
- 2013-03-06 JP JP2015504055A patent/JP5868544B2/en not_active Expired - Fee Related
- 2013-03-06 DE DE112013006770.6T patent/DE112013006770B4/en not_active Expired - Fee Related
- 2013-03-06 CN CN201380074221.7A patent/CN105009206B/en not_active Expired - Fee Related
- 2013-03-06 WO PCT/JP2013/056142 patent/WO2014136222A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170140751A1 (en) * | 2015-11-17 | 2017-05-18 | Shenzhen Raisound Technology Co. Ltd. | Method and device of speech recognition |
US20180366123A1 (en) * | 2015-12-01 | 2018-12-20 | Nuance Communications, Inc. | Representing Results From Various Speech Services as a Unified Conceptual Knowledge Base |
US20190096396A1 (en) * | 2016-06-16 | 2019-03-28 | Baidu Online Network Technology (Beijing) Co., Ltd. | Multiple Voice Recognition Model Switching Method And Apparatus, And Storage Medium |
US10847146B2 (en) * | 2016-06-16 | 2020-11-24 | Baidu Online Network Technology (Beijing) Co., Ltd. | Multiple voice recognition model switching method and apparatus, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
DE112013006770T5 (en) | 2015-12-24 |
CN105009206B (en) | 2018-02-09 |
DE112013006770B4 (en) | 2020-06-18 |
US9431010B2 (en) | 2016-08-30 |
CN105009206A (en) | 2015-10-28 |
WO2014136222A1 (en) | 2014-09-12 |
JP5868544B2 (en) | 2016-02-24 |
JPWO2014136222A1 (en) | 2017-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9431010B2 (en) | Speech-recognition device and speech-recognition method | |
CN108711422B (en) | Speech recognition method, speech recognition device, computer-readable storage medium and computer equipment | |
JP5957269B2 (en) | Voice recognition server integration apparatus and voice recognition server integration method | |
JP5480760B2 (en) | Terminal device, voice recognition method and voice recognition program | |
JP4465564B2 (en) | Voice recognition apparatus, voice recognition method, and recording medium | |
JP4802434B2 (en) | Voice recognition apparatus, voice recognition method, and recording medium recording program | |
JP4224250B2 (en) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
KR101590724B1 (en) | Method for modifying error of speech recognition and apparatus for performing the method | |
EP2685452A1 (en) | Method of recognizing speech and electronic device thereof | |
US8271282B2 (en) | Voice recognition apparatus, voice recognition method and recording medium | |
WO2012073275A1 (en) | Speech recognition device and navigation device | |
JPWO2015118645A1 (en) | Voice search apparatus and voice search method | |
KR20120066530A (en) | Method of estimating language model weight and apparatus for the same | |
US9135911B2 (en) | Automated generation of phonemic lexicon for voice activated cockpit management systems | |
US20070038453A1 (en) | Speech recognition system | |
US10515634B2 (en) | Method and apparatus for searching for geographic information using interactive voice recognition | |
US20120259627A1 (en) | Efficient Exploitation of Model Complementariness by Low Confidence Re-Scoring in Automatic Speech Recognition | |
CN101123090B (en) | Speech recognition by statistical language using square-rootdiscounting | |
US20120245940A1 (en) | Guest Speaker Robust Adapted Speech Recognition | |
KR101424496B1 (en) | Apparatus for learning Acoustic Model and computer recordable medium storing the method thereof | |
US8306820B2 (en) | Method for speech recognition using partitioned vocabulary | |
JP2938866B1 (en) | Statistical language model generation device and speech recognition device | |
KR20200117826A (en) | Method and apparatus for speech recognition | |
JP4987530B2 (en) | Speech recognition dictionary creation device and speech recognition device | |
JP3914709B2 (en) | Speech recognition method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | AS | Assignment | Owner name: MITSUBISHI ELECTRIC CORPORATION, JAPAN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: HANAZAWA, TOSHIYUKI; REEL/FRAME: 035895/0017; Effective date: 20150525
 | ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA
 | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=.
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4
 | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
 | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY
 | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362
 | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20240830