WO2005122143A1 - Speech recognition device and speech recognition method - Google Patents
- Publication number: WO2005122143A1 (PCT/JP2005/009652)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vocabulary
- speech recognition
- combination coefficient
- language model
- prediction probability
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Definitions
- the present invention relates to a speech recognition apparatus and speech recognition method for recognizing speech using a language model.
- conventionally, a language model used for speech recognition is built from a large collection of example sentences for the target task. The sentences are first preprocessed, for example by deleting unnecessary symbols, then morphological analysis is performed and word chain information is modeled statistically. Usually, 2-grams or 3-grams are used as language models.
- Patent Document 1 discloses a topic adaptation technology of a language model for speech recognition as a method of creating a language model as described above.
- FIG. 1 is a flow chart showing the operation of a speech input search system using the conventional topic adaptation technology described in Patent Document 1 above.
- the speech input search system performs speech recognition using the acoustic model 1012 and the language model 1014 (step S1016), and generates a transcription.
- the language model 1014 is created based on the text database 1020.
- the speech input search system executes a text search using the transcribed search request (step S1022), and ranks the search results by relevance and outputs them (step S1024).
- the speech input search system then performs modeling by acquiring information from the top-ranked documents of the search results (step S1026), and adapts the speech recognition language model 1014.
- finally, the speech input search system displays the search results on a display unit such as the screen of a personal computer (step S1028).
- Patent Document 2 discloses an invention that creates a language model for a specific target task using information obtained from a plurality of existing language models (language models created from text data of other tasks), without collecting a large text database.
- FIG. 2 is an explanatory diagram for explaining the processing operation performed by the language model generation unit of the speech recognition device of Patent Document 2 described above.
- the language model generation unit combines the distribution of connection frequencies (prior information) obtained from a plurality of existing language models (language models generated from text data of other tasks) with the text data of the target specific task, and uses the resulting connection frequencies (posterior information) to determine the language prediction probability (appearance word prediction probability). That is, the language model generation unit generates a language model corresponding to the specific task.
- the speech recognition apparatus of Patent Document 2 performs speech recognition using the language model generated by the language model generation unit.
- Patent Document 1 Japanese Patent Application Laid-Open No. 2003-36093
- Patent Document 2 Japanese Patent Application Laid-Open No. 10-198395
- however, a topic typically changes gradually as speech continues. With the conventional techniques, the language model cannot be adapted in real time to such changes in the topic, so there is a problem that speech recognition cannot be performed properly.
- An object of the present invention is to provide a speech recognition apparatus and a speech recognition method that appropriately perform speech recognition.
- a speech recognition apparatus according to the present invention acquires and recognizes speech, and comprises: vocabulary acquisition means for acquiring a vocabulary;
- language model storage means for storing a plurality of language models;
- tag information storage means for storing, for each language model, tag information indicating features of that language model;
- combination coefficient calculation means for calculating, as a combination coefficient, the weight of each language model according to the vocabulary acquired by the vocabulary acquisition means, based on the relevance of that vocabulary to the tag information of the language model;
- prediction probability calculation means for calculating the prediction probability that a predetermined word appears in the speech by combining the language models according to the combination coefficients calculated by the combination coefficient calculation means; and
- recognition means for recognizing the speech using the prediction probability calculated by the prediction probability calculation means.
- with this configuration, when the topic changes, the combination coefficient of each language model according to the topic is calculated by acquiring the vocabulary corresponding to the changed topic.
- therefore, the prediction probability corresponding to the topic (the appearance word prediction probability) can be calculated in real time without collecting a large number of example sentences or performing complex language processing. That is, even if the topic changes, a language model corresponding to the topic can be pseudo-generated in real time, and as a result, speech recognition can be performed appropriately.
- the combination coefficient is calculated based on the relation between the vocabulary corresponding to the topic and the tag information, it is possible to generate an appropriate language model for the topic.
- the vocabulary acquisition means may acquire a vocabulary corresponding to the speech recognized by the recognition means.
- since the vocabulary corresponding to the recognized speech indicates the topic of the user's utterance, a language model corresponding to the changed topic can be generated in real time each time the topic changes during the user's speech, and the user's speech can be recognized properly.
- the speech recognition apparatus may further include: association degree holding means for holding the degree of association between each of a plurality of kinds of vocabulary and each piece of tag information;
- association degree deriving means for deriving, using the held degrees of association, the degree of association of each piece of tag information to the vocabulary acquired by the vocabulary acquisition means; and importance holding means for holding, for each piece of tag information, the importance of that tag information to the language model.
- in this case, the combination coefficient calculation means calculates the combination coefficient of each language model using the association degrees derived by the association degree deriving means and the importances held by the importance holding means, and the prediction probability calculation means calculates the prediction probability using, for each language model, the specific model prediction probability that the predetermined word appears and the combination coefficient of that language model.
- with this configuration, the topic and each language model can be associated accurately through the degrees of association and importances, so the combination coefficient of each language model for the topic can be calculated more appropriately. Furthermore, since degrees of association are held between many kinds of vocabulary and each piece of tag information, the combination coefficients can be calculated regardless of the vocabulary contained in each language model, as long as a degree of association is held for the acquired vocabulary; a language model corresponding to many vocabularies and topics can therefore be generated.
- combination coefficient calculation means may calculate a combination coefficient of each language model each time one vocabulary is acquired by the vocabulary acquisition means.
- with this configuration, the combination coefficient of each language model can quickly follow changes in the topic of the user's speech, and the speech can be recognized properly even if the topic changes successively.
- alternatively, the combination coefficient calculation means may calculate the combination coefficient of each language model each time a plurality of vocabularies are acquired by the vocabulary acquisition means.
- the combination coefficient calculation means may further calculate, as the combination coefficient, the weight of each language model according to the plurality of vocabularies, based on the relevance between the plurality of vocabularies acquired by the vocabulary acquisition means and the tag information of the language models.
- the speech recognition apparatus may further include keyword extraction means for extracting a keyword from at least one of electronic data being browsed by the user and profile information related to the user, and the vocabulary acquisition means may acquire the keyword extracted by the keyword extraction means as the vocabulary.
- the present invention can be realized not only as such a speech recognition device, but also as a method, a program, and a storage medium storing the program.
- the speech recognition apparatus according to the present invention can calculate the prediction probability (appearance word prediction probability) corresponding to the topic in real time without collecting a large number of example sentences or performing complicated language processing. That is, even if the topic changes, a language model corresponding to the topic can be generated in real time, and as a result, speech recognition can be performed appropriately. Furthermore, since the combination coefficient is calculated based on the relation between the vocabulary corresponding to the topic and the tag information, an appropriate language model can be generated for the topic.
- FIG. 1 is a flowchart showing the operation of a conventional voice input search system.
- FIG. 2 is an explanatory diagram for explaining the processing operation performed by the language model generation unit of the conventional speech recognition device.
- FIG. 3 is a block diagram showing a language model generation device of the speech recognition device in the embodiment of the present invention.
- FIG. 4 is a diagram showing information stored in the language model storage unit and the tag information storage unit of the above.
- FIG. 5 is a diagram showing the contents of the co-occurrence information of the above.
- FIG. 6 is a flowchart showing the operation of creating a language model by the language model generation device of the same.
- FIG. 7 is a block diagram of the above speech recognition device.
- FIG. 8 is a block diagram of a speech recognition device according to a first modification of the above.
- FIG. 9 is a flow chart showing the operation of the speech recognition system in the first modification of the above.
- FIG. 10 is a block diagram of a speech recognition apparatus according to a second modification of the above.
- FIG. 11 is a block diagram of a speech recognition device according to a third modification of the above.
- FIG. 12 is a block diagram of a speech recognition device according to a fourth modification of the above.
- FIG. 13 is a block diagram of a speech recognition system according to a fifth modification of the above.
- FIG. 14 is a block diagram of a speech recognition system according to a sixth modification of the above.
- 601 Video receiver, 602 Character recognition unit
- the speech recognition device in the present embodiment includes a language model generation device, and performs speech recognition based on the appearance word prediction probability calculated by the language model generation device.
- the language model generation device of the speech recognition device focuses on the fact that a sentence can be expressed as a combination of various topics, and, based on vocabulary representing those topics, combines language models prepared in advance and calculates the appearance word prediction probability, thereby generating a language model capable of dealing with any topic. For example, consider the sentence "speech recognition technology for spontaneous speech has been established, and the subtitle broadcasting that hearing-impaired people have hoped for can now be realized for all programs." This sentence can be said to be composed of a topic related to "speech recognition," a topic related to "hearing impairment," and a topic related to "broadcasting."
- therefore, the language model generation device designates the vocabulary "speech recognition," "hearing impairment," and "broadcasting," combines the language models prepared in advance based on this vocabulary, and obtains the word-chain probability (appearance word prediction probability).
- in addition, the speech recognition apparatus sequentially updates in real time the coefficients (combination coefficients) for combining the existing language models according to the designated vocabulary. As the topic changes, it calculates the appearance word prediction probability corresponding to the topic, that is, it creates a language model pseudo-adapted to the topic, and recognizes the input speech appropriately.
- FIG. 3 is a configuration diagram showing a configuration of a language model generation device according to Embodiment 1 of the present invention.
- this language model generation device accepts one or more vocabularies, combines one or more language models prepared in advance according to the accepted vocabularies, and calculates the appearance word prediction probability of the next word. This enables appropriate recognition of speech related to the content of the accepted vocabulary.
- the language model generation device includes a vocabulary designation unit 101, a degree of association calculation unit 102, a combination coefficient calculation unit 103, a language probability calculation unit 104, and a language model information storage unit 105.
- the language model information storage unit 105 includes a language model storage unit 106 storing a plurality of language models, and a tag information storage unit 107 storing, for each language model, a vocabulary (hereinafter referred to as tag information) representing the features of the topic of that language model.
- FIG. 4 is a diagram showing information stored in the language model storage unit 106 and the tag information storage unit 107.
- the language model storage unit 106 stores multiple types of language models.
- for example, the language model storage unit 106 stores a language model MDL1 corresponding to technology news, a language model MDL2 corresponding to welfare technology, and a language model MDL3 corresponding to music information.
- the language model storage unit 106 outputs a specific model prediction probability signal 119 indicating the appearance word prediction probability P(Wj | Wj-1) of each language model.
- here, the appearance word prediction probability P(Wj | Wj-1) means the probability that the word Wj follows the word Wj-1.
- the tag information storage unit 107 stores, for each language model described above, tag information representing the feature of the language model and the degree of importance of the tag information.
- the above-mentioned importance indicates the degree of relation between tag information and a language model corresponding to the tag information, and is indicated by a numerical value less than 1, for example.
- for example, the tag information "news" and "technology" are stored for the language model MDL1, together with the importance "0.4" of the tag information "news" and the importance of the tag information "technology."
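As an illustrative sketch (not part of the patent text), the tag information storage unit 107 can be modeled as a mapping from each language model to its tags and importance values β. Only the importance 0.4 of "news" for MDL1 is stated above; every other value below is an invented placeholder.

```python
# Hypothetical sketch of the tag information storage unit 107.
# Only the importance 0.4 of "news" for MDL1 comes from the text;
# the remaining values are made-up placeholders.
TAG_INFO = {
    "MDL1": {"news": 0.4, "technology": 0.6},
    "MDL2": {"welfare": 0.5, "technology": 0.5},
    "MDL3": {"music": 0.9},
}

def importance(model: str, tag: str) -> float:
    """Importance beta of a tag for a language model (0.0 if not stored)."""
    return TAG_INFO.get(model, {}).get(tag, 0.0)
```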
- the tag information storage unit 107 outputs a tag information signal 116 indicating the tag information to the association degree calculation unit 102, and outputs an importance signal 115 indicating the importance of each piece of tag information to the language models to the combination coefficient calculation unit 103.
- the vocabulary designation unit 101 receives a vocabulary indicating the features of the topic, and outputs a vocabulary information signal 111 indicating the vocabulary to designate that vocabulary to the association degree calculation unit 102.
- the target vocabulary designation unit 108 outputs, to the language probability calculation unit 104, a target vocabulary signal 118 indicating the vocabulary for which the appearance word prediction probability is to be calculated.
- for example, the target vocabulary designation unit 108 sets vocabularies that are candidates for the speech recognition result as probability calculation targets, and outputs a target vocabulary signal 118 indicating that vocabulary.
- the degree-of-association calculation unit 102 holds co-occurrence information 102 a indicating the degree to which two words appear together in the same sentence (individual degree of association).
- FIG. 5 is a diagram showing the contents of co-occurrence information 102a.
- the co-occurrence information 102a indicates a plurality of types of vocabulary sets and individual association degrees between the vocabulary in each set.
- for example, the co-occurrence information 102a indicates a set of the vocabulary "speech recognition" and "technology" with an individual association degree of "0.8," and a set of the vocabulary "speech recognition" and "subtitles" with an individual association degree of "0.5."
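As a hedged sketch, the co-occurrence information 102a can be represented as a symmetric lookup table of individual association degrees. The two pairs below are the ones given in the text; treating any unlisted pair as degree 0 is an assumption of this sketch.

```python
# Sketch of the co-occurrence information 102a: individual association
# degrees between vocabulary pairs. Pairs not listed default to 0.
CO_OCCURRENCE = {
    frozenset(("speech recognition", "technology")): 0.8,
    frozenset(("speech recognition", "subtitles")): 0.5,
}

def individual_association(word_a: str, word_b: str) -> float:
    """R(word_a, word_b): degree to which the two words appear in the same sentence."""
    return CO_OCCURRENCE.get(frozenset((word_a, word_b)), 0.0)
```

Using `frozenset` keys makes the lookup order-independent, matching the symmetric reading of "two words appear together in the same sentence."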
- the association degree calculation unit 102 specifies, for each vocabulary designated by the vocabulary designation unit 101, the individual association degree of each piece of tag information to that vocabulary.
- the degree-of-association calculation unit 102 refers to the co-occurrence information 102 a when specifying the above-described individual degree of association.
- the degree-of-association calculation unit 102 identifies an individual degree of association between the designated vocabulary and the tag information by regarding one vocabulary included in the vocabulary set indicated in the co-occurrence information 102 a as the tag information.
- then, the association degree calculation unit 102 obtains, from the identified individual association degrees, the degree of association (degree of association α) of each piece of tag information to all the vocabulary designated by the vocabulary designation unit 101, and outputs a tag information relevance signal 112 indicating the obtained degrees of association.
- the combination coefficient calculation unit 103 calculates, from the importance signal 115 representing the importances for the language models stored in the language model storage unit 106 and the tag information relevance signal 112, the combination coefficient of each language model corresponding to the vocabulary received by the vocabulary designation unit 101, and outputs a combination coefficient signal 113 indicating those combination coefficients.
- the language probability calculation unit 104 acquires the specific model prediction probability signal 119 indicating the appearance word prediction probability P(Wj | Wj-1) of each language model stored in the language model storage unit 106, and the combination coefficient signal 113 output from the combination coefficient calculation unit 103.
- having acquired the specific model prediction probability signal 119 of each language model and the combination coefficient signal 113, the language probability calculation unit 104 calculates, for each vocabulary indicated by the target vocabulary signal 118 output from the target vocabulary designation unit 108, an appearance word prediction probability adapted to the topic, and outputs an adaptive language probability signal 114 indicating that probability.
- FIG. 6 is a flowchart showing the operation of the language model generation device described above.
- the vocabulary specifying unit 101 receives a vocabulary related to the content of the speech, for example, "speech recognition” or "hearing impairment”, and specifies the vocabulary for the degree of association calculating unit 102 (step S202).
- next, the association degree calculation unit 102 calculates the degree of association α of each piece of tag information stored in the tag information storage unit 107 based on the designated vocabulary (step S203).
- for example, when the tag information "news" and "technology" exist for the language model MDL1 and the tag information "welfare" and "technology" exist for the language model MDL2, the association degree calculation unit 102 first relates each of the designated vocabularies "speech recognition" and "hearing impairment" to each piece of tag information stored in the tag information storage unit 107, such as "news," "technology," and "welfare," to identify the individual association degrees (the degree of association for each designated vocabulary). These individual association degrees are identified based on the co-occurrence information 102a.
- here, the point of the present invention is to obtain the combination coefficients of the language models from the accepted vocabulary by interposing tag information.
- the degree of association α represents the association between the accepted vocabulary and the tag information, and the development of the topic is predicted through this tag information.
- the degree of association α can be calculated as follows using the co-occurrence information 102a.
- for each piece of tag information TAG_l, the degree of association α(TAG_l) to all the designated words Word_k is obtained by Equation 1: α(TAG_l) = Σ_k R(Word_k, TAG_l).
- here, the function R indicates the individual association degree defined by the co-occurrence information 102a, that is, the individual association degree between Word_k and TAG_l.
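As a hedged sketch of the degree-of-association computation (Equation 1 is referenced later in the text; the exact normalization is not recoverable from it, so summation without normalization is an assumption here, and the R table is a toy example):

```python
# Sketch of Equation 1: alpha(TAG) summed over all designated words.
def association_degree(designated_words, tag, R):
    """alpha(TAG): sum of individual association degrees R(word, tag)
    over every vocabulary word designated by the vocabulary designation
    unit 101. R is the lookup backed by co-occurrence information 102a."""
    return sum(R(word, tag) for word in designated_words)

# Hypothetical individual association degrees, for demonstration only.
_R_TABLE = {
    ("speech recognition", "technology"): 0.8,
    ("hearing impairment", "welfare"): 0.7,
}

def R(word, tag):
    return _R_TABLE.get((word, tag), 0.0)

alpha_tech = association_degree(
    ["speech recognition", "hearing impairment"], "technology", R)
```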
- the tag information can be defined as the nouns included in each language model, but is desirably identified using an index such as term frequency-inverse document frequency (tfidf); it is good to prepare words specific to each language model as the tag information.
- it is desirable that the co-occurrence information 102a be created using more general information sources such as newspaper articles and the Internet. Furthermore, with regard to the co-occurrence relations, there may be cases where the relation between Word_k and TAG_l is not defined because of the sparsity of language.
- in that case, a second-order co-occurrence relation may be used to calculate the degree of association α(TAG_l), for example as in Equation 2: α(TAG_l) = Σ_k max_x R(Word_k, W_x) · R(W_x, TAG_l).
- in this way, the association degree calculation unit 102 calculates, as the degree of association α, how relevant each piece of tag information representing the features of each language model stored in the language model storage unit 106 is to the vocabulary designated by the vocabulary designation unit 101.
- one of the advantages of interposing tag information is that the number of vocabularies that can be designated by the vocabulary designation unit 101 can be made larger than the vocabulary size of the language models. Specifically, while the number of words that can be targeted by speech recognition is limited to about 100,000 for performance reasons, with the method of the present invention the designated vocabulary size can be, for example, one million words regardless of the vocabulary size of the language models, as long as co-occurrence relations exist between the designated vocabulary and the tag information.
- next, the combination coefficient calculation unit 103 calculates the combination coefficient γ of each language model according to the vocabulary designated by the vocabulary designation unit 101, based on the degree of association of each piece of tag information ("news," "welfare," "technology," etc.) (step S204).
- in the tag information storage unit 107, the degree of relation between each piece of tag information and each language model is defined in advance as the importance β. For example, the importance "0.4" of the tag information "news" is defined for the language model MDL1.
- by using this importance β, it is possible to calculate the appearance word prediction probability according to the features of the topic specified by the vocabulary designated by the vocabulary designation unit 101. Note that the above-mentioned tfidf may be used as an index of this importance.
- when the n-th language model is denoted N-gram_n, its tag information is TAG_l, and the importance of TAG_l to N-gram_n is β(TAG_l, N-gram_n), the combination coefficient γ_n for the n-th language model (N-gram_n) can be obtained by Equation 3: γ_n = Σ_l α(TAG_l) · β(TAG_l, N-gram_n).
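Equation 3's product-and-sum of α and β can be sketched as follows (the tag sets and all numeric values are illustrative, not from the patent text):

```python
# Sketch of Equation 3: gamma_n = sum over tags of alpha(TAG) * beta(TAG, model_n).
def combination_coefficient(alphas, betas):
    """Weight gamma of one language model.
    alphas: tag -> degree of association alpha for the designated vocabulary.
    betas:  tag -> importance beta of that tag for this language model."""
    return sum(a * betas.get(tag, 0.0) for tag, a in alphas.items())

# Illustrative alpha and beta values.
alphas = {"news": 0.2, "technology": 0.8, "welfare": 0.6}
gamma_mdl1 = combination_coefficient(alphas, {"news": 0.4, "technology": 0.6})
gamma_mdl2 = combination_coefficient(alphas, {"welfare": 0.5, "technology": 0.5})
```

A tag that a model does not carry simply contributes 0 to that model's coefficient, so a model whose tags are unrelated to the current topic receives a small weight.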
- that is, the combination coefficient calculation unit 103 uses the tag information relevance signal 112 (degree of association α) output from the association degree calculation unit 102 and the importance signal 115 (importance β) output from the tag information storage unit 107 to calculate the combination coefficient γ, which indicates the weight of each language model in the combination (how much each language model is involved in the designated vocabulary) according to the vocabulary ("speech recognition" and "hearing impairment") designated by the vocabulary designation unit 101.
- next, the language probability calculation unit 104 uses the specific model prediction probability signal 119 indicating the appearance word prediction probability of each language model stored in the language model storage unit 106, the target vocabulary signal 118 indicating the vocabulary designated by the target vocabulary designation unit 108, and the combination coefficient signal 113 (combination coefficient γ) to calculate the appearance word prediction probability, and outputs the adaptive language probability signal 114 (step S205).
- that is, the language probability calculation unit 104 calculates, using Equation 4, the appearance word prediction probability that the word Wj (the vocabulary designated by the target vocabulary designation unit 108) appears after the word Wj-1.
- Equation 4: P(Wj | Wj-1) = Σ_{n=1..N} γ_n · P_n(Wj | Wj-1), where P_n(Wj | Wj-1) indicates the appearance word prediction probability of the n-th language model.
- note that the language models are not limited to 2-grams; 3-grams, 4-grams, or other models such as a Finite State Automaton (FSA) may be used.
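Equation 4 is a linear interpolation of the per-model prediction probabilities weighted by the combination coefficients. A minimal sketch follows; normalizing the coefficients so they sum to one is a design choice of this sketch (not stated in the text), and both toy bigram models use made-up probabilities:

```python
# Sketch of Equation 4: P(Wj | Wj-1) = sum_n gamma_n * P_n(Wj | Wj-1).
def adapted_probability(word, history, models, gammas):
    """Topic-adapted prediction probability as a weighted sum of the
    per-model probabilities. Coefficients are normalized here so the
    combined values still behave like a probability distribution."""
    total = sum(gammas)
    return sum((g / total) * p(word, history) for g, p in zip(gammas, models))

# Two toy bigram models with hypothetical probabilities.
def p1(word, history):
    return 0.3 if word == "recognition" else 0.01

def p2(word, history):
    return 0.1 if word == "recognition" else 0.01

prob = adapted_probability("recognition", "speech", [p1, p2], [0.56, 0.7])
```

Because the weights come from the current topic vocabulary, shifting the designated vocabulary shifts which model dominates the interpolation without retraining any model.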
- as described above, the association degree calculation unit 102 determines the degree of association α of each piece of tag information using Equation 1 or Equation 2, and the combination coefficient calculation unit 103 obtains the combination coefficient γ by taking, according to Equation 3, the product of the importance β between each piece of tag information and each language model and the degree of association α. Then, the language probability calculation unit 104 calculates, by Equation 4, the appearance word prediction probability for the vocabulary (history and target vocabulary) designated by the target vocabulary designation unit 108 from the specific model prediction probability signal 119 of each language model and the combination coefficient signal 113, and outputs the calculation result as the adaptive language probability signal 114.
- FIG. 7 is a block diagram of the speech recognition apparatus in the present embodiment.
- the speech recognition apparatus includes the language model generation apparatus described above, a speech input unit 300, a speech recognition unit 301, and a speech recognition result output unit 117.
- Voice input unit 300 receives a voice (speech) and outputs the voice as input voice signal 314 to voice recognition unit 301.
- the speech recognition unit 301 performs speech recognition processing on the input speech signal 314, and outputs the above-mentioned target vocabulary signal 118 indicating each candidate of the vocabulary corresponding to the speech to the language probability calculation unit 104. Furthermore, the speech recognition unit 301 acquires an adaptive language probability signal 114 indicating the word appearance probability of each candidate calculated by the language probability calculation unit 104, and uses the adaptive language probability signal 114 as a language model. That is, the speech recognition unit 301 narrows down the vocabulary corresponding to the speech out of the candidates based on the word prediction probability of each candidate. Then, the speech recognition unit 301 outputs a speech recognition output signal 311 indicating the vocabulary obtained by the narrowing-down to the speech recognition result output unit 117.
- the speech recognition result output unit 117 includes a display, a device control system, and the like, and displays the vocabulary indicated by the speech recognition output signal 311.
- simply by designating one or more vocabularies to the vocabulary designation unit 101, the degree of association α of each piece of tag information is determined by the degree-of-association calculation unit 102 based on the designated vocabulary,
- the combination coefficient calculation unit 103 determines the combination coefficient γ of each language model from the degree of importance β between each piece of tag information and each language model and the degree of association α,
- the language models are combined, and the appearance word prediction probability is determined by the language probability calculation unit 104, and
- the obtained appearance word prediction probability can be used instantaneously by the speech recognition unit 301 as a language model adapted to the topic.
- the combination coefficient of each language model according to the topic is calculated. Therefore, by combining the language models using the calculated combination coefficient, it is possible to calculate in real time the appearance word prediction probability corresponding to the topic, without collecting a large number of sentence examples or performing complicated language processing as in the conventional example. That is, even if the topic changes, a language model corresponding to the topic can be generated in real time, and as a result, speech recognition can be performed appropriately. Furthermore, since the combination coefficient is calculated based on the association between the vocabulary corresponding to the topic and the tag information, an appropriate combination coefficient for the topic can be calculated.
- FIG. 8 is a block diagram of the speech recognition apparatus according to the present modification.
- the speech recognition apparatus uses the recognition result of speech recognition unit 301 as the vocabulary received by vocabulary designation unit 101.
- the speech recognition unit receives feedback of the recognition result, and the combination coefficient γ is sequentially changed, so that speech recognition dynamically adaptive to the topic becomes possible.
- the combination coefficient calculation method of the present invention is characterized in that a language model related to a vocabulary can be configured instantaneously merely by designating as little as a single word, so that it is possible to respond instantly to changes in topic.
- the speech recognition apparatus includes the components of the speech recognition apparatus shown in FIG. 7, and also includes a result output unit 302 and a keyword extraction unit 303.
- the result output unit 302 receives the speech recognition output signal 311 output from the speech recognition unit 301, and outputs it as a recognition result signal 312 to the speech recognition result output unit 117 and the keyword extraction unit 303.
- the keyword extraction unit 303 receives the recognition result signal 312 output from the result output unit 302, extracts a vocabulary serving as a keyword from the recognition result signal 312, and outputs a keyword signal 313 indicating the vocabulary (keyword) to the vocabulary designation unit 101.
- the vocabulary designation unit 101 receives the vocabulary indicated by the keyword signal 313 output from the keyword extraction unit 303.
- FIG. 9 is a flowchart showing the operation of the speech recognition apparatus according to the present modification.
- the voice recognition unit 301 determines, based on the input voice signal 314 output from the voice input unit 300, whether voice has been detected by the voice input unit 300 (step S402). When it is determined that voice has been detected (Y in step S402), the detected voice is recognized (step S403). On the other hand, when it is determined that no voice has been detected (N in step S402), the voice recognition unit 301 determines whether an end instruction has been given, for example based on an operation by the user (step S409).
- when the speech recognition unit 301 determines that an end instruction has been issued (Y in step S409), the speech recognition device ends all processing; when it determines that no instruction has been issued (N in step S409), the speech recognition device repeats the process from step S402.
- the voice recognition result output unit 117 obtains the result recognized by the voice recognition unit 301 from the voice recognition unit 301 via the result output unit 302, and outputs the result.
- the keyword extraction unit 303 extracts a keyword carrying information on the topic from the recognition result (step S405), and designates the extracted keyword to the vocabulary designation unit 101. That is, the vocabulary designation unit 101 accepts the keyword designated in this way as a vocabulary, and designates the vocabulary to the degree-of-association calculation unit 102 (step S406).
- the keyword extraction in the keyword extraction unit 303 can be realized, for example, by extracting only nouns from the recognition result. In addition, it is also effective to eliminate the sparseness of the co-occurrence relation by also designating similar words or concept words of the extracted keywords.
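A minimal sketch of this keyword extraction step, assuming the recognition result arrives as (word, part-of-speech) pairs from a morphological analyzer, with an optional thesaurus lookup for the similar/concept-word expansion (both interfaces are hypothetical):

```python
def extract_keywords(tagged_tokens, thesaurus=None):
    # Keep only nouns from the recognition result; optionally expand each
    # keyword with similar or concept words to ease co-occurrence sparsity.
    keywords = [word for word, pos in tagged_tokens if pos == "noun"]
    if thesaurus:
        for word in list(keywords):
            keywords.extend(thesaurus.get(word, []))
    seen, ordered = set(), []
    for word in keywords:          # deduplicate while preserving order
        if word not in seen:
            seen.add(word)
            ordered.append(word)
    return ordered
```

For example, `extract_keywords([("subtitle", "noun"), ("is", "verb"), ("broadcast", "noun")], {"broadcast": ["program"]})` keeps the nouns and appends the concept word "program".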
- the degree-of-association calculation unit 102 calculates the degree of association α of each piece of tag information based on the vocabulary designated by the vocabulary designation unit 101 (step S407).
- the combination coefficient calculation unit 103 calculates the combination coefficient γ of each language model using the degree of importance β defined between each piece of tag information and each language model and the degree of association α (step S408).
- the calculated combination coefficient γ is reflected in the processing of steps S402 and S403. That is, when it is determined that speech is detected again in step S402, the speech recognition unit 301 calculates the appearance word prediction probability from the plurality of language models using Equation 4, based on the combination coefficient γ calculated above, and then performs speech recognition using that appearance word prediction probability (step S403).
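The loop of steps S402 through S408 can be sketched as follows; `recognize`, `extract`, `association`, and `combine` are hypothetical stand-ins for the units 301, 303, 102, and 103, and the iterable of utterances stands in for voice detection:

```python
def recognition_loop(utterances, recognize, extract, association, combine, gamma):
    # S402/S403: recognize each detected utterance using the current
    # combination coefficients gamma; S405-S408: extract keywords from the
    # result, recompute the degree of association alpha and the coefficients
    # gamma, so the next utterance is recognized with a topic-adapted model.
    results = []
    for utterance in utterances:
        text = recognize(utterance, gamma)
        results.append(text)
        keywords = extract(text)
        alpha = association(keywords)
        gamma = combine(alpha)
    return results, gamma
```

The point of the sketch is only the data flow: the coefficients computed from one utterance's recognition result feed the recognition of the next.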
- each time one utterance (the vocabulary corresponding to speech) is recognized, the combination coefficient γ of each language model can be changed, so that a speech recognition device that can dynamically adapt to topics can be realized.
- for the language model "technology news", the tag information "news" and "technology" are stored with importance levels 0.4 and 0.3, respectively, and for the language model "welfare technology", the tag information "welfare" and "technology" are stored with importance levels 0.7 and 0.3, respectively. It is assumed that tag information and importance levels are stored as shown in FIG. 4 for the other language models as well.
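As a worked illustration of how these stored importance values bias the combination (the degrees of association α below are assumed values, since FIG. 4 and the α computation are not reproduced in this excerpt):

```python
# Assumed association values for an utterance about welfare-oriented
# speech technology (hypothetical).
alpha = {"news": 0.1, "welfare": 0.6, "technology": 0.3}

# Importance values quoted in the text for the two models.
beta = {
    "technology news":    {"news": 0.4, "technology": 0.3},
    "welfare technology": {"welfare": 0.7, "technology": 0.3},
}

raw = {model: sum(imp * alpha.get(tag, 0.0) for tag, imp in tags.items())
       for model, tags in beta.items()}
total = sum(raw.values())
gamma = {model: value / total for model, value in raw.items()}
# raw: "technology news"    = 0.4*0.1 + 0.3*0.3 = 0.13
#      "welfare technology" = 0.7*0.6 + 0.3*0.3 = 0.51
# so the "welfare technology" model dominates the combined language model.
```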
- suppose a voice flows saying, "The speech recognition technology for spontaneous speech has been established, and there is a possibility that the subtitled broadcasting desired by hearing-impaired people can be realized for all programs."
- the combination coefficient is updated using the extracted keywords.
- a plurality of keywords may be used. In this case, it is effective in that the language model can be kept from changing more abruptly than necessary.
- FIG. 10 is a block diagram of the speech recognition apparatus according to the present modification.
- the speech recognition apparatus includes each component of the speech recognition apparatus of the above-described modification 1, and also includes a keyword transmission unit 304.
- the keyword transmitting unit 304 is provided between the keyword extracting unit 303 and the vocabulary specifying unit 101.
- the keyword transmitting unit 304 acquires the keyword signal 313 from the keyword extracting unit 303, and outputs the keyword signal 313 to the vocabulary specifying unit 101 at a predetermined timing.
- in this modification, the keyword transmission unit 304 makes it possible to control the timing at which the vocabulary is designated to the vocabulary designation unit 101.
- for example, the keyword transmitting unit 304 transmits the keyword signal 313 to the vocabulary designation unit 101 once every several utterances, or at an appropriate timing after a predetermined number of keywords (vocabularies) have been accumulated.
- the vocabulary designated to the vocabulary designation unit 101 by one transmission of the keyword signal 313 may be the single keyword extracted at the transmission timing, or may be the plurality of keywords extracted and accumulated up to the transmission time.
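One possible realization of this transmission-timing control (a hypothetical sketch; the callback standing in for unit 101 and the threshold policy are illustrative, not fixed by the text):

```python
class KeywordTransmitter:
    # Buffers keywords from the extraction unit and forwards them to the
    # vocabulary designation unit only once a threshold is reached, so the
    # combination coefficients are not recomputed on every single utterance.
    def __init__(self, designate, threshold=3):
        self.designate = designate   # stands in for vocabulary designation unit 101
        self.threshold = threshold
        self.buffer = []

    def receive(self, keyword):
        self.buffer.append(keyword)
        if len(self.buffer) >= self.threshold:
            self.designate(list(self.buffer))   # one transmission, many keywords
            self.buffer.clear()
```

A time-based policy (transmit every N seconds) would fit the same interface; only the condition inside `receive` changes.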
- since the vocabulary accepted by the vocabulary designation unit 101 is the recognition result of the speech recognition unit 301, the recognition result is fed back, speech recognition dynamically adaptive to the topic becomes possible, and the timing of adaptation to the topic can be made appropriate.
- FIG. 11 is a block diagram of the speech recognition apparatus according to the present modification.
- the speech recognition apparatus includes each component of the speech recognition apparatus shown in FIG. 7 of the above embodiment, and further includes a keyword extraction unit 303a and a text input unit 503.
- such a speech recognition apparatus is configured so that the vocabulary designated to the vocabulary designation unit 101 is a keyword extracted from an electronic program guide 501, a program-related home page 502, and text information, which will be described later. With this configuration, it is possible to perform speech recognition of contents related to the electronic program guide, the program-related home page, and the input text. That is, the speech recognition apparatus performs speech recognition suited to the topic, assuming that contents related to the electronic program guide and the like are the subject of the topic.
- the text input unit 503 acquires text information based on, for example, a keyboard input operation by the user, and outputs the text information as a text input signal 513 to the keyword extraction unit 303a.
- the keyword extraction unit 303a receives an electronic program guide (EPG) 501, which is electronically distributed, as an electronic program guide signal 511, receives a program-related home page 502, which is a home page (electronic data) related to program contents, as a program-related homepage signal 512, and further receives the text input signal 513 from the text input unit 503.
- the program-related home page 502 is electronic data indicating program contents available through the network, such as a home page related to the program contents of the electronic program guide 501 or a home page related to program contents published on a TV station home page. The contents of the electronic program guide 501 and the program-related home page 502 change according to browsing operations by the user.
- the keyword extraction unit 303a extracts a keyword (vocabulary) to be designated to the vocabulary designation unit 101 from the electronic program guide signal 511, the program-related homepage signal 512, and the text input signal 513, and outputs a keyword signal 313 indicating the keyword to the vocabulary designation unit 101.
- the keyword extraction unit 303a extracts keywords from the electronic program guide signal 511, the program-related homepage signal 512, and the text input signal 513 received as input, using a method such as performing language processing such as morphological analysis and then extracting only nouns. Further, as in Modification 1, it is also effective to eliminate the sparseness of the co-occurrence relation by also outputting similar words or concept words of the extracted keywords.
- the contents of the electronic program guide 501 being browsed by the user, the contents of the program-related home page 502 being browsed by the user, and the contents of the text information being input by the user change through browsing or input operations. Each time those contents change, it becomes possible to perform speech recognition in accordance with the changed contents. That is, by feeding the user's operations back, a language model according to those operations can be generated instantaneously and appropriate speech recognition can be performed. This makes it possible, for example, to refer to a past electronic program guide and to recognize related topics.
- with this configuration, keywords extracted from the electronic program guide, the program-related home page, and the input text information can be designated to the vocabulary designation unit 101, and every time the page being browsed changes, the appearance word prediction probability related to the contents of the electronic program guide, the program-related home page, and the input text information can be calculated, so that speech recognition adapted to the topic of those contents can be performed.
- needless to say, keywords may be extracted from all of the electronic program guide, the program-related home page, and the input text information, or from only one of them.
- such a voice recognition device is, for example, incorporated in a personal computer and used in a situation where a plurality of users are in conversation while browsing electronic data displayed on the personal computer.
- for example, when a plurality of users are talking about food while browsing, the agent incorporated in the personal computer appropriately recognizes the conversation related to the food using the voice recognition device according to the present modification. Then, based on the recognition result, the agent presents information indicating the users' interest, such as "sushi", expressed in the conversation. (Modification 4)
- FIG. 12 is a block diagram of the speech recognition apparatus according to the present modification.
- the voice recognition apparatus includes each component of the voice recognition apparatus shown in FIG. 7 of the above embodiment, and further includes a video receiving unit 601 for displaying video, such as a television, a character recognition unit 602 that performs character recognition on character information displayed on the video receiving unit 601, and a keyword extraction unit 303b that extracts a keyword from the character recognition result signal 612 output from the character recognition unit 602.
- This speech recognition apparatus assumes that the content of the characters displayed on the video receiving unit 601 is the subject of a topic, and performs speech recognition suited to the topic.
- the video receiver 601 sends the displayed video information as a video signal 611 to the character recognition unit 602.
- the character recognition unit 602 acquires the video signal 611 and performs character recognition on the video information indicated by the video signal 611. Then, the character recognition unit 602 sends the character recognition result as a character recognition result signal 612 to the keyword extraction unit 303 b.
- the keyword extraction unit 303b performs processing such as morphological analysis on the character recognition result signal 612, extracts a keyword (word) from it, and sends the keyword signal 313 indicating the keyword to the vocabulary designation unit 101.
- the speech recognition apparatus can calculate the appearance word prediction probability related to the content of the text displayed on the screen of the video reception unit 601, and perform speech recognition adapted to the topic of the content.
- voice recognition can be performed in accordance with the content of a keyword displayed on the screen of the video receiving unit 601. For example, based on the title of a news item, speech recognition in line with the content of the news can be realized.
- by performing keyword extraction with the keyword extraction unit 303b each time a keyword appears in a subtitle, it becomes possible to recognize conversation in accordance with the program content.
- such a voice recognition device is, for example, incorporated in a television and used in a situation where a plurality of users are in conversation while watching news or the like displayed on the television. Specifically, if multiple users are talking while watching, for example, economic news, the agent embedded in the television appropriately recognizes the conversation related to that news using the speech recognition apparatus according to the present modification. Then, based on the recognition result, the agent presents information indicating the users' interest, such as stock prices, expressed in the conversation.
- FIG. 13 is a block diagram of the speech recognition apparatus according to the present modification.
- the speech recognition apparatus includes each component of the speech recognition apparatus shown in FIG. 7 of the above embodiment, and further includes a profile information storage unit 701 for storing profile information, and a keyword extraction unit 303 for extracting a keyword from the profile information signal 711 output from the profile information storage unit 701.
- the profile information is information related to the user such as the preference of the user, and the profile information signal 711 is a signal indicating the profile information.
- This speech recognition device assumes that the content of the profile information is the subject of a topic, and performs speech recognition suitable for the topic.
- the keyword extraction unit 303 performs processing such as morphological analysis on the profile information signal 711 output from the profile information storage unit 701 to extract a keyword (vocabulary), and designates the keyword, as the keyword signal 313, to the vocabulary designation unit 101.
- the voice recognition device is applied to a ticket reservation system.
- the profile information storage unit 701 stores profile information indicating that the user likes “classic music”.
- a keyword ("classical music") indicated by this profile information is designated to the vocabulary designation unit 101, so that a language model corresponding to the linguistic expressions necessary for making a reservation for a classical music concert can be created. This makes it possible to recognize the user's speech more reliably.
- FIG. 14 is a block diagram of the speech recognition apparatus according to the present modification.
- the speech recognition apparatus according to the present modification includes each component of the speech recognition apparatus shown in FIG. 7 of the above-described embodiment, the text input unit 503 shown in FIG. 11, the profile information storage unit 701 shown in FIG. 13, and a keyword extraction unit 303d.
- This speech recognition apparatus recognizes speech suitable for the topic, assuming that the contents of the profile information and the electronic program guide are the subject of the topic.
- the keyword extraction unit 303d extracts a keyword (vocabulary) from the profile information signal 711, the electronic program guide signal 511, the program-related homepage signal 512, and the text input signal 513, and outputs a keyword signal 313 indicating the keyword to the vocabulary designation unit 101.
- the speech recognition apparatus has both the features of the speech recognition apparatus of Modification 3 and those of the speech recognition apparatus of Modification 5, and uses the profile information, the electronic program guide 501, the program-related home page 502, and the text information in combination at the same time.
- the speech recognition apparatus generates, for example, a language model in line with dramas based on profile information indicating that the user "loves dramas" and on the electronic program guide, so that the user's speech can be recognized more appropriately.
- the present invention makes it possible to use a language model adapted to the topic merely by specifying a vocabulary of at least one word expressing its content, and as a result to realize speech recognition that can adapt dynamically to the topic. It can be used as speech recognition technology for the user interfaces of various devices such as home appliances, AV (Audio Video) devices, and personal computers, and can also be applied to applications such as subtitle assignment devices and tagging devices that perform speech recognition on AV data.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006514451A JP3923513B2 (ja) | 2004-06-08 | 2005-05-26 | 音声認識装置および音声認識方法 |
US11/296,268 US7310601B2 (en) | 2004-06-08 | 2005-12-08 | Speech recognition apparatus and speech recognition method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-169730 | 2004-06-08 | ||
JP2004169730 | 2004-06-08 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/296,268 Continuation US7310601B2 (en) | 2004-06-08 | 2005-12-08 | Speech recognition apparatus and speech recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005122143A1 true WO2005122143A1 (ja) | 2005-12-22 |
Family
ID=35503309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/009652 WO2005122143A1 (ja) | 2004-06-08 | 2005-05-26 | 音声認識装置および音声認識方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US7310601B2 (ja) |
JP (1) | JP3923513B2 (ja) |
WO (1) | WO2005122143A1 (ja) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007083496A1 (ja) * | 2006-01-23 | 2007-07-26 | Nec Corporation | 音声認識用言語モデル作成用のシステム、方法およびプログラムならびに音声認識システム |
JP2007225952A (ja) * | 2006-02-24 | 2007-09-06 | Casio Comput Co Ltd | 画像処理装置および画像処理のプログラム |
WO2010061507A1 (ja) * | 2008-11-28 | 2010-06-03 | 日本電気株式会社 | 言語モデル作成装置 |
WO2010100853A1 (ja) * | 2009-03-04 | 2010-09-10 | 日本電気株式会社 | 言語モデル適応装置、音声認識装置、言語モデル適応方法、及びコンピュータ読み取り可能な記録媒体 |
JP2011059830A (ja) * | 2009-09-07 | 2011-03-24 | Honda Motor Co Ltd | 言語学習装置、言語学習プログラム及び言語学習方法 |
JP2012008554A (ja) * | 2010-05-24 | 2012-01-12 | Denso Corp | 音声認識装置 |
US20120022866A1 (en) * | 2009-12-23 | 2012-01-26 | Ballinger Brandon M | Language Model Selection for Speech-to-Text Conversion |
JP2017058534A (ja) * | 2015-09-17 | 2017-03-23 | 日本電信電話株式会社 | 言語モデル作成装置、言語モデル作成方法、およびプログラム |
US10311860B2 (en) | 2017-02-14 | 2019-06-04 | Google Llc | Language model biasing system |
JP2020042313A (ja) * | 2016-01-06 | 2020-03-19 | グーグル エルエルシー | 音声認識システム |
US11416214B2 (en) | 2009-12-23 | 2022-08-16 | Google Llc | Multi-modal input on an electronic device |
US11996103B2 (en) | 2022-07-11 | 2024-05-28 | Google Llc | Voice recognition system |
Families Citing this family (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7539086B2 (en) * | 2002-10-23 | 2009-05-26 | J2 Global Communications, Inc. | System and method for the secure, real-time, high accuracy conversion of general-quality speech into text |
WO2005039406A1 (en) * | 2003-10-23 | 2005-05-06 | Koninklijke Philips Electronics, N.V. | Heart monitor with remote alarm capability |
CN1922605A (zh) * | 2003-12-26 | 2007-02-28 | 松下电器产业株式会社 | 辞典制作装置以及辞典制作方法 |
US7848927B2 (en) * | 2004-11-30 | 2010-12-07 | Panasonic Corporation | Speech recognition device and method of recognizing speech using a language model |
WO2006080149A1 (ja) * | 2005-01-25 | 2006-08-03 | Matsushita Electric Industrial Co., Ltd. | 音復元装置および音復元方法 |
US8265933B2 (en) * | 2005-12-22 | 2012-09-11 | Nuance Communications, Inc. | Speech recognition system for providing voice recognition services using a conversational language model |
US8510109B2 (en) | 2007-08-22 | 2013-08-13 | Canyon Ip Holdings Llc | Continuous speech transcription performance indication |
US20090204399A1 (en) * | 2006-05-17 | 2009-08-13 | Nec Corporation | Speech data summarizing and reproducing apparatus, speech data summarizing and reproducing method, and speech data summarizing and reproducing program |
US8069032B2 (en) * | 2006-07-27 | 2011-11-29 | Microsoft Corporation | Lightweight windowing method for screening harvested data for novelty |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8447285B1 (en) | 2007-03-26 | 2013-05-21 | Callwave Communications, Llc | Methods and systems for managing telecommunications and for translating voice messages to text messages |
US8325886B1 (en) | 2007-03-26 | 2012-12-04 | Callwave Communications, Llc | Methods and systems for managing telecommunications |
US8352264B2 (en) | 2008-03-19 | 2013-01-08 | Canyon IP Holdings, LLC | Corrective feedback loop for automated speech recognition |
US9973450B2 (en) | 2007-09-17 | 2018-05-15 | Amazon Technologies, Inc. | Methods and systems for dynamically updating web service profile information by parsing transcribed message strings |
US8214338B1 (en) | 2007-05-01 | 2012-07-03 | Callwave, Inc. | Methods and systems for media storage |
US8583746B1 (en) | 2007-05-25 | 2013-11-12 | Callwave Communications, Llc | Methods and systems for web and call processing |
US8392392B1 (en) * | 2008-09-11 | 2013-03-05 | Smith Micro Software, Inc | Voice request broker |
GB2469499A (en) * | 2009-04-16 | 2010-10-20 | Aurix Ltd | Labelling an audio file in an audio mining system and training a classifier to compensate for false alarm behaviour. |
US10276170B2 (en) * | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US9576570B2 (en) | 2010-07-30 | 2017-02-21 | Sri International | Method and apparatus for adding new vocabulary to interactive translation and dialogue systems |
US8527270B2 (en) * | 2010-07-30 | 2013-09-03 | Sri International | Method and apparatus for conducting an interactive dialogue |
US8744860B2 (en) * | 2010-08-02 | 2014-06-03 | At&T Intellectual Property I, L.P. | Apparatus and method for providing messages in a social network |
KR101699720B1 (ko) * | 2010-08-03 | 2017-01-26 | 삼성전자주식회사 | 음성명령 인식 장치 및 음성명령 인식 방법 |
US9099087B2 (en) * | 2010-09-03 | 2015-08-04 | Canyon IP Holdings, LLC | Methods and systems for obtaining language models for transcribing communications |
US8352245B1 (en) | 2010-12-30 | 2013-01-08 | Google Inc. | Adjusting language models |
US8296142B2 (en) | 2011-01-21 | 2012-10-23 | Google Inc. | Speech recognition using dock context |
US9202465B2 (en) * | 2011-03-25 | 2015-12-01 | General Motors Llc | Speech recognition dependent on text message content |
US9679561B2 (en) | 2011-03-28 | 2017-06-13 | Nuance Communications, Inc. | System and method for rapid customization of speech recognition models |
US20140089239A1 (en) * | 2011-05-10 | 2014-03-27 | Nokia Corporation | Methods, Apparatuses and Computer Program Products for Providing Topic Model with Wording Preferences |
JP5799733B2 (ja) * | 2011-10-12 | 2015-10-28 | 富士通株式会社 | 認識装置、認識プログラムおよび認識方法 |
US9324323B1 (en) | 2012-01-13 | 2016-04-26 | Google Inc. | Speech recognition using topic-specific language models |
JP6019604B2 (ja) * | 2012-02-14 | 2016-11-02 | 日本電気株式会社 | 音声認識装置、音声認識方法、及びプログラム |
US8775177B1 (en) | 2012-03-08 | 2014-07-08 | Google Inc. | Speech recognition process |
US9620111B1 (en) * | 2012-05-01 | 2017-04-11 | Amazon Technologies, Inc. | Generation and maintenance of language model |
WO2014063099A1 (en) * | 2012-10-19 | 2014-04-24 | Audience, Inc. | Microphone placement for noise cancellation in vehicles |
US9747900B2 (en) | 2013-05-24 | 2017-08-29 | Google Technology Holdings LLC | Method and apparatus for using image data to aid voice recognition |
US9508345B1 (en) | 2013-09-24 | 2016-11-29 | Knowles Electronics, Llc | Continuous voice sensing |
US9953634B1 (en) | 2013-12-17 | 2018-04-24 | Knowles Electronics, Llc | Passive training for automatic speech recognition |
US9842592B2 (en) | 2014-02-12 | 2017-12-12 | Google Inc. | Language models using non-linguistic context |
US10643616B1 (en) * | 2014-03-11 | 2020-05-05 | Nvoq Incorporated | Apparatus and methods for dynamically changing a speech resource based on recognized text |
US9812130B1 (en) * | 2014-03-11 | 2017-11-07 | Nvoq Incorporated | Apparatus and methods for dynamically changing a language model based on recognized text |
US9412365B2 (en) | 2014-03-24 | 2016-08-09 | Google Inc. | Enhanced maximum entropy models |
US9437188B1 (en) | 2014-03-28 | 2016-09-06 | Knowles Electronics, Llc | Buffered reprocessing for multi-microphone automatic speech recognition assist |
US10134394B2 (en) | 2015-03-20 | 2018-11-20 | Google Llc | Speech recognition using log-linear model |
US20170018268A1 (en) * | 2015-07-14 | 2017-01-19 | Nuance Communications, Inc. | Systems and methods for updating a language model based on user input |
US10896681B2 (en) * | 2015-12-29 | 2021-01-19 | Google Llc | Speech recognition with selective use of dynamic language models |
US9978367B2 (en) | 2016-03-16 | 2018-05-22 | Google Llc | Determining dialog states for language models |
US10832664B2 (en) | 2016-08-19 | 2020-11-10 | Google Llc | Automated speech recognition using language models that selectively use domain-specific model components |
CN108346073B (zh) * | 2017-01-23 | 2021-11-02 | 北京京东尚科信息技术有限公司 | 一种语音购物方法和装置 |
KR102435750B1 (ko) * | 2017-12-14 | 2022-08-25 | 현대자동차주식회사 | 멀티미디어 장치 및 이를 포함하는 차량, 멀티미디어 장치의 방송 청취 방법 |
CN110703612B (zh) * | 2018-07-10 | 2023-09-15 | 松下家电(中国)有限公司 | 一种家电自动调整用户设置参数的方法 |
US11568007B2 (en) * | 2018-10-03 | 2023-01-31 | Walmart Apollo, Llc | Method and apparatus for parsing and representation of digital inquiry related natural language |
US11954719B2 (en) * | 2019-05-30 | 2024-04-09 | Ncr Voyix Corporation | Personalized voice-based assistance |
US11397859B2 (en) * | 2019-09-11 | 2022-07-26 | International Business Machines Corporation | Progressive collocation for real-time discourse |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002082690A (ja) * | 2000-09-05 | 2002-03-22 | Nippon Telegr & Teleph Corp <Ntt> | 言語モデル生成方法、音声認識方法及びそのプログラム記録媒体 |
JP2002268678A (ja) * | 2001-03-13 | 2002-09-20 | Mitsubishi Electric Corp | 言語モデル構成装置及び音声認識装置 |
JP2002297372A (ja) * | 2001-03-30 | 2002-10-11 | Seiko Epson Corp | ウエブページの音声検索方法、音声検索装置および音声検索プログラム |
JP2003255985A (ja) * | 2002-02-28 | 2003-09-10 | Toshiba Corp | 統計的言語モデル作成方法及び装置並びにプログラム |
JP2004053745A (ja) * | 2002-07-17 | 2004-02-19 | Nippon Telegr & Teleph Corp <Ntt> | 言語モデル生成方法、その装置及びそのプログラム |
JP2004333738A (ja) * | 2003-05-06 | 2004-11-25 | Nec Corp | 映像情報を用いた音声認識装置及び方法 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5467425A (en) * | 1993-02-26 | 1995-11-14 | International Business Machines Corporation | Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models |
JP3027544B2 (ja) | 1997-01-10 | 2000-04-04 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | 統計的言語モデル生成装置及び音声認識装置 |
JP3794597B2 (ja) | 1997-06-18 | 2006-07-05 | 日本電信電話株式会社 | 話題抽出方法及び話題抽出プログラム記録媒体 |
US6418431B1 (en) * | 1998-03-30 | 2002-07-09 | Microsoft Corporation | Information retrieval and speech recognition based on language models |
US6233559B1 (en) * | 1998-04-01 | 2001-05-15 | Motorola, Inc. | Speech control of multiple applications using applets |
JP3232289B2 (ja) * | 1999-08-30 | 2001-11-26 | International Business Machines Corporation | Symbol insertion device and method |
JP2001188784A (ja) * | 1999-12-28 | 2001-07-10 | Sony Corp | Conversation processing device and method, and recording medium |
US6606597B1 (en) * | 2000-09-08 | 2003-08-12 | Microsoft Corporation | Augmented-word language model |
US20020087313A1 (en) * | 2000-12-29 | 2002-07-04 | Lee Victor Wai Leung | Computer-implemented intelligent speech model partitioning method and system |
US20020087311A1 (en) * | 2000-12-29 | 2002-07-04 | Leung Lee Victor Wai | Computer-implemented dynamic language model generation method and system |
US7072838B1 (en) * | 2001-03-20 | 2006-07-04 | Nuance Communications, Inc. | Method and apparatus for improving human-machine dialogs using language models learned automatically from personalized data |
JP2003036093A (ja) | 2001-07-23 | 2003-02-07 | Japan Science & Technology Corp | Voice input search system |
US7379867B2 (en) * | 2003-06-03 | 2008-05-27 | Microsoft Corporation | Discriminative training of language models for text and speech classification |
KR100612839B1 (ko) * | 2004-02-18 | 2006-08-18 | Samsung Electronics Co., Ltd. | Domain-based dialogue speech recognition method and apparatus |
2005
- 2005-05-26 JP JP2006514451A patent/JP3923513B2/ja active Active
- 2005-05-26 WO PCT/JP2005/009652 patent/WO2005122143A1/ja active Application Filing
- 2005-12-08 US US11/296,268 patent/US7310601B2/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002082690A (ja) * | 2000-09-05 | 2002-03-22 | Nippon Telegr & Teleph Corp <Ntt> | Language model generation method, speech recognition method, and program recording medium therefor |
JP2002268678A (ja) * | 2001-03-13 | 2002-09-20 | Mitsubishi Electric Corp | Language model construction device and speech recognition device |
JP2002297372A (ja) * | 2001-03-30 | 2002-10-11 | Seiko Epson Corp | Voice search method, voice search device, and voice search program for web pages |
JP2003255985A (ja) * | 2002-02-28 | 2003-09-10 | Toshiba Corp | Statistical language model creation method, device, and program |
JP2004053745A (ja) * | 2002-07-17 | 2004-02-19 | Nippon Telegr & Teleph Corp <Ntt> | Language model generation method, device, and program |
JP2004333738A (ja) * | 2003-05-06 | 2004-11-25 | Nec Corp | Speech recognition device and method using video information |
Non-Patent Citations (1)
Title |
---|
KOBAYASHI A ET AL: "News Onsei Ninshiki no tame no Gengo Model no Doteki Tekioka [Dynamic adaptation of language models for news speech recognition]", THE ACOUSTICAL SOCIETY OF JAPAN (ASJ), 15 March 2000 (2000-03-15), pages 69-70, XP002987547 *
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007083496A1 (ja) * | 2006-01-23 | 2007-07-26 | Nec Corporation | System, method, and program for creating a language model for speech recognition, and speech recognition system |
JP2007225952A (ja) * | 2006-02-24 | 2007-09-06 | Casio Comput Co Ltd | Image processing device and image processing program |
JP5598331B2 (ja) * | 2008-11-28 | 2014-10-01 | NEC Corporation | Language model creation device |
WO2010061507A1 (ja) * | 2008-11-28 | 2010-06-03 | NEC Corporation | Language model creation device |
US9043209B2 (en) | 2008-11-28 | 2015-05-26 | Nec Corporation | Language model creation device |
WO2010100853A1 (ja) * | 2009-03-04 | 2010-09-10 | NEC Corporation | Language model adaptation device, speech recognition device, language model adaptation method, and computer-readable recording medium |
JP2011059830A (ja) * | 2009-09-07 | 2011-03-24 | Honda Motor Co Ltd | Language learning device, language learning program, and language learning method |
US10157040B2 (en) | 2009-12-23 | 2018-12-18 | Google Llc | Multi-modal input on an electronic device |
US11416214B2 (en) | 2009-12-23 | 2022-08-16 | Google Llc | Multi-modal input on an electronic device |
US9251791B2 (en) | 2009-12-23 | 2016-02-02 | Google Inc. | Multi-modal input on an electronic device |
US9495127B2 (en) | 2009-12-23 | 2016-11-15 | Google Inc. | Language model selection for speech-to-text conversion |
US11914925B2 (en) | 2009-12-23 | 2024-02-27 | Google Llc | Multi-modal input on an electronic device |
US20120022866A1 (en) * | 2009-12-23 | 2012-01-26 | Ballinger Brandon M | Language Model Selection for Speech-to-Text Conversion |
US10713010B2 (en) | 2009-12-23 | 2020-07-14 | Google Llc | Multi-modal input on an electronic device |
JP2012008554A (ja) * | 2010-05-24 | 2012-01-12 | Denso Corp | Speech recognition device |
JP2017058534A (ja) * | 2015-09-17 | 2017-03-23 | Nippon Telegraph and Telephone Corp | Language model creation device, language model creation method, and program |
JP2020042313A (ja) * | 2016-01-06 | 2020-03-19 | Google LLC | Voice recognition system |
US10643617B2 (en) | 2016-01-06 | 2020-05-05 | Google Llc | Voice recognition system |
JP2021182168A (ja) * | 2016-01-06 | 2021-11-25 | Google LLC | Voice recognition system |
US11410660B2 (en) | 2016-01-06 | 2022-08-09 | Google Llc | Voice recognition system |
US10311860B2 (en) | 2017-02-14 | 2019-06-04 | Google Llc | Language model biasing system |
US11037551B2 (en) | 2017-02-14 | 2021-06-15 | Google Llc | Language model biasing system |
US11682383B2 (en) | 2017-02-14 | 2023-06-20 | Google Llc | Language model biasing system |
US11996103B2 (en) | 2022-07-11 | 2024-05-28 | Google Llc | Voice recognition system |
Also Published As
Publication number | Publication date |
---|---|
US20060100876A1 (en) | 2006-05-11 |
JPWO2005122143A1 (ja) | 2008-04-10 |
US7310601B2 (en) | 2007-12-18 |
JP3923513B2 (ja) | 2007-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3923513B2 (ja) | Speech recognition device and speech recognition method | |
US11978439B2 (en) | Generating topic-specific language models | |
US10410627B2 (en) | Automatic language model update | |
US9330661B2 (en) | Accuracy improvement of spoken queries transcription using co-occurrence information | |
JP4987203B2 (ja) | Distributed real-time speech recognition device | |
US9190052B2 (en) | Systems and methods for providing information discovery and retrieval | |
EP1330816B1 (en) | Language independent voice-based user interface | |
US7222073B2 (en) | System and method for speech activated navigation | |
US10394886B2 (en) | Electronic device, computer-implemented method and computer program | |
CN108538286A (zh) | Speech recognition method and computer | |
JP2009042968A (ja) | Information selection system, information selection method, and information selection program | |
Guinaudeau et al. | Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news | |
Hirohata et al. | Sentence-extractive automatic speech summarization and evaluation techniques | |
JP6843689B2 (ja) | Device, program, and method for generating context-dependent dialogue scenarios | |
Menacer et al. | Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations | |
Adell Mercado et al. | Buceador, a multi-language search engine for digital libraries | |
Bahng et al. | CAC: Content-Aware Captioning for Professional Online Lectures in Korean Language | |
JP2007213554A (ja) | Computer-implemented method for rendering a ranked result set for a probabilistic query | |
US7805291B1 (en) | Method of identifying topic of text using nouns | |
van der Werff | Evaluation of noisy transcripts for spoken document retrieval | |
Bordel et al. | An XML Resource Definition for Spoken Document Retrieval |
Legal Events

Code | Title | Description
---|---|---
WWE | WIPO information: entry into national phase | Ref document number: 2006514451; Country of ref document: JP
WWE | WIPO information: entry into national phase | Ref document number: 11296268; Country of ref document: US
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
WWP | WIPO information: published in national office | Ref document number: 11296268; Country of ref document: US
NENP | Non-entry into the national phase | Ref country code: DE
WWW | WIPO information: withdrawn in national office | Country of ref document: DE
122 | EP: PCT application non-entry in European phase |