WO2010086927A1 - Speech recognition device - Google Patents
Speech recognition device
- Publication number
- WO2010086927A1 (PCT/JP2009/005487)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- speech
- recognition
- speech recognition
- recognition target
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
Definitions
- This invention relates to a speech recognition apparatus.
- Patent Document 2 discloses a speech recognition device that performs recognition while predicting the content of the next utterance, by organizing the storage for speech recognition into a hierarchical structure and gradually narrowing the search range.
- a conventional speech recognition apparatus creates a speech recognition dictionary using text notation of a sentence to be recognized.
- the speech recognition dictionary size becomes large.
- when the speech recognition dictionary size exceeds the available memory capacity, there was a problem that the device would not operate normally.
- Patent Document 1 determines whether or not to register a new vocabulary based on an evaluation value relating to how easily the new vocabulary is confused with already registered vocabulary, and does not take into consideration the case where the memory capacity is limited.
- in Patent Document 2, a speech recognition dictionary is created in consideration of a decrease in recognition performance due to an increase in vocabulary; however, the case where the memory capacity is limited is not taken into consideration, so the above problem cannot be solved.
- the present invention has been made to solve the above-described problems, and an object of the present invention is to provide a speech recognition device capable of suppressing an increase in the dictionary size of a speech recognition dictionary.
- the speech recognition apparatus includes a sentence selection unit that selects, as a recognition target sentence, a sentence whose number of speech pieces is a predetermined value or less from among recognition target sentence candidates.
- according to this speech recognition device, since a sentence whose number of speech pieces is a predetermined value or less is selected as a recognition target sentence from among the recognition target sentence candidates, an increase in the size of the speech recognition dictionary constructed from the recognition target sentences is suppressed. Even when the speech recognition device is realized as software embedded in a device such as a navigation system or a mobile phone, a dictionary size within the usable capacity of the memory can be realized.
- the drawings further include: a diagram for explaining processing by a dictionary creation processing unit according to the second embodiment; a block diagram showing the structure of the speech recognition apparatus according to Embodiment 3 of this invention; and a flowchart illustrating the flow of operations performed by the speech recognition apparatus according to the third embodiment.
- FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus according to Embodiment 1 of the present invention.
- the speech recognition apparatus 1 according to Embodiment 1 includes a dictionary creation processing unit 2 and a speech recognition processing unit 3.
- the dictionary creation processing unit 2 is a means for creating a speech recognition dictionary composed only of sentences having a predetermined number of speech pieces or fewer, and includes a recognition target sentence candidate storage unit 21, a sentence selection unit 22, a recognition target sentence storage unit 23, a speech recognition dictionary creation unit 24, and a speech recognition dictionary storage unit 25.
- the recognition target sentence candidate stored in the storage unit 21 is composed of a set of a text candidate to be recognized by the speech recognition apparatus 1 and a speech piece indicating the pronunciation content of the text.
- the recognition target sentence means a sentence having a predetermined number or less of speech pieces.
- a speech segment is a short speech unit such as a phoneme or syllable.
- a speech recognition dictionary is created with only sentences equal to or less than a predetermined number of speech pieces to prevent an increase in dictionary size.
- the sentence selection unit 22 is a means that, using phonemes as speech pieces, excludes from the recognition target sentence candidates any candidate whose number of phonemes exceeds the defined value set in advance for one recognition target sentence, and selects as recognition target sentences those candidates whose number of phonemes is within the defined value.
- the speech recognition dictionary creation unit 24 is a unit that creates a speech recognition dictionary using the recognition target sentence read from the storage unit 23.
- the speech recognition processing unit 3 includes the speech recognition dictionary storage unit 25 and a speech recognition unit 26.
- the speech recognition unit 26 is a means that refers to the speech recognition dictionary stored in the storage unit 25, executes speech recognition processing on the speech uttered by the user (hereinafter referred to as recognition target speech), and obtains a recognition result.
- the sentence selection unit 22, the speech recognition dictionary creation unit 24, and the speech recognition unit 26 can be realized on a computer as concrete means in which hardware and software cooperate, by loading a dictionary creation/speech recognition program in accordance with the gist of the present invention into the computer and causing its CPU to execute the program.
- the recognition target sentence candidate storage unit 21, the recognition target sentence storage unit 23, and the speech recognition dictionary storage unit 25 can be built in a storage area of a storage device included in the computer (for example, a hard disk device or an external storage medium).
- FIG. 2 is a flowchart showing a flow of operations performed by the speech recognition apparatus according to the first embodiment, and the processing steps surrounded by a broken line with a symbol A in FIG. 2 indicate processing by the dictionary creation processing unit 2.
- the processing steps surrounded by a broken line with a symbol B indicate processing by the speech recognition processing unit 3.
- the sentence selection unit 22 reads and prepares recognition target sentence candidates for creating a speech recognition dictionary from the storage unit 21 (step ST1).
- the sentence selection unit 22 increments the count value of the counter that counts the sentence number N by 1 (step ST3), and determines whether or not the sentence number N is within the number of recognition target sentence candidates read from the storage unit 21 (step ST4).
- if the sentence number N is within the number of recognition target sentence candidates (step ST4; Yes), the sentence selection unit 22 determines whether or not the number of speech pieces of the recognition target sentence candidate corresponding to the current sentence number N is within the defined value (step ST5). If the number of speech pieces of the candidate corresponding to sentence number N is within the defined value (step ST5; Yes), the sentence selection unit 22 stores the candidate with sentence number N in the storage unit 23 as a recognition target sentence (step ST6).
- in step ST5, when the number of speech pieces of the recognition target sentence candidate corresponding to sentence number N does not fall within the defined value (step ST5; No), the sentence selection unit 22 excludes the sentence with sentence number N from the recognition target sentence candidates (step ST8).
- when the processing of step ST6 or step ST8 is completed, the sentence selection unit 22 returns to step ST3, increments the count value of the counter that counts the sentence number N by one, and repeats the processing from step ST4 to step ST8 for the recognition target sentence candidate corresponding to the next sentence number.
- FIG. 3 is a diagram for explaining the process of excluding recognition target sentence candidates, and illustrates a case where phonemes are used as speech pieces and sentences having more than 20 phonemes are excluded from the recognition target sentence candidates.
- the phonemes that make up the sentence to be recognized correspond to the acoustic model, greatly affecting the size of the speech recognition dictionary. Therefore, in the first embodiment, an increase in the dictionary size can be prevented by excluding candidate sentences in which the number of phonemes exceeds the defined value.
- here, the number of phonemes is obtained by counting the phonemes constituting a sentence one by one, but it may instead be estimated from the number of morae.
- when the sentence number N exceeds the number of recognition target sentence candidates (step ST4; No), the speech recognition dictionary creation unit 24 creates a speech recognition dictionary from the recognition target sentences stored in the storage unit 23 up to that point (step ST7).
- the created voice recognition dictionary is stored in the storage unit 25.
- the speech recognition unit 26 of the speech recognition processing unit 3 inputs the recognition target speech uttered by the user (step ST9), and refers to the speech recognition dictionary stored in the storage unit 25 for the recognition target speech. And the recognition result is output (step ST10).
- as described above, according to the first embodiment, since the sentence selection unit 22 that selects, as recognition target sentences, sentences whose number of speech pieces is equal to or less than a predetermined value from among the recognition target sentence candidates is provided, it is possible to suppress an increase in the dictionary size of the speech recognition dictionary, and even if the speech recognition device is realized as software embedded in a device such as a navigation system or a mobile phone, a dictionary size within the usable capacity of the memory can be realized. As a result, it is possible to provide a speech recognition apparatus suitable for construction with embedded software.
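The per-sentence selection of steps ST1 to ST8 can be sketched in a few lines. This is an illustrative reconstruction, not the patented implementation; the function name, the data layout, the one-phoneme-per-character counting convention, and the limit of 20 phonemes (taken from the example of FIG. 3) are all assumptions:

```python
# Illustrative sketch of the sentence selection in Embodiment 1:
# keep only candidates whose phoneme count is within a defined value.

def select_recognition_sentences(candidates, max_phonemes=20):
    """candidates: list of (text, phoneme_string) pairs.
    A phoneme string such as "kanagawakeN kamakurasi" is assumed to
    contain one phoneme per non-space character."""
    selected = []
    for text, phonemes in candidates:
        n = len(phonemes.replace(" ", ""))     # count speech pieces (phonemes)
        if n <= max_phonemes:                  # step ST5: within defined value?
            selected.append((text, phonemes))  # step ST6: store as target sentence
        # else: step ST8 - exclude the candidate
    return selected

candidates = [
    ("kanagawaken kamakurashi", "kanagawakeN kamakurasi"),  # 21 phonemes: excluded
    ("toukyouto", "toukyouto"),                              # 9 phonemes: kept
]
print(select_recognition_sentences(candidates))
```

With the 20-phoneme limit, only the second candidate survives, matching the exclusion example of FIG. 3.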
- FIG. 4 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 2 of the present invention.
- the speech recognition apparatus 1A according to the second embodiment has basically the same configuration as that of FIG. 1 of the first embodiment, but differs in that, instead of the sentence selection unit 22 based on the number of speech pieces of each recognition target sentence candidate, it includes a sentence selection unit 22a based on the total number of speech pieces of all recognition target sentence candidates.
- the sentence selection unit 22a selects recognition target sentences so that the total number of speech pieces of the selected sentences is equal to or less than a defined value.
- the configuration other than the speech recognition processing unit 3 and the sentence selection unit 22a of the dictionary creation processing unit 2 is the same as in the first embodiment, and the description thereof is omitted.
- FIG. 5 is a flowchart showing the flow of operations performed by the speech recognition apparatus according to the second embodiment.
- the processing steps surrounded by a broken line with a symbol A in FIG. 5 indicate processing by the dictionary creation processing unit 2.
- the processing steps surrounded by a broken line with a symbol B indicate processing by the speech recognition processing unit 3.
- the sentence selection unit 22a increments the count value of the counter that counts the sentence number N by 1 (step ST3), and determines whether or not the sentence number N is within the number of recognition target sentence candidates read from the storage unit 21 (step ST4). Here, if the sentence number N is within the number of recognition target sentence candidates (step ST4; Yes), the sentence selection unit 22a adds the number of speech pieces of the candidate with sentence number N to the count value of the counter that counts the total number of speech pieces (step ST4-1).
- the sentence selection unit 22a then determines whether or not the total number of speech pieces indicated by the count value of the counter is within the defined value (step ST5a). If the total number of speech pieces is within the defined value (step ST5a; Yes), the sentence selection unit 22a stores the recognition target sentence candidate with sentence number N in the storage unit 23 as a recognition target sentence (step ST6).
- the speech recognition dictionary creation unit 24 creates a speech recognition dictionary from the recognition target sentences stored in the storage unit 23 and stores it in the storage unit 25 (step ST7).
- if the total number of speech pieces exceeds the defined value (step ST5a; No), the sentence selection unit 22a stops accumulating recognition target sentences in the storage unit 23, and the speech recognition dictionary creation unit 24 creates a speech recognition dictionary from the recognition target sentences stored in the storage unit 23 up to that point and stores it in the storage unit 25 (step ST8a).
- when the sentence selection unit 22a selects a recognition target sentence in step ST6, it returns to step ST3, increments the count value of the counter that counts the sentence number N by one, and repeats the processing from step ST4 to step ST8a for the recognition target sentence candidate corresponding to the next sentence number.
- the processing of step ST9 and step ST10, which refers to the speech recognition dictionary created as described above, is the same as in the first embodiment.
- FIG. 6 is a diagram for explaining processing by the dictionary creation processing unit according to the second embodiment.
- FIG. 6 illustrates a case where, using phonemes as speech pieces, recognition target sentence candidates are selected as recognition target sentences until the total number of phonemes reaches 100.
- a plurality of sentences to be recognized are selected so that the total number of speech pieces of the plurality of sentences is equal to or less than a defined value.
- as the total number of speech pieces grows, the speech recognition dictionary size increases. Therefore, if a speech recognition dictionary is created whose total number of speech pieces is equal to or less than a predetermined value, an increase in the speech recognition dictionary size can be prevented.
- the sentence selection unit 22a based on the total number of speech pieces uses phonemes as speech pieces and adds up the numbers of phonemes of the recognition target sentence candidates; when the running total (total number of speech pieces) exceeds the defined value at some sentence, that sentence and all subsequent sentences are excluded, and the remaining sentences become the recognition target sentences.
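A minimal sketch of this running-total selection, assuming the same simple phoneme-string representation (one phoneme per non-space character) and the budget of 100 phonemes shown in FIG. 6; the function name is illustrative:

```python
# Illustrative sketch of Embodiment 2's selection by total phoneme count:
# accumulate candidates in order until the budget would be exceeded,
# then stop (steps ST4-1, ST5a, ST8a).

def select_within_budget(candidates, total_limit=100):
    """candidates: ordered list of (text, phoneme_string) pairs."""
    selected, total = [], 0
    for text, phonemes in candidates:
        n = len(phonemes.replace(" ", ""))  # phonemes in this candidate
        if total + n > total_limit:         # step ST5a: budget exceeded
            break                           # step ST8a: stop accumulating
        total += n                          # step ST4-1: add to running total
        selected.append((text, phonemes))   # step ST6: keep as target sentence
    return selected
```

Because the scan stops at the first offending sentence, everything after it is excluded as well, matching the behavior described above.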
- when the total number of speech pieces of the recognition target sentence candidates exceeds the defined value in step ST5a, the sentence selection unit 22a may select which recognition target sentence candidates to exclude according to the result of the speech recognition processing executed in step ST10. For example, a sentence with a large number of speech pieces is returned to the recognition target sentence candidates and used for creating the speech recognition dictionary as a recognition target sentence in the next selection processing.
- if the speech recognition result obtained by referring to the created speech recognition dictionary is good, this fact is presented to the user, who then decides whether or not the sentence should be excluded.
- the sentence selection unit 22a may also provide a GUI (Graphical User Interface) for selecting the sentences to be excluded, and the user may select the sentences accordingly.
- as described above, in the second embodiment, when the total number of speech pieces exceeds the defined value, or when the sentence number N exceeds the number of recognition target sentence candidates, a speech recognition dictionary is generated from the recognition target sentences accumulated in the storage unit 23 up to that point.
- FIG. 7 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 3 of the present invention.
- the speech recognition apparatus 1B according to the third embodiment has basically the same configuration as that of FIG. 1 of the first embodiment, but differs in that, instead of the sentence selection unit 22 based on the number of speech pieces of each recognition target sentence candidate, it includes a sentence truncation unit 27 based on the number of speech pieces of the recognition target sentence candidates, and, instead of the recognition target sentence storage unit 23, a storage unit 28 that stores truncated recognition target sentences.
- the sentence truncation unit 27 is a means that, when the number of speech pieces of a recognition target sentence candidate exceeds the defined value, truncates the sentence into text ending immediately before the syllable containing the speech piece that exceeds the defined value. Long sentences and texts with many speech pieces lead to an increase in the size of the speech recognition dictionary; in this third embodiment, even such texts can be recognized up to the portion where the number of speech pieces does not exceed the defined value.
- the storage unit 28 stores recognition target sentences that have been subjected to truncation processing by the sentence truncation unit 27.
- the configuration other than the sentence truncation unit 27 and the storage unit 28 is the same as in the first embodiment, and the description thereof is omitted.
- FIG. 8 is a flowchart showing the flow of operations performed by the speech recognition apparatus according to the third embodiment, and the processing steps surrounded by a broken line with a symbol A in FIG. 8 indicate processing by the dictionary creation processing unit 2.
- the processing steps surrounded by a broken line with a symbol B indicate processing by the speech recognition processing unit 3.
- the processing from step ST1 to step ST7 is the same as that shown in FIG. 2 of the first embodiment, and the description thereof is omitted.
- in step ST5, when the number of speech pieces of the recognition target sentence candidate exceeds the defined value (step ST5; No), the sentence truncation unit 27 removes the portion of the candidate at and after the syllable containing the speech piece that exceeds the defined value, and stores the resulting text in the storage unit 28 as the recognition target sentence with sentence number N (step ST8b).
- when the processing of step ST6 or step ST8b is completed, the sentence truncation unit 27 returns to step ST3, increments the count value of the counter that counts the sentence number N by 1, and repeats the processing from step ST4 to step ST8b for the recognition target sentence candidate corresponding to the next sentence number.
- FIG. 9 is a diagram for explaining the sentence truncation processing for recognition target sentence candidates, and shows a case where, using phonemes as a reference, the syllables beyond the 20th phoneme are excluded from the recognition target sentence candidates.
- as shown in the figure, the sentence truncation unit 27 excludes the syllables after the 20th phoneme.
- truncation may be performed in phoneme units instead of syllable units. In this case, phonemes beyond the 20th phoneme are subject to truncation, and "kanagawakeN kamakuras" (20 phonemes) becomes the recognition target sentence; that is, the text is cut off from the phoneme "i" that exceeds the 20th phoneme.
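The phoneme-unit truncation just described can be sketched as follows; a syllable-unit variant would cut at the nearest syllable boundary instead. The representation (one phoneme per non-space character) and the function name are assumptions for illustration:

```python
# Illustrative sketch of Embodiment 3's truncation in phoneme units:
# keep only the first `limit` phonemes, preserving word-separating spaces.

def truncate_phonemes(phoneme_string, limit=20):
    out, count = [], 0
    for ch in phoneme_string:
        if ch == " ":          # spaces are separators, not phonemes
            out.append(ch)
            continue
        if count == limit:     # defined value reached: cut off the rest
            break
        out.append(ch)
        count += 1
    return "".join(out)

# "kanagawakeN kamakurasi" (21 phonemes) is cut after the 20th phoneme.
print(truncate_phonemes("kanagawakeN kamakurasi"))
```

For the example in the text, this yields "kanagawakeN kamakuras", the 20-phoneme sentence that FIG. 9 shows as the recognition target.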
- when the sentence number N exceeds the number of recognition target sentence candidates (step ST4; No), the speech recognition dictionary creation unit 24 generates a speech recognition dictionary from the recognition target sentences already stored in the storage unit 28 (step ST7).
- the created voice recognition dictionary is stored in the storage unit 25.
- the speech recognition unit 26 of the speech recognition processing unit 3 inputs the recognition target speech uttered by the user (step ST9), and refers to the speech recognition dictionary stored in the storage unit 25 for the recognition target speech. And the recognition result is output (step ST10).
- as described above, according to the third embodiment, when the number of speech pieces of a recognition target sentence candidate exceeds the predetermined value, the speech piece exceeding the predetermined value and the subsequent speech pieces, or the syllable containing that speech piece and the subsequent syllables, are truncated and excluded, and the resulting sentence, whose number of speech pieces does not exceed the defined value, is used as the recognition target sentence. An increase in the dictionary size of the speech recognition dictionary can thus be suppressed while reducing the recognition vocabulary as little as possible, and a speech recognition device suitable for construction with embedded software can be provided.
- FIG. 10 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 4 of the present invention.
- the speech recognition apparatus 1C according to the fourth embodiment has basically the same configuration as FIG. 7 of the third embodiment, but differs in that, instead of the speech recognition dictionary creation unit 24, it includes a GM additional speech recognition dictionary creation unit 29 that creates a speech recognition dictionary using recognition target sentences to which a garbage model (hereinafter abbreviated as GM as appropriate) has been appended, and further includes a storage unit 30 that stores the garbage model.
- the GM additional speech recognition dictionary creation unit 29 is a means for creating a speech recognition dictionary by subsequently adding a garbage model to a censored recognition target sentence.
- the garbage model stored in the storage unit 30 is a model configured to detect a speech section; with it, recognition is possible even if an unknown utterance including unnecessary words or noise follows.
- in addition, the recognition vocabulary can be recognized even if extra words are added before or after its utterance.
- this absorbs the continuation of a recognition target sentence that was truncated partway through, so a decrease in the score (likelihood) obtained as a recognition result can be prevented. Since the configuration other than the GM additional speech recognition dictionary creation unit 29 and the storage unit 30 is the same as in the third embodiment, the description thereof is omitted.
- FIG. 11 is a flowchart showing a flow of operations performed by the speech recognition apparatus according to the fourth embodiment, and the processing steps surrounded by a broken line with a symbol A in FIG. 11 indicate processing by the dictionary creation processing unit 2.
- the processing steps surrounded by a broken line with a symbol B indicate processing by the speech recognition processing unit 3.
- the processing from step ST1 to step ST6 and step ST8b is the same as that shown in FIG. 8 of the third embodiment, and the description thereof is omitted.
- when the sentence number N exceeds the number of recognition target sentence candidates (step ST4; No), the GM additional speech recognition dictionary creation unit 29 appends the garbage model read from the storage unit 30 to the end of each truncated recognition target sentence accumulated in the storage unit 28 up to that point, and creates a speech recognition dictionary from the recognition target sentences to which the garbage model has been appended (step ST7a).
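Appending the garbage model can be sketched as a post-processing step over the truncated sentences. The token "<GM>" below is a hypothetical placeholder for the garbage-model dictionary entry, not a name used by the patent:

```python
# Illustrative sketch of step ST7a: append a garbage-model token to each
# truncated recognition target sentence so that arbitrary continuations
# (the cut-off tail, unnecessary words, noise) can be absorbed.

GARBAGE_MODEL = "<GM>"  # hypothetical placeholder token

def add_garbage_model(truncated_sentences):
    """truncated_sentences: list of phoneme strings already truncated
    by the sentence truncation unit (Embodiment 3)."""
    return [s + " " + GARBAGE_MODEL for s in truncated_sentences]
```

A recognizer that supports filler/garbage entries would then treat the trailing token as "match anything", which is what prevents the score drop described above.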
- the created voice recognition dictionary is stored in the storage unit 25.
- the processing of step ST9 and step ST10, which refers to the speech recognition dictionary created as described above, is the same as in the first embodiment.
- as described above, according to the fourth embodiment, since the speech recognition dictionary is created by appending a garbage model to the truncated recognition target sentences, the recognition vocabulary is reduced as little as possible and the truncated portions can still be handled during recognition; an increase in the dictionary size of the speech recognition dictionary is therefore suppressed, and a decrease in the score obtained as a recognition result can be prevented. As a result, it is possible to provide a speech recognition apparatus suitable for construction with embedded software.
- FIG. 12 is a block diagram showing a configuration of a speech recognition apparatus according to Embodiment 5 of the present invention.
- the speech recognition apparatus 1D according to the fifth embodiment has basically the same configuration as that of FIG. 1 of the first embodiment, but differs in that it includes a non-recognition candidate notification unit 31.
- the non-recognition candidate notification unit 31 is a unit that notifies the user of recognition target sentence candidates that are excluded from being recognized by the sentence selection unit 22. Since the configuration other than the non-recognition candidate notification unit 31 is the same as that of the first embodiment, description thereof is omitted.
- FIG. 13 is a flowchart showing a flow of operations performed by the speech recognition apparatus according to the fifth embodiment, and the processing steps surrounded by a broken line with a symbol A in FIG. 13 indicate processing by the dictionary creation processing unit 2.
- the processing steps surrounded by a broken line with a symbol B indicate processing in the speech recognition processing unit 3.
- the processing from step ST1 to step ST8 is the same as that shown in FIG. 2 of the first embodiment, and the description thereof is omitted.
- in step ST8, when excluding a recognition target sentence candidate whose number of speech pieces exceeds the defined value, the sentence selection unit 22 notifies the non-recognition candidate notification unit 31 of the recognition target sentence candidate to be excluded.
- while the speech recognition dictionary creation unit 24 creates the speech recognition dictionary, the non-recognition candidate notification unit 31 notifies the user of the excluded recognition target sentence candidates (step ST8-1). In this way, the user can know which sentences are not recognition targets.
- the non-recognition candidate notification unit 31 thus notifies the user that a vocabulary item is not a recognition target. As a notification method, for example, as shown in FIG. 14, a telop (caption) can display the headline of the vocabulary that is not a recognition target together with an indication that it is not a recognition target.
- when the processing of step ST6 or step ST8-1 is completed, the sentence selection unit 22 returns to step ST3, increments the count value of the counter that counts the sentence number N by one, and repeats the processing from step ST4 to step ST8-1 for the recognition target sentence candidate corresponding to the next sentence number.
- the processing of step ST9 and step ST10, which refers to the speech recognition dictionary created as described above, is the same as in the first embodiment.
- as described above, according to the fifth embodiment, since the non-recognition candidate notification unit 31 that notifies of candidates not selected as recognition target sentences or candidates subjected to truncation processing is provided, an increase in the dictionary size of the speech recognition dictionary can be suppressed while the user understands the usable vocabulary in advance, and a speech recognition device that is convenient and suitable for construction with embedded software can be provided.
- the non-recognition candidate notification unit 31 can be added to the configurations of the second to fourth embodiments.
- the non-recognition candidate notification unit 31 notifies the user of recognition target sentence candidates excluded by the sentence selection unit 22a, or of recognition target sentence candidates truncated by the sentence truncation unit 27.
- this configuration also allows the user to know in advance, when the speech recognition dictionary is created, which texts are not recognition targets or have their recognition vocabulary truncated partway through. Thereby, the user's convenience can be improved.
- as described above, the speech recognition apparatus according to the present invention can suppress an increase in the size of the speech recognition dictionary composed of the recognition target sentences, and can realize a dictionary size within the usable memory capacity even when the speech recognition apparatus is realized as embedded software for a device such as a navigation system or a mobile phone; it is therefore suitable for use in speech recognition apparatuses whose memory capacity is limited.
Description
実施の形態1.
図1は、この発明の実施の形態1による音声認識装置の構成を示すブロック図である。図1において、実施の形態1による音声認識装置1は、辞書作成処理部2及び音声認識処理部3を備える。辞書作成処理部2は、予め定めた音声片数以下の文だけで構成した音声認識辞書を作成する手段であり、認識対象文候補の記憶部21、文選択部22、認識対象文の記憶部23、音声認識辞書作成部24及び音声認識辞書の記憶部25を備える。
図2は、実施の形態1の音声認識装置による動作の流れを示すフローチャートであり、図2中に記号Aを付した破線で囲まれる処理ステップは、辞書作成処理部2による処理を示しており、記号Bを付した破線で囲まれる処理ステップは、音声認識処理部3による処理を示している。
図3は、認識対象文候補の除外処理を説明するための図であり、音声片として音素を基準とし、音素数が20を超える文を認識対象文候補から除外する場合を示している。図3に示すように、文番号N=1である「かながわけん かまくらし」という認識対象文候補は、音素列が「kanagawakeN kamakurasi」(21音素)となり、20音素を超えているので除外する。同様に、文番号N=6,8の文の音素数が20を超えるために除外され、文番号N=2~5,7の文が認識対象文として記憶部23に格納される。
図4は、この発明の実施の形態2による音声認識装置の構成を示すブロック図である。図4において、実施の形態2による音声認識装置1Aは、上記実施の形態1の図1と基本的に同様な構成を有するが、個々の認識対象文候補の音声片数による文選択部22の代わりに、全ての認識対象文候補の総音声片数による文選択部22aを備える点で異なる。
図5は、実施の形態2の音声認識装置による動作の流れを示すフローチャートであり、図5中に記号Aを付した破線で囲まれる処理ステップは、辞書作成処理部2による処理を示しており、記号Bを付した破線で囲まれる処理ステップは、音声認識処理部3による処理を示している。
例えば、音声片数が多い文を再び認識対象文候補に戻し、次回の選択処理で認識対象文として音声認識辞書の作成に利用する。これにより作成された音声認識辞書を参照する音声認識結果が良好である場合、その旨をユーザに提示するようにして除外すべきか否かを判断させる。
図7は、この発明の実施の形態3による音声認識装置の構成を示すブロック図である。図7において、実施の形態3による音声認識装置1Bは、上記実施の形態1の図1と基本的に同様な構成を有するが、個々の認識対象文候補の音声片数による文選択部22の代わりに、認識対象文候補の音声片数による文打ち切り部27を備え、認識対象文の記憶部23の代わりに、打ち切り済の認識対象文を記憶する記憶部28を備える点で異なる。
図8は、実施の形態3の音声認識装置による動作の流れを示すフローチャートであり、図8中に記号Aを付した破線で囲まれる処理ステップは、辞書作成処理部2による処理を示しており、記号Bを付した破線で囲まれる処理ステップは、音声認識処理部3による処理を示している。図8において、ステップST1からステップST7までの処理は、上記実施の形態1の図2で示した内容と同様であるので説明を省略する。
図9は、認識対象文候補の文打ち切り処理を説明するための図であり、音声片として音素を基準とし、音素数が20を超える音節以降を認識対象文候補から除外する場合を示している。図9の上段に示す例では、文番号N=1,6,8の各認識対象文候補が打ち切り対象となる。この場合、図9の下段に示すように、文打ち切り部27が、20音素を超える音節以降を除外する。
図10は、この発明の実施の形態4による音声認識装置の構成を示すブロック図である。図10において、実施の形態4による音声認識装置1Cは、上記実施の形態3の図7と基本的に同様な構成を有するが、音声認識辞書作成部24の代わりに、打ち切り済みの文に対してガーベジモデル(以下、GMと適宜略す)を後続追加した認識対象文を用いて音声認識辞書を作成するGM付加音声認識辞書作成部29を備え、さらにガーベジモデルを格納する記憶部30を備える点で異なる。
図11は、実施の形態4の音声認識装置による動作の流れを示すフローチャートであり、図11中に記号Aを付した破線で囲まれる処理ステップは、辞書作成処理部2による処理を示しており、記号Bを付した破線で囲まれる処理ステップは、音声認識処理部3による処理を示している。図11において、ステップST1からステップST6まで、及びステップST8bの処理は、上記実施の形態3の図8で示した内容と同様であるので説明を省略する。
図12は、この発明の実施の形態5による音声認識装置の構成を示すブロック図である。図12において、実施の形態5による音声認識装置1Dは、上記実施の形態1の図1と基本的に同様な構成を有するが、認識対象外候補通知部31を備える点で異なる。認識対象外候補通知部31は、文選択部22により認識対象外として除外される認識対象文候補をユーザに通知する手段である。なお、認識対象外候補通知部31以外の構成は、上記実施の形態1と同様であるので説明を省略する。
FIG. 13 is a flowchart showing the operation of the speech recognition apparatus of Embodiment 5. In FIG. 13, the processing steps enclosed by the broken line labeled A are performed by the dictionary creation processing unit 2, and the processing steps enclosed by the broken line labeled B are performed by the speech recognition processing unit 3. In FIG. 13, the processing from step ST1 to step ST8 is the same as that shown in FIG. 2 of Embodiment 1, so its description is omitted.
Claims (7)
- A speech recognition apparatus comprising a speech recognition dictionary creation unit that creates a speech recognition dictionary from recognition target sentences, and a speech recognition unit that recognizes input speech by referring to the speech recognition dictionary, the speech recognition apparatus characterized by comprising a sentence selection unit that selects, as the recognition target sentences, those sentences among the recognition target sentence candidates whose number of speech units is equal to or less than a predetermined value.
- A speech recognition apparatus comprising a speech recognition dictionary creation unit that creates a speech recognition dictionary from recognition target sentences, and a speech recognition unit that recognizes input speech by referring to the speech recognition dictionary, the speech recognition apparatus characterized by comprising a sentence selection unit that selects sentences as the recognition target sentences such that the total number of speech units summed over the recognition target sentence candidates is equal to or less than a predetermined value.
- A speech recognition apparatus comprising a speech recognition dictionary creation unit that creates a speech recognition dictionary from recognition target sentences, and a speech recognition unit that recognizes input speech by referring to the speech recognition dictionary, the speech recognition apparatus characterized by comprising a sentence truncation unit that, when the number of speech units of a recognition target sentence candidate exceeds a predetermined value, takes as the recognition target sentence a sentence obtained by truncating and excluding the speech unit at which the predetermined value is exceeded and all subsequent speech units, or the syllable containing that speech unit and all subsequent syllables.
- The speech recognition apparatus according to claim 3, wherein the speech recognition dictionary creation unit appends a garbage model to the end of a recognition target sentence that has been truncated by the sentence truncation unit, and creates the speech recognition dictionary from the recognition target sentence to which the garbage model has been appended.
- The speech recognition apparatus according to claim 1, further comprising a notification unit that notifies of candidates not selected as recognition target sentences or candidates subjected to the truncation process.
- The speech recognition apparatus according to claim 2, further comprising a notification unit that notifies of candidates not selected as recognition target sentences or candidates subjected to the truncation process.
- The speech recognition apparatus according to claim 3, further comprising a notification unit that notifies of candidates not selected as recognition target sentences or candidates subjected to the truncation process.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009801503310A CN102246226B (zh) | 2009-01-30 | 2009-10-20 | Voice recognition device |
US13/123,552 US8200478B2 (en) | 2009-01-30 | 2009-10-20 | Voice recognition device which recognizes contents of speech |
DE112009003930.8T DE112009003930B4 (de) | 2009-01-30 | 2009-10-20 | Speech recognition device |
JP2010546175A JP4772164B2 (ja) | 2009-01-30 | 2009-10-20 | Speech recognition device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009019702 | 2009-01-30 | ||
JP2009-019702 | 2009-01-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010086927A1 (ja) | 2010-08-05 |
Family
ID=42395197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2009/005487 WO2010086927A1 (ja) | Speech recognition device | 2009-01-30 | 2009-10-20 |
Country Status (5)
Country | Link |
---|---|
US (1) | US8200478B2 (ja) |
JP (1) | JP4772164B2 (ja) |
CN (1) | CN102246226B (ja) |
DE (1) | DE112009003930B4 (ja) |
WO (1) | WO2010086927A1 (ja) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE112010005226T5 (de) | 2010-02-05 | 2012-11-08 | Mitsubishi Electric Corporation | Recognition dictionary creation device and speech recognition device |
CN102770910B (zh) * | 2010-03-30 | 2015-10-21 | 三菱电机株式会社 | Voice recognition device |
KR102245747B1 (ko) | 2014-11-20 | 2021-04-28 | 삼성전자주식회사 | Display apparatus and method for registering user commands |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003337595A (ja) * | 2002-05-22 | 2003-11-28 | Takeaki Kamiyama | Speech recognition device, dictionary generation device, speech recognition system, speech recognition method, dictionary generation method, speech recognition program, dictionary generation program, computer-readable recording medium recording the speech recognition program, and computer-readable recording medium recording the dictionary generation program |
JP2004252167A (ja) * | 2003-02-20 | 2004-09-09 | Nippon Telegr & Teleph Corp <Ntt> | Method, device, and program for generating sentence lists for phoneme model training |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4718094A (en) * | 1984-11-19 | 1988-01-05 | International Business Machines Corp. | Speech recognition system |
US5033087A (en) * | 1989-03-14 | 1991-07-16 | International Business Machines Corp. | Method and apparatus for the automatic determination of phonological rules as for a continuous speech recognition system |
DE19501599C1 (de) * | 1995-01-20 | 1996-05-02 | Daimler Benz Ag | Method for speech recognition |
DE19508137A1 (de) * | 1995-03-08 | 1996-09-12 | Zuehlke Werner Prof Dr Ing Hab | Method for the stepwise classification of arrhythmically segmented words |
JP3790038B2 (ja) | 1998-03-31 | 2006-06-28 | 株式会社東芝 | Subword-type speaker-independent speech recognition device |
JP3700533B2 (ja) | 2000-04-19 | 2005-09-28 | 株式会社デンソー | Speech recognition apparatus and processing system |
GB2370401A (en) * | 2000-12-19 | 2002-06-26 | Nokia Mobile Phones Ltd | Speech recognition |
JP2002207181A (ja) | 2001-01-09 | 2002-07-26 | Minolta Co Ltd | Optical switch |
JP2002297181A (ja) | 2001-03-30 | 2002-10-11 | Kddi Corp | Method for determining vocabulary registration for speech recognition and speech recognition device |
JP4727852B2 (ja) | 2001-06-29 | 2011-07-20 | クラリオン株式会社 | Navigation apparatus and method, and navigation software |
CN1628338A (zh) * | 2002-04-29 | 2005-06-15 | 阿德诺塔有限公司 | Method and apparatus for processing speech information |
JP2004325704A (ja) | 2003-04-24 | 2004-11-18 | Nissan Motor Co Ltd | Speech recognition device |
JP2005010691A (ja) * | 2003-06-20 | 2005-01-13 | P To Pa:Kk | Speech recognition device, speech recognition method, conversation control device, conversation control method, and programs therefor |
JP2006178013A (ja) | 2004-12-20 | 2006-07-06 | Canon Inc | Database creation apparatus and method |
JP5233989B2 (ja) * | 2007-03-14 | 2013-07-10 | 日本電気株式会社 | Speech recognition system, speech recognition method, and speech recognition processing program |
JP5046902B2 (ja) | 2007-12-13 | 2012-10-10 | 三菱電機株式会社 | Voice search device |
US8160866B2 (en) * | 2008-04-18 | 2012-04-17 | Tze Fen Li | Speech recognition method for both english and chinese |
JP2010097239A (ja) * | 2008-10-14 | 2010-04-30 | Nec Corp | Dictionary creation device, dictionary creation method, and dictionary creation program |
WO2010050414A1 (ja) * | 2008-10-31 | 2010-05-06 | 日本電気株式会社 | Model adaptation device, method therefor, and program therefor |
US8155961B2 (en) * | 2008-12-09 | 2012-04-10 | Nokia Corporation | Adaptation of automatic speech recognition acoustic models |
2009
- 2009-10-20 JP JP2010546175A patent/JP4772164B2/ja not_active Expired - Fee Related
- 2009-10-20 DE DE112009003930.8T patent/DE112009003930B4/de not_active Expired - Fee Related
- 2009-10-20 CN CN2009801503310A patent/CN102246226B/zh not_active Expired - Fee Related
- 2009-10-20 WO PCT/JP2009/005487 patent/WO2010086927A1/ja active Application Filing
- 2009-10-20 US US13/123,552 patent/US8200478B2/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
YOSHIAKI ITO: "Partial Sentence Recognition by Sentence Spotting", IEICE TECHNICAL REPORT, vol. 93, no. 88, 18 June 1993 (1993-06-18), pages 65 - 72 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2010086927A1 (ja) | 2012-07-26 |
CN102246226A (zh) | 2011-11-16 |
JP4772164B2 (ja) | 2011-09-14 |
DE112009003930T5 (de) | 2012-09-27 |
DE112009003930B4 (de) | 2016-12-22 |
US20110196672A1 (en) | 2011-08-11 |
CN102246226B (zh) | 2013-11-13 |
US8200478B2 (en) | 2012-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7234415B2 (ja) | Context biasing for speech recognition | |
US11398236B2 (en) | Intent-specific automatic speech recognition result generation | |
JP6435312B2 (ja) | Speech recognition using parallel recognition tasks | |
US9697827B1 (en) | Error reduction in speech processing | |
US10037758B2 (en) | Device and method for understanding user intent | |
CN108052498B (zh) | Method and system for correcting words in transcribed text | |
US9292487B1 (en) | Discriminative language model pruning | |
JP5066483B2 (ja) | Language understanding device | |
US11024298B2 (en) | Methods and apparatus for speech recognition using a garbage model | |
CN1320902A (zh) | Speech recognition apparatus, speech recognition method, and recording medium | |
CN106875936B (zh) | Speech recognition method and device | |
JP2011033680A (ja) | Speech processing apparatus and method, and program | |
JPWO2011121649A1 (ja) | Speech recognition device | |
JP4634156B2 (ja) | Voice dialogue method and voice dialogue apparatus | |
JP5183120B2 (ja) | Speech recognition using statistical language models with square-root discounting | |
JP2010078877A (ja) | Speech recognition device, speech recognition method, and speech recognition program | |
JP4772164B2 (ja) | Speech recognition device | |
JP5480844B2 (ja) | Word addition device, word addition method, and program therefor | |
JP6274015B2 (ja) | Acoustic model adjustment device and program | |
Pelemans et al. | A layered approach for dutch large vocabulary continuous speech recognition | |
CN114974249A (zh) | Speech recognition method, device, and storage medium | |
JP2009098201A (ja) | Speech recognition device and speech recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 200980150331.0 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 09839119 Country of ref document: EP Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase |
Ref document number: 2010546175 Country of ref document: JP |
WWE | Wipo information: entry into national phase |
Ref document number: 13123552 Country of ref document: US |
WWE | Wipo information: entry into national phase |
Ref document number: 1120090039308 Country of ref document: DE Ref document number: 112009003930 Country of ref document: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 09839119 Country of ref document: EP Kind code of ref document: A1 |