JP6805431B2

JP6805431B2 - Voice recognition device

Info

Publication number: JP6805431B2
Application number: JP2017079219A
Authority: JP
Inventors: 謙太郎中村; 貴章伊藤
Original assignee: Computer Engineering and Consulting Ltd
Current assignee: Computer Engineering and Consulting Ltd
Priority date: 2017-04-12
Filing date: 2017-04-12
Publication date: 2020-12-23
Anticipated expiration: 2037-04-12
Also published as: JP2018180260A

Description

The present invention relates to a voice recognition device.

A voice recognition device is known that acquires a speaker's uttered voice and performs voice recognition by referring to a pre-registered voice recognition database (voice recognition dictionary) based on the utterance content of the acquired voice.

For example, a technique is known in which a first recognition word, which is the reading of an entire facility name, and a second recognition word, in which the first syllable of the facility name is replaced with a vowel syllable, are prepared in a recognition dictionary. When the leading consonant of the facility name is missed, voice recognition is performed by correlation with the second recognition word (see, for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2001-83983

In the voice recognition device disclosed in Patent Document 1, the second recognition word cannot be used unless it has been registered in the recognition dictionary in advance for the facility name, so it is difficult to raise the recognition rate.

Embodiments of the present invention have been made in view of the above problem. In a voice recognition device that performs voice recognition by referring to a voice recognition database based on the utterance content of acquired voice, they improve the recognition rate for utterance content that is not registered in the voice recognition database in advance.

To solve the above problem, a voice recognition device according to an embodiment of the present invention acquires the voice of a speaker and performs voice recognition that determines a target word corresponding to the utterance content by referring to a voice recognition database based on the utterance content of the acquired voice. The device includes: a conversion unit that, when the voice recognition fails and the target word is set by a method other than the voice recognition, converts the utterance content for which the voice recognition failed and the set target word into vowels; a determination unit that determines the match rate between the vowels of the failed utterance content and the vowels of the set target word; and a registration unit that, when the match rate determined by the determination unit is equal to or greater than a threshold value, registers the failed utterance content and the set target word in the voice recognition database in association with each other.

Embodiments of the present invention focus on the fact that vowels tend to be recognized correctly even when the voice recognition device fails at voice recognition. When the match rate between the vowels of the utterance content for which voice recognition failed and the vowels of the set target word is equal to or greater than a threshold value, the two are associated with each other and registered in the voice recognition database.

As a result, the target word corresponding to the utterance content for which voice recognition failed is automatically registered in the voice recognition database, which makes it possible to improve the recognition rate for utterance content that is not registered in the voice recognition database in advance.

According to embodiments of the present invention, in a voice recognition device that performs voice recognition by referring to a voice recognition database based on the utterance content of acquired voice, the recognition rate for utterance content that is not registered in the voice recognition database in advance can be improved.

FIG. 1 is a diagram (1) showing an example of the configuration and processing of a voice recognition device according to an embodiment.
FIG. 2 is a diagram for explaining the conversion into vowels and the registration in the recognition database according to an embodiment.
FIG. 3 is a diagram (2) showing an example of the configuration and processing of the voice recognition device according to an embodiment.
FIG. 4 is a flowchart showing the processing flow of the voice recognition device according to an embodiment.

Hereinafter, modes for carrying out the invention will be described with reference to the drawings.

<Configuration of voice recognition device>
FIG. 1 is a diagram (1) showing an example of the configuration and processing of the voice recognition device according to an embodiment. The voice recognition device 100 is an information processing device that acquires the voice of a speaker and performs voice recognition that determines a target word (for example, a destination) corresponding to the utterance content by referring to a voice recognition database (hereinafter referred to as the recognition DB) 140 based on the utterance content of the acquired voice.

The voice recognition device 100 has the hardware configuration of a general computer and includes, for example, a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), a storage device, a display device, and an input device.

The voice recognition device 100 implements the voice recognition unit 110, the target word setting unit 120, the registration processing unit 130, the recognition DB 140, and the like shown in FIG. 1 by executing a predetermined program on the CPU.

The voice recognition unit 110 acquires the voice of the speaker using a microphone or the like provided outside or inside the voice recognition device 100, searches the recognition DB 140 with the utterance content (for example, voice data) of the acquired voice, and performs voice recognition that determines the target word corresponding to the utterance content. The voice recognition unit 110 is implemented, for example, by a program executed by the CPU of the voice recognition device 100. Alternatively, the voice recognition unit 110 may be implemented by a dedicated module, a microcontroller, or the like.

The target word determined by the voice recognition unit 110 is, for example, information such as a "destination" to be set in a navigation device or the like. The target word is not limited to a destination and may be, for example, information such as an operation instruction to an information processing device such as a navigation device. The following description assumes that the target word is a destination to be set in a navigation device.

The voice recognition unit 110 searches the recognition DB 140 with the utterance content of the acquired voice. When a target word corresponding to the utterance content is found (voice recognition succeeds), the voice recognition unit 110 sets (determines) the found target word as, for example, the destination of the navigation device. When no target word corresponding to the utterance content is found (voice recognition fails), the voice recognition unit 110 stores the utterance content for which voice recognition failed in a storage unit such as the RAM or the storage device of the voice recognition device 100.

The target word setting unit 120 is implemented, for example, by a program executed by the CPU of the voice recognition device 100. It is a means for setting the target word by a method other than the failed voice recognition when the voice recognition unit 110 fails at voice recognition.

The other method by which the target word setting unit 120 sets the target word may be any method.

For example, the target word setting unit 120 may set the target word by retrying voice recognition with the voice recognition unit 110. In this case, the speaker sets the target word by repeating the utterance while changing, for example, the loudness, accent, or speed of the utterance.

As another example, the speaker may utter a part (for example, "モレロ") of the utterance content for which voice recognition failed (for example, "モレロ皮膚", "Morero skin") and set the target word by selecting it (for example, "モレロ岐阜", "Morero Gifu") from one or more candidates corresponding to "モレロ" shown on the display device.

As yet another example, the speaker may set the target word by entering a character string indicating the target word into the voice recognition device 100 using a software keyboard displayed on the display device of the voice recognition device 100, a remote controller, or the like.

The target word setting unit 120 sets (determines) the set target word as, for example, the destination of the navigation device, and also stores the set target word in a storage unit such as the RAM or the storage device of the voice recognition device 100.

When the voice recognition unit 110 fails at voice recognition and the target word is set by the target word setting unit 120, the registration processing unit 130 executes a registration process that registers the utterance content for which voice recognition failed and the set target word in the recognition DB 140 in association with each other. The registration processing unit 130 is implemented, for example, by a program executed by the CPU of the voice recognition device 100 and, as shown in FIG. 1, includes a conversion unit 131, a determination unit 132, a registration unit 133, and the like.

The conversion unit 131 converts into vowels both the "utterance content for which voice recognition failed", stored in the storage unit by the voice recognition unit 110, and the "set target word", stored in the storage unit by the target word setting unit 120.

For example, when the utterance content for which voice recognition failed is "モレロ皮膚" ("Morero skin"), the conversion unit 131 analyzes the utterance content of the acquired voice and extracts the kana "モレロヒフ" of "モレロ皮膚", as shown in FIG. 2(a). For example, the conversion unit 131 extracts the kana "モレロヒフ" by performing voice recognition on the utterance content "モレロ皮膚" and converting it into characters.

The conversion unit 131 then converts the extracted kana "モレロヒフ" into the vowels "オエオイウ" (o-e-o-i-u).

Similarly, when the set target word is "モレロ岐阜" ("Morero Gifu"), the conversion unit 131 converts the kana "モレロギフ" of "モレロ岐阜" into the vowels "オエオイウ" (o-e-o-i-u), as shown in FIG. 2(b).

Any method may be used to convert kana into vowels. For example, the conversion can be performed by storing, in the storage unit in advance, every kana together with the vowel corresponding to it.
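The table-lookup approach described above might be sketched in Python as follows. The table contents, the name `to_vowels`, and the handling of small kana (such as ャ) are assumptions made for illustration; the description only states that each kana and its corresponding vowel are stored in advance. The moraic nasal ン is passed through unchanged, in line with the description's treatment of "ん".

```python
# Katakana grouped by the vowel they end in.
# An illustrative table, not the patent's actual data.
VOWEL_ROWS = {
    "ア": "アカサタナハマヤラワガザダバパァャ",
    "イ": "イキシチニヒミリギジヂビピィ",
    "ウ": "ウクスツヌフムユルグズヅブプヴゥュ",
    "エ": "エケセテネヘメレゲゼデベペェ",
    "オ": "オコソトノホモヨロヲゴゾドボポォョ",
}
KANA_TO_VOWEL = {kana: vowel for vowel, row in VOWEL_ROWS.items() for kana in row}
SMALL_KANA = set("ァィゥェォャュョ")


def to_vowels(kana: str) -> str:
    """Convert a katakana reading to its vowel sequence."""
    vowels = []
    for ch in kana:
        if ch == "ン":
            vowels.append(ch)  # moraic nasal: kept as-is, not converted
        elif ch in SMALL_KANA and vowels:
            # Assumption: a small kana overrides the previous vowel (キャ -> ア).
            vowels[-1] = KANA_TO_VOWEL[ch]
        elif ch in KANA_TO_VOWEL:
            vowels.append(KANA_TO_VOWEL[ch])
        # Other characters (ッ, ー, spaces) are skipped in this sketch.
    return "".join(vowels)
```

With this table, `to_vowels("モレロヒフ")` and `to_vowels("モレロギフ")` both yield "オエオイウ", which corresponds to the example of FIG. 2(a) and FIG. 2(b).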

Note that the moraic nasal "ん" is a consonant that follows a vowel and cannot be converted into a vowel. It is therefore, for example, left unconverted and handled as "ん". (For example, the moraic nasal "ん" is handled in the same way as a vowel.)

The determination unit 132 determines the match rate between the vowels of the utterance content for which voice recognition failed and the vowels of the set target word, both as converted by the conversion unit 131.

For example, the vowels "オエオイウ" of "モレロ皮膚" ("Morero skin") shown in FIG. 2(a) and the vowels "オエオイウ" of "モレロ岐阜" ("Morero Gifu") shown in FIG. 2(b) match at every vowel, so the match rate is 100%. When the number of vowels is five and four of them match, the match rate is 80%. The match rate is expressed, for example, by the following equation (1).
(Match rate) = (Number of matched vowels) / (Number of vowels) ... (1)
When the number of vowels in the utterance content for which voice recognition failed differs from the number of vowels in the set target word, the number of vowels in the set target word can be used, for example, as (Number of vowels). Alternatively, the larger (or the smaller) of the two vowel counts may be used as (Number of vowels).
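Equation (1), together with the three choices of denominator mentioned above, could be sketched as follows. The function name `match_rate`, the `denominator` parameter, and the position-by-position comparison are assumptions; the description does not specify how the two vowel sequences are aligned.

```python
def match_rate(failed_vowels: str, target_vowels: str,
               denominator: str = "target") -> float:
    """Match rate per equation (1): matched vowels / number of vowels.

    Vowels are compared position by position (an assumption of this sketch).
    """
    matched = sum(a == b for a, b in zip(failed_vowels, target_vowels))
    if denominator == "target":      # vowel count of the set target word
        total = len(target_vowels)
    elif denominator == "longer":    # the larger of the two vowel counts
        total = max(len(failed_vowels), len(target_vowels))
    elif denominator == "shorter":   # the smaller of the two vowel counts
        total = min(len(failed_vowels), len(target_vowels))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return matched / total if total else 0.0
```

For the example in the text, `match_rate("オエオイウ", "オエオイウ")` evaluates to 1.0, and a pair that differs in one of five positions evaluates to 0.8.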

When the match rate determined by the determination unit 132 is equal to or greater than a predetermined threshold value, the registration unit 133 registers the utterance content for which voice recognition failed (for example, "モレロ皮膚") and the set target word (for example, "モレロ岐阜") in the recognition DB 140 in association with each other.

Here, the predetermined threshold value is set in advance to a value for judging that the vowels of the utterance content for which voice recognition failed and the vowels of the set target word match. The following description assumes that the predetermined threshold value is 100%. The predetermined threshold value may also be smaller than 100% (for example, 80 to 99%).

FIG. 2(c) shows an image of the information (hereinafter referred to as correspondence information) 201 registered in the recognition DB 140 by associating utterance content with a target word. In the example of FIG. 2(c), the correspondence information 201 stores the utterance content for which voice recognition failed, "モレロ皮膚" (voice data, or a character string extracted from the voice data), in association with the set target word "モレロ岐阜" (for example, a character string). As a result, when the voice recognition unit 110 searches the recognition DB 140 with the utterance content "モレロ皮膚", it can obtain "モレロ岐阜" as the search result.
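The threshold check and the registration into the correspondence information might look like the following sketch. Representing the correspondence information 201 as a Python `dict`, and the names `register_if_match` and `threshold`, are assumptions made for illustration; the match rate is passed in as an already computed value.

```python
from typing import Dict


def register_if_match(correspondence: Dict[str, str], failed: str,
                      target: str, rate: float,
                      threshold: float = 1.0) -> bool:
    """Register failed -> target when the match rate reaches the threshold.

    `correspondence` stands in for correspondence information 201.
    Returns True when a registration was made.
    """
    if rate >= threshold:
        correspondence[failed] = target
        return True
    return False
```

With the default threshold of 1.0 (100%), calling `register_if_match(info, "モレロ皮膚", "モレロ岐阜", 1.0)` on an empty `info` stores the pair, while a rate of 0.8 leaves `info` unchanged.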

The recognition DB (recognition database) 140 is the voice recognition dictionary used for voice recognition by the voice recognition unit 110, and a plurality of target words subject to voice recognition are registered in it in advance. The recognition DB 140 may additionally store, for each target word, various information used by the navigation device or the like, such as coordinate information, telephone numbers, and facility information.

The voice recognition unit 110, for example, acquires the voice uttered by the speaker and searches the target words registered in the recognition DB 140 with the utterance content (for example, voice data) of the acquired voice. In this way, the voice recognition unit 110 can obtain, as the search result, the target word corresponding to the utterance content of the acquired voice from among the plurality of target words registered in the recognition DB 140 in advance.

Furthermore, in the present embodiment, when none of the target words registered in the recognition DB 140 in advance corresponds to the utterance content of the acquired voice, the voice recognition unit 110 obtains the target word corresponding to the utterance content as the search result from the correspondence information 201 shown in FIG. 2(c).
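The two-stage lookup described above, first the pre-registered target words and then the correspondence information, could be sketched as follows. Exact string comparison stands in for the acoustic matching a real recognizer would perform, and the names `recognize`, `registered`, and `correspondence` are hypothetical.

```python
from typing import Dict, Optional, Set


def recognize(utterance: str, registered: Set[str],
              correspondence: Dict[str, str]) -> Optional[str]:
    """Return the target word for an utterance, or None on failure.

    `registered` stands in for the pre-registered target words and
    `correspondence` for correspondence information 201.
    """
    if utterance in registered:
        return utterance
    # Fall back to pairs learned from earlier recognition failures.
    return correspondence.get(utterance)
```

Once the pair "モレロ皮膚" and "モレロ岐阜" has been stored in `correspondence`, `recognize("モレロ皮膚", ...)` returns "モレロ岐阜" instead of failing.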

<Outline of processing>
Next, an example of specific processing by the voice recognition device 100 will be described with reference to FIGS. 1 to 3. In the voice recognition device 100 shown in FIG. 1, suppose that the user (speaker) utters "モレロ岐阜" ("Morero Gifu") to the voice recognition device 100 in order to set, for example, "モレロ岐阜" as the destination of the navigation device.

In (1) of FIG. 1, the voice recognition unit 110 searches the recognition DB 140 with the content uttered by the user, "モレロ岐阜", but the recognition result comes out as "モレロ皮膚" ("Morero skin") and the search (voice recognition) fails.

In (2) of FIG. 1, when voice recognition by the voice recognition unit 110 has failed, the target word setting unit 120 accepts the user's setting of the target word "モレロ岐阜" by a method other than the failed voice recognition. For example, the speaker sets the target word "モレロ岐阜" by retrying voice recognition of "モレロ岐阜" while changing the loudness, accent, or speed of the utterance.

In (3) of FIG. 1, the target word setting unit 120 determines the target word "モレロ岐阜" set by the user as the destination of the navigation device or the like.

When voice recognition by the voice recognition unit 110 has failed and the target word has been set by the target word setting unit 120, the registration processing unit 130 of the voice recognition device 100 executes the registration process shown in (4) to (6).

In (4) of FIG. 1, the conversion unit 131 converts the utterance content for which voice recognition failed and the set target word into vowels. For example, as shown in FIG. 2(a), the failed utterance content "モレロ皮膚" is converted into the vowels "オエオイウ", and, as shown in FIG. 2(b), the set destination "モレロ岐阜" is converted into the vowels "オエオイウ".

In (5) of FIG. 1, the determination unit 132 determines the match rate between the vowels of the utterance content for which voice recognition failed and the vowels of the set target word, both as converted by the conversion unit 131. Here, the vowels "オエオイウ" of the failed utterance content "モレロ皮膚" match the vowels "オエオイウ" of the set destination "モレロ岐阜", so the match rate is determined to be 100%.

In (6) of FIG. 1, when the match rate determined by the determination unit 132 is equal to or greater than the threshold value (for example, 100%), the registration unit 133 registers the utterance content for which voice recognition failed, "モレロ皮膚", and the set target word, "モレロ岐阜", in the recognition DB 140 in association with each other. Here, the match rate of 100% determined by the determination unit 132 is equal to or greater than the threshold value (100%), so the registration unit 133 registers "モレロ皮膚" and "モレロ岐阜" in association with each other in the correspondence information 201 of the recognition DB 140, for example as shown in FIG. 2(c).

Through the above processing, "モレロ皮膚" and "モレロ岐阜" are stored in the recognition DB 140 in association with each other, and the target word "モレロ岐阜" can be obtained as the search result using the utterance content "モレロ皮膚", which was not registered in the recognition DB 140 in advance.

As a result, as shown in (7) of FIG. 3, when the voice recognition unit 110 searches the recognition DB 140 with the utterance content "モレロ皮膚", the utterance content "モレロ皮膚" is converted into the target word "モレロ岐阜" in the recognition DB 140 and retrieved.

In this way, the voice recognition device 100 focuses on the fact that vowels tend to be recognized correctly even when voice recognition fails. When the match rate between the vowels of the utterance content for which voice recognition failed and the vowels of the set target word is equal to or greater than the threshold value, it associates the two with each other and registers them in the voice recognition database.

Therefore, according to the present embodiment, in the voice recognition device 100 that performs voice recognition by referring to the voice recognition database 140 based on the utterance content of acquired voice, the recognition rate for utterance content that is not registered in the voice recognition database in advance can be improved.

<Processing flow>
Next, the processing flow of the voice recognition method according to the present embodiment will be described. This flow is a generalization of the example processing described with reference to FIGS. 1 to 3.

In step S401, the voice recognition unit 110 of the voice recognition device 100 acquires the voice of the speaker and searches the recognition DB 140 with the utterance content of the acquired voice.

In step S402, the voice recognition unit 110 determines whether a target word corresponding to the utterance content of the acquired voice was found (whether voice recognition succeeded).

When a corresponding target word was found (voice recognition succeeded), the voice recognition unit 110 moves the process to step S403. When no corresponding target word was found (voice recognition failed), the voice recognition unit 110 moves the process to steps S404 and S405.

In step S403, the voice recognition unit 110 sets (determines) the target word found in step S401 as, for example, the destination.

In step S404, the voice recognition unit 110 stores the utterance content for which voice recognition failed in a storage unit such as the RAM or the storage device of the voice recognition device 100.

In step S405, the target word setting unit 120 of the voice recognition device 100 accepts the setting of the target word by a method other than the failed voice recognition, and sets (determines) the target word set by that method as, for example, the destination.

In step S406, the target word setting unit 120 stores the target word set in step S405 in a storage unit such as the RAM or the storage device of the voice recognition device 100.

The above processing completes one session (process) in which the voice recognition device 100 sets a destination in response to the user's utterance or operation. Separately from the session that sets the destination, the registration processing unit 130 of the voice recognition device 100 executes the registration process described in (4) to (6) of FIG. 1, for example as batch processing.

For example, when, within one session, voice recognition by the voice recognition unit 110 failed and the target word was set by a method other than the failed voice recognition, the registration processing unit 130 executes the registration process in step S407.

Specifically, as described above with reference to FIG. 1, the conversion unit 131 of the registration processing unit 130 converts into vowels both the utterance content for which voice recognition failed, stored in step S404, and the set target word, stored in step S406.

The determination unit 132 of the registration processing unit 130 then determines the match rate between the vowels of the utterance content for which voice recognition failed and the vowels of the set target word, both as converted by the conversion unit 131.

Finally, when the match rate determined by the determination unit 132 is equal to or greater than the threshold value, the registration unit 133 of the registration processing unit 130 registers the utterance content for which voice recognition failed and the set target word in the recognition DB 140 in association with each other.

Through the above processing, target words corresponding to utterance content for which voice recognition failed are automatically added to the recognition DB 140, in addition to the target words registered in advance.

As a result, the voice recognition device 100, which performs voice recognition by referring to the voice recognition database 140 based on the utterance content of acquired voice, can improve the recognition rate for utterance content that is not registered in the voice recognition database in advance.
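The flow of steps S401 to S407 can be sketched end to end as follows. This is a simplified model under stated assumptions, not the patented implementation: the class `Recognizer`, its method names, the `fallback` callable that stands in for the alternative setting method of step S405, and the `rate_of` callable that stands in for vowel conversion plus match-rate determination are all hypothetical, and exact string comparison replaces acoustic matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Recognizer:
    registered: Set[str]                                  # pre-registered target words
    correspondence: Dict[str, str] = field(default_factory=dict)
    pending: List[Tuple[str, str]] = field(default_factory=list)

    def lookup(self, utterance: str) -> Optional[str]:
        """S401: search the recognition DB, including learned pairs."""
        if utterance in self.registered:
            return utterance
        return self.correspondence.get(utterance)

    def run_session(self, utterance: str, fallback: Callable[[], str]) -> str:
        """One destination-setting session (S401 to S406)."""
        target = self.lookup(utterance)
        if target is not None:            # S402: recognition succeeded
            return target                 # S403: use the found target word
        target = fallback()               # S405: set by another method
        # S404 and S406: remember the failed utterance and the set target word.
        self.pending.append((utterance, target))
        return target

    def run_batch(self, rate_of: Callable[[str, str], float],
                  threshold: float = 1.0) -> None:
        """S407: registration process, run separately from the session."""
        for failed, target in self.pending:
            if rate_of(failed, target) >= threshold:
                self.correspondence[failed] = target
        self.pending.clear()
```

In this model, a first session with the misrecognized utterance falls back to the alternative method and queues the pair; after `run_batch` accepts the pair, a later session with the same misrecognized utterance resolves directly to the registered target word.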

100 Voice recognition device
110 Voice recognition unit
120 Target word setting unit
131 Conversion unit
132 Determination unit
133 Registration unit
140 Recognition DB (voice recognition database)

Claims

A voice recognition device that acquires the voice of a speaker and performs voice recognition that determines a target word corresponding to the utterance content by referring to a voice recognition database based on the utterance content of the acquired voice, the voice recognition device comprising:
a conversion unit that, when the voice recognition fails and the target word is set by a method other than the voice recognition, converts the utterance content for which the voice recognition failed and the set target word into vowels;
a determination unit that determines the match rate between the vowels of the utterance content for which the voice recognition failed and the vowels of the set target word; and
a registration unit that, when the match rate determined by the determination unit is equal to or greater than a threshold value, registers the utterance content for which the voice recognition failed and the set target word in the voice recognition database in association with each other.