JPWO2020016967A1

JPWO2020016967A1 - Voice recognition device, in-vehicle navigation device, automatic voice dialogue device, and voice recognition method

Info

Publication number: JPWO2020016967A1
Application number: JP2020530789A
Authority: JP
Inventors: 小谷　亮; 亮小谷
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2018-07-18
Filing date: 2018-07-18
Publication date: 2020-10-01
Anticipated expiration: 2038-07-18
Also published as: JP6786018B2; WO2020016967A1

Abstract

A voice recognition device (100) includes: a voice signal acquisition unit (111) that acquires a voice signal from a voice input unit (13); a voice recognition unit (112) that performs voice recognition based on the voice signal acquired by the voice signal acquisition unit (111) and outputs a recognition result; and a voice recognition control unit (113) that determines whether the voice signal acquired by the voice signal acquisition unit (111) contains an ultrasonic signal and, when it determines that the voice signal contains an ultrasonic signal, performs control so that the recognition result based on that voice signal is not output from the voice recognition unit (112).

Description

The present invention relates to a voice recognition device, an in-vehicle navigation device, an automatic voice dialogue device, and a voice recognition method.

As the accuracy of voice recognition technology improves, the technology is increasingly applied to electronic devices and the like, which are then controlled based on the voice uttered by an operator.
For example, Patent Document 1 discloses an in-vehicle navigation device for a vehicle that includes voice collecting means for collecting voice uttered by an occupant, voice transmitting means for transmitting the collected voice to equipment outside the vehicle, and destination information receiving means for receiving, from the equipment outside the vehicle, destination information created there based on the transmitted voice, and that provides guidance based on the received destination information. The device further includes destination extraction means that performs voice recognition on the voice collected by the voice collecting means and extracts a destination from that voice. After the voice is collected by the voice collecting means, and until the destination information is received by the destination information receiving means, the device provides guidance based on the destination extracted by the destination extraction means.

Japanese Unexamined Patent Publication No. 2008-256659

However, voice recognition may recognize not only the voice uttered by an operator but also, for example, sound having a frequency outside the human audible range that is emitted from a device that generates ultrasonic waves, such as a parametric speaker.
Because ultrasonic waves have frequencies outside the human audible range, humans normally cannot hear them. Furthermore, ultrasonic waves can be given high directivity. Consequently, when ultrasonic waves are directed at the voice input unit of an electronic device or the like to which voice recognition technology is applied, there is a problem that the electronic device or the like is controlled by the input ultrasonic signal before even a person near the voice input unit notices that sound is being input to it.

The present invention has been made to solve the above problem, and its object is to provide a voice recognition device capable of suppressing the output of recognition results of voice recognition caused by ultrasonic waves.

The voice recognition device according to the present invention includes: a voice signal acquisition unit that acquires a voice signal from a voice input unit; a voice recognition unit that performs voice recognition based on the voice signal acquired by the voice signal acquisition unit and outputs a recognition result; and a voice recognition control unit that determines whether the voice signal acquired by the voice signal acquisition unit contains an ultrasonic signal and, when it determines that the voice signal contains an ultrasonic signal, performs control so that the recognition result based on that voice signal is not output from the voice recognition unit.

According to the present invention, the output of recognition results of voice recognition caused by ultrasonic waves can be suppressed.

FIG. 1 is a block diagram showing the main part of an in-vehicle navigation device to which the voice recognition device according to the first embodiment is applied.
FIG. 2A and FIG. 2B are diagrams showing an example of the hardware configuration of the main part of the voice recognition device according to the first embodiment.
FIG. 3 is a flowchart illustrating an example of the processing of the voice recognition device according to the first embodiment.
FIG. 4 is a block diagram showing the main part of an in-vehicle navigation device to which the voice recognition device according to a modification of the first embodiment is applied.
FIG. 5 is a block diagram showing the main part of an automatic voice dialogue device to which the voice recognition device according to the second embodiment is applied.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

Embodiment 1.
The voice recognition device 100 according to the first embodiment is described below as being applied, by way of example, to an in-vehicle navigation device 10.
FIG. 1 is a block diagram showing the main part of the in-vehicle navigation device 10 to which the voice recognition device 100 according to the first embodiment is applied.

The vehicle 1 includes the in-vehicle navigation device 10, a navigation signal receiver 11, a map database 12, a voice input unit 13, a display device 14, and a voice output device 15.

The navigation signal receiver 11 is a receiving device that receives a navigation signal, such as a GPS signal, from a navigation satellite.

The map database 12 is a storage device that stores map information containing information about road maps.

The voice input unit 13 is, for example, a microphone that converts acquired sound waves into a voice signal and outputs the converted voice signal to the voice recognition device 100 described later.

The display device 14 is, for example, a display that shows guidance image information, output by the in-vehicle navigation device 10 described later, for providing route guidance to a destination.

The voice output device 15 is, for example, a speaker that outputs as sound the guidance voice, output by the in-vehicle navigation device 10 described later, for providing route guidance to a destination.

The in-vehicle navigation device 10 includes the voice recognition device 100, a navigation signal acquisition unit 101, a map information acquisition unit 102, a navigation control unit 103, a display output unit 104, and a voice output unit 105.

The navigation signal acquisition unit 101 acquires the navigation signal received by the navigation signal receiver 11.

The map information acquisition unit 102 acquires map information from the map database 12. The map database 12 need not be mounted on the host vehicle, as long as the map information acquisition unit 102 can acquire map information from it. For example, the map information acquisition unit 102 may acquire map information from a map database 12 located on a public network, such as the Internet or a public line, via that public network.

The navigation control unit 103 identifies the point on the road on which the host vehicle is traveling, that is, the traveling position of the host vehicle, based on the navigation signal acquired by the navigation signal acquisition unit 101 and the map information acquired by the map information acquisition unit 102. The navigation control unit 103 generates traveling position information indicating the identified traveling position.
The navigation control unit 103, for example, sets a destination based on the recognition result of voice recognition performed by the voice recognition device 100 described later, and determines a travel route from the traveling position of the host vehicle to the destination. The navigation control unit 103 generates route guidance information based on the determined travel route.

The display output unit 104 generates guidance image information for providing route guidance based on the map information acquired via the navigation control unit 103 and on the traveling position information and route guidance information generated by the navigation control unit 103, and outputs the guidance image information to the display device 14.

The voice output unit 105 generates guidance voice information for providing route guidance based on the route guidance information generated by the navigation control unit 103, and outputs the guidance voice information to the voice output device 15.

That is, the in-vehicle navigation device 10 determines a travel route to the set destination based on the navigation signal acquired from the navigation signal receiver 11 and the map information acquired from the map database 12, and outputs information for providing route guidance to the display device 14 and the voice output device 15.

The voice recognition device 100 includes a voice signal acquisition unit 111, a voice recognition unit 112, a voice recognition control unit 113, and a notification output unit 114.

The voice signal acquisition unit 111 acquires a voice signal from the voice input unit 13.
The voice signal acquisition unit 111 outputs the acquired voice signal to the voice recognition unit 112 and the voice recognition control unit 113.
The voice signal acquisition unit 111 may add a time stamp when it acquires the voice signal, and output the time-stamped voice signal as voice information to the voice recognition unit 112 and the voice recognition control unit 113.

The voice recognition unit 112 performs voice recognition based on the voice signal acquired by the voice signal acquisition unit 111, and outputs a recognition result.
For example, the voice recognition unit 112 outputs the recognition result to the navigation control unit 103, and the navigation control unit 103 sets the destination based on the recognition result acquired from the voice recognition unit 112. The voice recognition processing that the voice recognition unit 112 performs on the voice signal can be implemented by applying well-known voice recognition techniques, so a detailed description is omitted.

The voice recognition control unit 113 determines whether the voice signal acquired by the voice signal acquisition unit 111 contains an ultrasonic signal. When it determines that the voice signal contains an ultrasonic signal, the voice recognition control unit 113 performs control so that the recognition result based on that voice signal is not output from the voice recognition unit 112 to the navigation control unit 103.
Specifically, to determine whether the voice signal contains an ultrasonic signal, the voice recognition control unit 113, for example, performs spectral analysis of the voice signal by a discrete Fourier transform and makes the determination based on the presence or absence of a signal at a frequency higher than a predetermined frequency. More specifically, for example, when the voice recognition control unit 113 determines that the voice signal contains an ultrasonic signal, it controls the voice recognition unit 112 so that the unit does not perform voice recognition, thereby preventing the recognition result based on that voice signal from being output from the voice recognition unit 112 to the navigation control unit 103. The predetermined frequency is not limited to 20,000 Hz; as long as it is near the upper limit of the frequencies that humans are considered able to hear, it may be lower than 20,000 Hz, for example 10,000 Hz.
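The determination described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of the voice recognition control unit 113: the function names (`dft_magnitudes`, `contains_ultrasonic`, `tone`), the 96 kHz sampling rate, the frame length of 256 samples, and the `min_magnitude` noise floor are all hypothetical choices. The text only specifies a discrete Fourier transform and a predetermined frequency. A naive DFT is used to keep the example dependency-free; a real system would more likely use an FFT.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Return normalized DFT magnitudes for bins 0 .. N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc) / n)
    return mags


def contains_ultrasonic(frame, sample_rate_hz, cutoff_hz=20_000.0, min_magnitude=0.01):
    """Return True if any bin above cutoff_hz reaches min_magnitude.

    cutoff_hz plays the role of the "predetermined frequency" in the text.
    min_magnitude is an assumed noise floor, not a value from the source.
    """
    n = len(frame)
    for k, mag in enumerate(dft_magnitudes(frame)):
        freq_hz = k * sample_rate_hz / n
        if freq_hz > cutoff_hz and mag >= min_magnitude:
            return True
    return False


def tone(freq_hz, sample_rate_hz, n, amplitude=1.0):
    """Generate n samples of a sine tone (test helper, hypothetical)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(n)]


if __name__ == "__main__":
    rate, n = 96_000, 256                      # assumed sampling setup
    speech_like = tone(1_500, rate, n)         # audible-band tone
    attack = tone(30_000, rate, n)             # ultrasonic tone
    print(contains_ultrasonic(speech_like, rate))   # audible only
    print(contains_ultrasonic(attack, rate))        # ultrasonic present
```

Note that the input must be sampled at more than twice the cutoff frequency for ultrasonic components to be observable at all; lowering `cutoff_hz` to 10,000 corresponds to the lower predetermined frequency mentioned above.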

The voice recognition control unit 113 may also determine whether the voice signal contains an ultrasonic signal based on whether a signal at a frequency higher than the predetermined frequency has at least a predetermined amplitude. The predetermined amplitude is, for example, the lower limit of the amplitude that the voice recognition unit 112 needs in order to perform voice recognition processing.
Alternatively, when the voice recognition control unit 113 determines that the voice signal contains an ultrasonic signal, it may prevent the recognition result based on that voice signal from being output from the voice recognition unit 112 to the navigation control unit 103 by, for example, suppressing only the recognition results based on the voice signal in the period during which it determines that the voice signal contains an ultrasonic signal. More specifically, for example, the voice recognition control unit 113 refers to the time stamp added by the voice signal acquisition unit 111 and outputs, to the voice recognition unit 112, information indicating the start and end of the period during which it determines that the voice signal contains an ultrasonic signal. In more detail, when the voice recognition control unit 113 determines that the voice signal contains an ultrasonic signal, it immediately outputs to the voice recognition unit 112 information indicating the point at which the ultrasonic signal began to be contained in the voice signal, that is, the start of the period. Thereafter, when the voice recognition control unit 113 determines that the voice signal no longer contains an ultrasonic signal, it outputs to the voice recognition unit 112 information indicating the point at which the ultrasonic signal ceased to be contained in the voice signal, that is, the end of the period. Based on the information indicating the start and end of the period output by the voice recognition control unit 113, the voice recognition unit 112 discards the recognition results obtained during that period without outputting them to the navigation control unit 103.
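The period-based control described above can be sketched as follows. This is a minimal sketch under assumptions, not the patented implementation: the class name `UltrasonicPeriodTracker`, its methods, and the representation of recognition results as `(timestamp, text)` pairs are hypothetical. The text only says that start and end information derived from time stamps is passed to the voice recognition unit 112, which discards results from that period.

```python
class UltrasonicPeriodTracker:
    """Records the periods in which ultrasonic content was detected.

    Stands in for the start/end information that the control unit 113
    sends to the recognition unit 112 (hypothetical structure).
    """

    def __init__(self):
        self.periods = []        # closed periods as (start, end) tuples
        self.open_start = None   # start of a period that has not ended yet

    def update(self, timestamp, ultrasonic_detected):
        """Feed one per-frame decision, in time order."""
        if ultrasonic_detected and self.open_start is None:
            # Ultrasonic content has just appeared: start of a period.
            self.open_start = timestamp
        elif not ultrasonic_detected and self.open_start is not None:
            # Ultrasonic content has just disappeared: end of the period.
            self.periods.append((self.open_start, timestamp))
            self.open_start = None

    def is_blocked(self, timestamp):
        """True if the timestamp falls inside a flagged period."""
        if self.open_start is not None and timestamp >= self.open_start:
            return True
        return any(start <= timestamp < end for start, end in self.periods)


def filter_results(results, tracker):
    """Drop recognition results whose timestamp lies in a flagged period."""
    return [(ts, text) for ts, text in results if not tracker.is_blocked(ts)]
```

Treating each period as half-open (`start <= t < end`) is a design choice made here so that a frame judged clean at the end time stamp is not discarded; the source does not specify the boundary handling.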

When the voice recognition control unit 113 controls the voice recognition unit 112 so that the recognition result based on the voice signal is not output to the navigation control unit 103, the notification output unit 114 generates notification information indicating that control was performed to suppress output of the recognition result, and outputs the generated notification information.
More specifically, for example, when the voice recognition control unit 113 controls the voice recognition unit 112 so that the recognition result based on the voice signal is not output to the navigation control unit 103, the notification output unit 114 acquires from the voice recognition control unit 113 information indicating that control was performed to suppress output of the recognition result. Based on that information, the notification output unit 114 generates notification information indicating that control was performed to suppress output of the recognition result and, for example, outputs the generated notification information to the navigation control unit 103. The navigation control unit 103 causes the notification information output by the notification output unit 114 to be output from the display device 14 or the voice output device 15 via the display output unit 104 or the voice output unit 105. The navigation control unit 103 may cause the notification information to be output from both the display device 14 and the voice output device 15. The device that outputs the notification information is not limited to the display device 14 and the voice output device 15, as long as the notification information output by the notification output unit 114 can be made known to the operator who spoke or to others. For example, the navigation control unit 103 may light a lamp (not shown), such as a light-emitting diode, based on the notification information output by the notification output unit 114.

Note that the notification output unit 114 is not an essential component of the voice recognition device 100, and can be added to or removed from the voice recognition device 100 as appropriate.
That is, the main part of the voice recognition device 100 may consist of the voice signal acquisition unit 111, the voice recognition unit 112, and the voice recognition control unit 113.

FIG. 2A and FIG. 2B are diagrams showing an example of the hardware configuration of the main part of the voice recognition device 100 according to the first embodiment.
The hardware configuration of the main part of the voice recognition device 100 according to the first embodiment is described with reference to FIG. 2A and FIG. 2B.

As shown in FIG. 2A, the voice recognition device 100 is configured as a computer, and the computer has a processor 201 and a memory 202. The memory 202 stores a program that causes the computer to function as the voice signal acquisition unit 111, the voice recognition unit 112, the voice recognition control unit 113, and the notification output unit 114. The functions of the voice signal acquisition unit 111, the voice recognition unit 112, the voice recognition control unit 113, and the notification output unit 114 are realized when the processor 201 reads and executes the program stored in the memory 202.

Alternatively, as shown in FIG. 2B, the voice recognition device 100 may be configured with a processing circuit 203. In this case, the functions of the voice signal acquisition unit 111, the voice recognition unit 112, the voice recognition control unit 113, and the notification output unit 114 may be realized by the processing circuit 203.

The voice recognition device 100 may also be configured with the processor 201, the memory 202, and the processing circuit 203 (not shown). In this case, some of the functions of the voice signal acquisition unit 111, the voice recognition unit 112, the voice recognition control unit 113, and the notification output unit 114 may be realized by the processor 201 and the memory 202, with the remaining functions realized by the processing circuit 203.

The processor 201 uses, for example, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a microcontroller, or a DSP (Digital Signal Processor).

The memory 202 uses, for example, a semiconductor memory or a magnetic disk. More specifically, the memory 202 uses a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), an SSD (Solid State Drive), an HDD (Hard Disk Drive), or the like.

The processing circuit 203 uses, for example, an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), an FPGA (Field-Programmable Gate Array), an SoC (System-on-a-Chip), or a system LSI (Large-Scale Integration).

The operation of the voice recognition device 100 according to the first embodiment is described with reference to FIG. 3.
FIG. 3 is a flowchart illustrating an example of the processing of the voice recognition device 100 according to the first embodiment.
The voice recognition device 100 repeatedly executes the processing shown in the flowchart of FIG. 3.

First, in step ST301, the voice signal acquisition unit 111 acquires a voice signal from the voice input unit 13.
Note that the voice signal acquisition unit 111 may perform the processing of step ST301 continuously as background processing, and the voice recognition device 100 may then apply the processing from step ST302 onward, in sequence, to the voice signals acquired by the voice signal acquisition unit 111.

Next, the voice recognition control unit 113 determines whether the voice signal acquired by the voice signal acquisition unit 111 contains an ultrasonic signal (step ST302).

When it is determined in step ST302 that the voice signal does not contain an ultrasonic signal (step ST302: NO), the voice recognition unit 112 performs voice recognition based on the voice signal and outputs a recognition result in step ST303.
After the processing of step ST303, the voice recognition device 100 ends the processing shown in the flowchart of FIG. 3. After ending that processing, the voice recognition device 100 returns to step ST301 and executes the processing shown in the flowchart again.

When it is determined in step ST302 that the voice signal contains an ultrasonic signal (step ST302: YES), the voice recognition control unit 113 performs control in step ST304 so that the recognition result based on the voice signal is not output from the voice recognition unit 112 to the navigation control unit 103.

After step ST304, in step ST305, the notification output unit 114 generates notification information indicating that control was performed to suppress output of the recognition result, and outputs the generated notification information.
After the processing of step ST304, the voice recognition device 100 ends the processing shown in the flowchart of FIG. 3. After ending that processing, the voice recognition device 100 returns to step ST301 and executes the processing shown in the flowchart again.
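The flow of steps ST301 to ST305 can be summarized in a short sketch. The names `process_frame` and `run`, the callback parameters, and the notification message text are assumptions made for illustration; the detector and recognizer are passed in as plain functions because the source leaves both to known techniques.

```python
def process_frame(frame, detect_ultrasonic, recognize, notify):
    """One pass through the flowchart for a single acquired voice signal."""
    if detect_ultrasonic(frame):                      # ST302: YES
        # ST304: recognition result is suppressed (recognizer is not run).
        # ST305: report the suppression (message text is hypothetical).
        notify("recognition result suppressed: ultrasonic signal detected")
        return None
    return recognize(frame)                           # ST302: NO -> ST303


def run(frames, detect_ultrasonic, recognize, notify):
    """Repeat the flow for each acquired voice signal (ST301)."""
    outputs = []
    for frame in frames:
        result = process_frame(frame, detect_ultrasonic, recognize, notify)
        if result is not None:
            outputs.append(result)   # e.g. handed to the navigation control unit
    return outputs
```

Because the notification output unit 114 is described as optional, `notify` could simply be a no-op function in a configuration without it.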

As described above, the voice recognition device 100 includes: the voice signal acquisition unit 111 that acquires a voice signal from the voice input unit 13; the voice recognition unit 112 that performs voice recognition based on the voice signal acquired by the voice signal acquisition unit 111 and outputs a recognition result; and the voice recognition control unit 113 that determines whether the voice signal acquired by the voice signal acquisition unit 111 contains an ultrasonic signal and, when it determines that the voice signal contains an ultrasonic signal, performs control so that the recognition result based on that voice signal is not output from the voice recognition unit 112.
With this configuration, the voice recognition device 100 can suppress the output of recognition results of voice recognition caused by ultrasonic waves.

In addition, by including the notification output unit 114, which generates and outputs notification information indicating that control was performed to suppress output of the recognition result when the voice recognition control unit 113 controls the voice recognition unit 112 so that the recognition result based on the voice signal is not output, the voice recognition device 100 can inform the operator who spoke, or others, that no recognition result is output because the voice signal contains an ultrasonic signal.

A voice recognition device 100a according to a modification of the first embodiment is described with reference to FIG. 4.
FIG. 4 is a block diagram showing the main part of the in-vehicle navigation device 10 to which the voice recognition device 100a according to the modification of the first embodiment is applied.
In FIG. 4, components that are the same as those shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The voice recognition device 100 according to the first embodiment shown in FIG. 1 and the voice recognition device 100a according to the modification of the first embodiment differ in the following respects.

The voice recognition unit 112 of the voice recognition device 100 according to the first embodiment acquires the voice signal acquired by the voice signal acquisition unit 111 directly from the voice signal acquisition unit 111. In contrast, the voice recognition unit 112a of the voice recognition device 100a according to the modification of the first embodiment acquires the voice signal acquired by the voice signal acquisition unit 111a via the voice recognition control unit 113a.
Further, when the voice recognition control unit 113 of the voice recognition device 100 according to the first embodiment determines that the voice signal contains an ultrasonic signal, it either prevents the voice recognition unit 112 from performing voice recognition, or prevents the voice recognition unit 112 from outputting to the navigation control unit 103 the recognition result based on the voice signal in the period during which the voice signal is determined to contain an ultrasonic signal. In either way, the recognition result based on that voice signal is not output from the voice recognition unit 112 to the navigation control unit 103. In contrast, when the voice recognition control unit 113a of the voice recognition device 100a according to the modification of the first embodiment determines that the voice signal contains an ultrasonic signal, it performs control so as not to output that voice signal to the voice recognition unit 112a. That is, it prevents the voice recognition unit 112a from acquiring the voice signal to be recognized, so that the recognition result based on that voice signal is not output from the voice recognition unit 112a to the navigation control unit 103.
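The difference between the two arrangements can be sketched as follows. This is an illustrative sketch only: `CountingRecognizer`, `device_100`, `device_100a`, and the `has_ultrasound` callable are hypothetical names, and `device_100` shows just one of the two options described for the first embodiment (recognition runs, but the result is withheld).

```python
from typing import Callable, List, Optional

Frame = List[float]
Detector = Callable[[Frame], bool]


class CountingRecognizer:
    """Stand-in recognizer that records how many frames it was given."""

    def __init__(self) -> None:
        self.frames_seen = 0

    def recognize(self, frame: Frame) -> str:
        self.frames_seen += 1
        return f"recognized {len(frame)} samples"


def device_100(frame: Frame, recognizer: CountingRecognizer,
               has_ultrasound: Detector) -> Optional[str]:
    """First embodiment: the recognizer receives the frame directly,
    and the controller withholds the result for flagged frames."""
    result = recognizer.recognize(frame)
    return None if has_ultrasound(frame) else result


def device_100a(frame: Frame, recognizer: CountingRecognizer,
                has_ultrasound: Detector) -> Optional[str]:
    """Modification: the controller sits in the signal path, so a
    flagged frame never reaches the recognizer."""
    if has_ultrasound(frame):
        return None
    return recognizer.recognize(frame)
```

In both cases no recognition result reaches the caller for a flagged frame; in the second arrangement the recognizer also does no work on it.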

Apart from the functions described above, the functions of the components of the voice recognition device 100a according to the modification of the first embodiment are the same as those of the components of the voice recognition device 100 according to the first embodiment, and their description is therefore omitted.
The hardware configuration of the voice recognition device 100a according to the modification of the first embodiment is also the same as that of the voice recognition device 100 according to the first embodiment, and its description is therefore omitted. That is, the functions of the voice signal acquisition unit 111a, the voice recognition unit 112a, the voice recognition control unit 113a, and the notification output unit 114 may each be realized by the processor 201 and the memory 202, or by the processing circuit 203.

Furthermore, the processing flow of the voice recognition device 100a according to the modification of the first embodiment is the same as that of the voice recognition device 100 according to the first embodiment, and its description is therefore omitted. That is, the processing performed by the voice signal acquisition unit 111, the voice recognition unit 112, the voice recognition control unit 113, and the notification output unit 114 in the flowchart shown in FIG. 3 is performed by the voice signal acquisition unit 111a, the voice recognition unit 112a, the voice recognition control unit 113a, and the notification output unit 114, respectively.

With this configuration, the voice recognition device 100a according to the modification of the first embodiment can suppress the output of recognition results produced by voice recognition of ultrasonic waves.

In the first embodiment and its modification, an example was shown in which the in-vehicle navigation device 10 sets a destination based on the recognition result acquired from the voice recognition device 100 or 100a. However, the operation that the in-vehicle navigation device 10 performs based on the recognition result acquired from the voice recognition device 100 or 100a is not limited to setting a destination. For example, the in-vehicle navigation device 10 may reset the route or set enlarged or reduced display of the guidance image information based on the recognition result acquired from the voice recognition device 100 or 100a. Further, for example, when the in-vehicle navigation device 10 has the functions of an in-vehicle audio device, it may perform control for playing back music information or the like based on the recognition result acquired from the voice recognition device 100 or 100a.

Embodiment 2.
The voice recognition device 100 according to the second embodiment is described below, taking as an example its application to an automatic voice dialogue device 50.

FIG. 5 is a block diagram showing the main part of the automatic voice dialogue device 50 to which the voice recognition device 100 according to the second embodiment is applied.
In FIG. 5, components identical to those shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The automatic voice dialogue device 50 will be described later.

The example sentence database 16 is a storage device that stores example sentence information, which the automatic voice dialogue device 50 described later uses to search for an example sentence corresponding to the recognition result acquired from the voice recognition device 100.

The voice input unit 17 is, for example, a microphone that converts acquired sound waves into a voice signal and outputs the converted voice signal to the voice recognition device 100 described later.

The voice output device 18 is, for example, a speaker that outputs as sound the voice signal output by the automatic voice dialogue device 50 described later.

The display device 19 is, for example, a display that displays the image information output by the automatic voice dialogue device 50 described later.

The automatic voice dialogue device 50, the example sentence database 16, the voice input unit 17, the voice output device 18, and the display device 19 constitute an automatic voice dialogue system.

The automatic voice dialogue device 50 includes the voice recognition device 100, a matching unit 152, an answer creation unit 153, a voice generation unit 154, an answer voice output unit 155, and a display output unit 156.

Based on the recognition result acquired from the voice recognition device 100 described later, the matching unit 152 searches the example sentence database 16, in which example sentence information is stored, for an example sentence corresponding to the recognition result.
More specifically, for example, when the recognition result acquired from the voice recognition device 100 is the character string "いまなんじですか" ("What time is it now?" written phonetically in hiragana), the matching unit 152 searches the example sentence database 16 for the corresponding character string "今何時ですか" (the same question in standard orthography).

The answer creation unit 153 generates an answer character string corresponding to the recognition result, based on the result of the search by the matching unit 152.
More specifically, for example, when the result of the search by the matching unit 152 is the character string "今何時ですか" ("What time is it now?"), the answer creation unit 153 generates, as the answer to that character string, a character string such as "午後１時１５分です" ("It is 1:15 p.m.").

The voice generation unit 154 converts the character string generated by the answer creation unit 153 into a voice signal and outputs it to the answer voice output unit 155 described later.

The answer voice output unit 155 outputs the voice signal output by the voice generation unit 154 to the voice output device 18, such as a speaker.

The display output unit 156 generates image information indicating the state of the automatic voice dialogue device 50, for example based on the result of the matching unit 152 searching the example sentence database 16 for the character string corresponding to the recognition result, and outputs the generated image information to the display device 19. More specifically, for example, when the matching unit 152 searches the example sentence database 16 and no example sentence information corresponding to the character string exists there, the display output unit 156 generates image information indicating that voice recognition has failed and outputs the generated image information to the display device 19.
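The flow from the matching unit 152 through the answer creation unit 153, including the fallback handled by the display output unit 156, can be sketched as follows. The sketch is illustrative only: the dictionary contents, the function names `match`, `create_answer`, and `respond`, and the `"recognition_failed"` status string are all hypothetical, and speech synthesis by the voice generation unit 154 is left out.

```python
from datetime import time
from typing import Optional, Tuple

# Hypothetical contents of the example sentence database 16:
# phonetic recognition result -> example sentence.
EXAMPLE_SENTENCES = {"いまなんじですか": "今何時ですか"}


def match(recognition_result: str) -> Optional[str]:
    """Matching unit 152: look up the example sentence for a recognition result."""
    return EXAMPLE_SENTENCES.get(recognition_result)


def create_answer(example: str, now: time) -> str:
    """Answer creation unit 153: build the answer string for a matched example."""
    if example == "今何時ですか":
        period = "午前" if now.hour < 12 else "午後"  # a.m. / p.m.
        return f"{period}{now.hour % 12}時{now.minute}分です"
    raise ValueError(f"no answer rule for {example!r}")


def respond(recognition_result: str, now: time) -> Tuple[str, str]:
    """Return (output route, payload): spoken answer, or a failure notice to display."""
    example = match(recognition_result)
    if example is None:
        # Display output unit 156: no matching example sentence was found.
        return ("display", "recognition_failed")
    return ("speech", create_answer(example, now))
```

For instance, `respond("いまなんじですか", time(13, 15))` yields a spoken answer, while an unmatched recognition result is routed to the display as a failure notice.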

The voice recognition device 100 and its components are the same as those described in the first embodiment, and their description is therefore omitted.

Note that the voice signal acquisition unit 111 in the voice recognition device 100 according to the second embodiment acquires the voice signal from the voice input unit 17.

Further, the notification output unit 114 in the voice recognition device 100 according to the second embodiment generates notification information indicating that output of the recognition result has been suppressed, based on information to that effect acquired from the voice recognition control unit 113, and outputs the generated notification information to, for example, the matching unit 152. The matching unit 152 causes the notification information output by the notification output unit 114 to be output from the display device 19 or the voice output device 18 via the display output unit 156 or the answer voice output unit 155. The matching unit 152 may cause the notification information to be output from both the display device 19 and the voice output device 18. As long as the operator or other person who spoke can be informed that no recognition result is output because the voice signal contains an ultrasonic signal, the device that outputs the notification information is not limited to the display device 19 and the voice output device 18. For example, the matching unit 152 may light a lamp (not shown), such as a light-emitting diode, based on the notification information output by the notification output unit 114.
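The routing of notification information to one or more output devices can be sketched as below. The `NotificationRouter` class, the sink names, and the message text are hypothetical and serve only to illustrate that the same notification may go to the display, the speaker, a lamp, or any combination of them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical wording of the notification information.
NOTICE = "Recognition result withheld: ultrasonic signal detected"


@dataclass
class NotificationRouter:
    """Forwards notification information to whichever output sinks are registered."""
    sinks: Dict[str, Callable[[str], None]] = field(default_factory=dict)

    def register(self, name: str, sink: Callable[[str], None]) -> None:
        self.sinks[name] = sink

    def notify(self, message: str, targets: Iterable[str]) -> List[str]:
        """Send the message to each known target; return the names that received it."""
        delivered = []
        for name in targets:
            sink = self.sinks.get(name)
            if sink is not None:
                sink(message)
                delivered.append(name)
        return delivered
```

A caller might register a display sink and a speaker sink and then pass both names to `notify`; unknown target names are skipped rather than treated as errors.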

Note that, as in the first embodiment, the notification output unit 114 according to the second embodiment is not an essential component of the voice recognition device 100 and can be added to or removed from the voice recognition device 100 as appropriate.
That is, the main part of the voice recognition device 100 according to the second embodiment may consist of the voice signal acquisition unit 111, the voice recognition unit 112, and the voice recognition control unit 113.

The hardware configuration of the voice recognition device 100 according to the second embodiment is the same as that of the voice recognition device 100 according to the first embodiment, and its description is therefore omitted.

The processing flow of the voice recognition device 100 according to the second embodiment is the same as that of the voice recognition device 100 according to the first embodiment, and its description is therefore omitted.

The automatic voice dialogue device 50 is not limited to simple dialogue such as answering with the time when asked, as described above; some such devices carry out commercial transactions, for example purchasing goods over the Internet, based on the recognition result acquired from the voice recognition device 100. Because a conventional automatic voice dialogue device performs voice recognition even when it receives ultrasonic waves, there has been a problem that ultrasonic waves emitted by, for example, a malicious third party can cause commercial transactions not intended by a user such as the owner of the automatic voice dialogue device.

However, when the automatic voice dialogue device 50 to which the voice recognition device 100 according to the second embodiment is applied determines that the voice signal contains an ultrasonic signal, it performs control so that the recognition result based on that voice signal is not output, and can therefore suppress commercial transactions not intended by the user.

Note that the automatic voice dialogue device 50 according to the second embodiment may be one to which the voice recognition device 100a described in the modification of the first embodiment is applied.

In the embodiments described so far, examples were shown in which the voice recognition devices 100 and 100a contain the voice recognition units 112 and 112a, but this is not a limitation. For example, the voice recognition devices 100 and 100a may have a component (not shown) for connecting to a public network such as the Internet or a public line. In that case, the voice recognition devices 100 and 100a transmit the voice signal through that component to a voice recognition server (not shown) that resides on the public network and has the voice recognition units 112 and 112a, the voice recognition server outputs a recognition result based on the voice signal, and the voice recognition devices 100 and 100a acquire, through that component, the recognition result output by the voice recognition server.

Further, in the embodiments described so far, examples were shown in which the voice signal acquisition units 111 and 111a of the voice recognition devices 100 and 100a output the voice signal acquired from the voice input units 13 and 17 to the voice recognition unit 112 and the voice recognition control unit 113, but this is not a limitation. For example, the voice signal acquisition units 111 and 111a may output the voice signal acquired from the voice input units 13 and 17 to the voice recognition unit 112, and output to the voice recognition control unit 113 an ultrasonic signal acquired from an ultrasonic input unit (not shown) that is arranged near the voice input units 13 and 17 to receive ultrasonic waves. Here, the ultrasonic input unit is, for example, an ultrasonic microphone that receives ultrasonic waves.
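With a dedicated ultrasonic input unit, the determination no longer needs to separate frequency bands in software, because the second channel carries only ultrasonic content. A minimal sketch under that assumption follows; the RMS measure, the threshold value, and the names `rms` and `recognize_two_channel` are illustrative choices, not part of the specification.

```python
import math
from typing import Callable, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def recognize_two_channel(voice_frame: Sequence[float],
                          ultrasonic_frame: Sequence[float],
                          recognize: Callable[[Sequence[float]], str],
                          threshold: float = 0.05) -> Optional[str]:
    """Decide on the ultrasonic channel, recognize on the voice channel.

    Returns None when the ultrasonic microphone's level reaches the
    (assumed) threshold, so that no recognition result is output.
    """
    if rms(ultrasonic_frame) >= threshold:
        return None
    return recognize(voice_frame)
```

The two frames are assumed to cover the same time interval, so that a flagged ultrasonic frame suppresses the result for the matching stretch of the voice signal.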

Note that, within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

The voice recognition device according to the present invention can be applied to equipment on which a user performs input operations by voice.

1: vehicle; 10: in-vehicle navigation device; 11: navigation signal receiver; 12: map database; 13, 17: voice input unit; 14, 19: display device; 15, 18: voice output device; 16: example sentence database; 50: automatic voice dialogue device; 100, 100a: voice recognition device; 101: navigation signal acquisition unit; 102: map information acquisition unit; 103: navigation control unit; 104, 156: display output unit; 105: voice output unit; 111, 111a: voice signal acquisition unit; 112, 112a: voice recognition unit; 113, 113a: voice recognition control unit; 114: notification output unit; 152: matching unit; 153: answer creation unit; 154: voice generation unit; 155: answer voice output unit; 201: processor; 202: memory; 203: processing circuit.

Claims

A voice recognition device comprising:
a voice signal acquisition unit that acquires a voice signal from a voice input unit;
a voice recognition unit that performs voice recognition based on the voice signal acquired by the voice signal acquisition unit and outputs a recognition result; and
a voice recognition control unit that determines whether the voice signal acquired by the voice signal acquisition unit contains an ultrasonic signal and, when it determines that the voice signal contains the ultrasonic signal, performs control so that the recognition result based on the voice signal is not output from the voice recognition unit.

The voice recognition device according to claim 1, wherein, when the voice recognition control unit determines that the voice signal contains the ultrasonic signal, the voice recognition control unit controls the voice recognition unit so as not to perform the voice recognition.

The voice recognition device according to claim 1, wherein the voice recognition control unit determines that the voice signal contains the ultrasonic signal when the ultrasonic signal contained in the voice signal has an amplitude equal to or greater than a predetermined amplitude, and performs control so that the recognition result based on the voice signal in the period during which the voice signal is determined to contain the ultrasonic signal is not output from the voice recognition unit.

The voice recognition device according to claim 1, further comprising a notification output unit that, when the voice recognition control unit controls the voice recognition unit so as not to output the recognition result based on the voice signal, generates notification information indicating that the recognition result has been controlled so as not to be output, and outputs the generated notification information.

An in-vehicle navigation device comprising the voice recognition device according to any one of claims 1 to 4, and operating based on the recognition result output from the voice recognition device.

An automatic voice dialogue device comprising the voice recognition device according to any one of claims 1 to 4, and operating based on the recognition result output from the voice recognition device.

A voice recognition method, wherein:
a voice signal acquisition unit acquires a voice signal from a voice input unit;
a voice recognition unit performs voice recognition based on the voice signal acquired by the voice signal acquisition unit and outputs a recognition result; and
a voice recognition control unit determines whether the voice signal acquired by the voice signal acquisition unit contains an ultrasonic signal and, when it determines that the voice signal contains the ultrasonic signal, performs control so that the recognition result based on the voice signal is not output from the voice recognition unit.