
JP2014202808A - Input/output device

Info

Publication number: JP2014202808A
Application number: JP2013076769A
Authority: JP
Inventors: Takashi Toyama; Hirokazu Itani; Masaaki Matsumoto; Masashi Tanabe
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2013-04-02
Filing date: 2013-04-02
Publication date: 2014-10-27

Abstract

PROBLEM TO BE SOLVED: To provide an input/output device capable of changing and outputting a response to an input in accordance with surrounding conditions. SOLUTION: In a voice recognition device 1, a level check unit 31 detects the level of the voice signal output from a microphone 2, and a use case determination unit 33 determines whether the detected voice signal level is smaller than a predetermined voice signal level. When the detected level is determined to be smaller than the predetermined level, the sound output from a speaker is reduced and the luminance of a display device is decreased.

Description

The present invention relates to an input/output device used in, for example, a voice recognition device that recognizes spoken voice.

In recent years, many in-vehicle devices, portable devices, and the like have a built-in voice recognition device (voice recognition function) so that they can be operated easily by voice alone, without operating buttons or the like.

In response to an input voice, this type of voice recognition device outputs the processing result corresponding to the input voice as voice information or as display information such as an image, or outputs a response, such as an acknowledgement that the input was accepted or the recognition result, as voice information or display information. Such responses have always been made in a fixed manner, for example at a constant voice level or a constant luminance, without considering the situation around the speaker.

One example of a voice recognition device that operates in consideration of the situation around the speaker is the method described in Patent Document 1. The voice recognition device described in Patent Document 1 amplifies the input voice level to an appropriate level according to the usage state of a portable information terminal device, thereby preventing a drop in the recognition rate.

Patent Document 1: Japanese Patent No. 4299768

A conventional voice recognition device always responds in a fixed manner without considering the situation around the speaker. It therefore cannot accommodate a speaker who does not want the people nearby to hear or see the content of the response, and the speaker has no choice but to refrain from using the voice recognition device. In such a situation the speaker must operate the device with buttons or the like, which is inconvenient.

The voice recognition device described in Patent Document 1 controls the input voice level solely to prevent a drop in the recognition rate, and gives no consideration to the response from the voice recognition device discussed above.

In view of the above problem, an object of the present invention is, for example, to provide an input/output device that can change and output a response to an input according to the surrounding situation.

To solve the above problem, the invention according to claim 1 comprises: first sound collecting means for collecting a spoken input voice; first output means for outputting the input voice collected by the first sound collecting means to voice recognition means; response acquisition means for acquiring a response from the voice recognition means; second output means for outputting the response acquired by the response acquisition means; voice level comparison means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and comparing that input voice level with a predetermined voice level; and control means for changing the output of the second output means so that the response becomes difficult to recognize from the surroundings when the input voice level compared by the voice level comparison means is smaller than the predetermined voice level.

The invention according to claim 7 comprises: first sound collecting means for collecting a spoken input voice; second sound collecting means for collecting ambient sound other than the input voice; first output means for outputting the input voice collected by the first sound collecting means to voice recognition means; ambient sound level detection means for detecting an ambient sound level, which is the sound level of the ambient sound collected by the second sound collecting means; response acquisition means for acquiring a response from the voice recognition means; second output means for outputting the response acquired by the response acquisition means; ratio calculation means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and calculating the ratio between that input voice level and the ambient sound level detected by the ambient sound level detection means; and control means for changing the output of the second output means so that the response becomes difficult to recognize from the surroundings when the ratio calculated by the ratio calculation means is smaller than a predetermined value.

The invention according to claim 8 is an input/output method for an input/output device that outputs a response from voice recognition means to a spoken input voice. The method includes: a voice level comparison step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means that collects the input voice, and comparing that input voice level with a predetermined voice level; and a control step of changing the output of the response so that the response of the voice recognition means becomes difficult to recognize from the surroundings when the input voice level compared in the voice level comparison step is smaller than the predetermined voice level.

The invention according to claim 9 causes a computer to execute the voice recognition method according to claim 8.

The invention according to claim 10 stores the voice recognition program according to claim 9.

The invention according to claim 11 is an input/output method for an input/output device that outputs a response from voice recognition means to a spoken input voice. The method includes: a ratio calculation step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means that collects the input voice, detecting an ambient sound level, which is the sound level of the ambient sound collected by second sound collecting means that collects ambient sound other than the input voice, and calculating the ratio between the input voice level and the ambient sound level; and a control step of changing the output of the response so that the response of the voice recognition means becomes difficult to recognize from the surroundings when the ratio from the ratio calculation step is smaller than a predetermined value.

The invention according to claim 12 causes a computer to execute the input/output method according to claim 11.

The invention according to claim 13 stores the input/output program according to claim 12.

FIG. 1 is a configuration diagram of an input/output device according to a first embodiment of the present invention.
FIG. 2 is a flowchart of the operation of the input/output device shown in FIG. 1.
FIG. 3 is a configuration diagram of an input/output device according to a second embodiment of the present invention.
FIG. 4 is a flowchart of the operation of the input/output device shown in FIG. 2.
FIG. 5 is a configuration diagram of an input/output device according to another embodiment of the present invention.
FIG. 6 is a configuration diagram of an input/output device according to another embodiment of the present invention.

An input/output device according to an embodiment of the present invention is described below. The input/output device includes first sound collecting means for collecting a spoken input voice, first output means for outputting the input voice collected by the first sound collecting means to voice recognition means, response acquisition means for acquiring a response from the voice recognition means, and second output means for outputting the response acquired by the response acquisition means. It further includes voice level comparison means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and comparing that input voice level with a predetermined voice level, and control means for changing the output of the second output means so that the response becomes difficult to recognize from the surroundings when the input voice level compared by the voice level comparison means is smaller than the predetermined voice level. With this configuration, when the voice level of the input voice is low, the device judges that the user does not want the voice recognition response to be heard or seen by those nearby, and changes the output of the second output means so that it is difficult to recognize from the surroundings. The response to an input can therefore be changed according to the surrounding situation and then output.

The second output means may have voice output means for outputting the response as sound, and the control means may reduce the sound output from the voice output means when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level. In this way, the sound output from the voice output means, such as a speaker, can be reduced when the user does not want the voice recognition response to be heard by those nearby.

The second output means may further have display means for displaying the response as an image, and the control means may stop the display of the display means and reduce the sound output from the voice output means when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level. In this way, when the device has both voice output means and display means, the display can be stopped and the response can be output at a reduced volume from the voice output means, such as a speaker.

The second output means may further have an output interface for outputting the response as sound from external voice output means, and the control means may output the response only to the output interface when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level. In this way, when the user does not want the voice recognition response to be heard by those nearby, sound can be output only from the external voice output means, such as an earphone.

The second output means may have display means for displaying the response as an image, and the control means may change the display of the display means so that the image becomes difficult to recognize from the surroundings when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level. In this way, when the user does not want the voice recognition response to be seen by those nearby, the luminance, viewing angle, or the like of the display means, such as a liquid crystal display, can be changed.

The second output means may further have voice output means for outputting the response as sound, and the control means may stop the output of the voice output means and change the display of the display means so that the image becomes difficult to recognize from the surroundings when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level. In this way, when the device has both voice output means and display means, the sound output from the voice output means can be stopped and the display of the display means can be made difficult to recognize.
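The output variations described in the preceding paragraphs can be pictured as a small set of selectable policies. The following is only an illustrative sketch: the `OutputState` fields, the policy names, and the scaling `factor` of 0.5 are hypothetical and do not appear in the specification, and the display change is modeled as a brightness reduction only, although the text also mentions the viewing angle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputState:
    """Hypothetical snapshot of the second output means."""
    speaker_volume: float       # 0.0 (muted) to 1.0 (full)
    display_on: bool
    display_brightness: float   # 0.0 (dark) to 1.0 (full)
    earphone_only: bool = False


def apply_private_policy(state: OutputState, policy: str,
                         factor: float = 0.5) -> OutputState:
    """Return a new state that is harder to notice from the surroundings.

    The policy names are illustrative labels for the variations in the text.
    """
    quieter = state.speaker_volume * factor
    dimmer = state.display_brightness * factor
    if policy == "quiet_speaker":              # reduce the speaker sound
        return replace(state, speaker_volume=quieter)
    if policy == "quiet_speaker_display_off":  # stop display, reduce sound
        return replace(state, speaker_volume=quieter, display_on=False)
    if policy == "earphone_only":              # output interface only
        return replace(state, earphone_only=True)
    if policy == "dim_display":                # make the image harder to see
        return replace(state, display_brightness=dimmer)
    if policy == "dim_display_speaker_off":    # stop sound, dim the image
        return replace(state, speaker_volume=0.0, display_brightness=dimmer)
    raise ValueError(f"unknown policy: {policy}")
```

Returning a new state rather than mutating the old one keeps the default settings available for restoring normal output later.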

An input/output device according to an embodiment of the present invention may also include first sound collecting means for collecting a spoken input voice, second sound collecting means for collecting ambient sound other than the input voice, first output means for outputting the input voice collected by the first sound collecting means to voice recognition means, ambient sound level detection means for detecting an ambient sound level, which is the sound level of the ambient sound collected by the second sound collecting means, response acquisition means for acquiring a response from the voice recognition means, and second output means for outputting the response acquired by the response acquisition means. It further includes ratio calculation means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and calculating the ratio between that input voice level and the ambient sound level detected by the ambient sound level detection means, and control means for changing the output of the second output means so that the response becomes difficult to recognize from the surroundings when the ratio calculated by the ratio calculation means is smaller than a predetermined value. With this configuration, the situation around the speaker can be judged from the ratio between the input voice and the ambient sound. That is, when the ratio of the spoken input voice level to the ambient sound level (S/N ratio) is small, the device can judge that many people are nearby and that the speaker is speaking quietly, so it can treat the user as not wanting the voice recognition response to be heard or seen by those nearby and change the output of the output means.
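The ratio-based judgment can be sketched as below. This is a minimal illustration under stated assumptions: the function name `is_private_by_ratio`, the linear (non-decibel) ratio, and the handling of a zero ambient level are all hypothetical choices, since the text only says that the output is changed when the ratio of input voice level to ambient sound level is smaller than a predetermined value.

```python
def is_private_by_ratio(input_level: float, ambient_level: float,
                        ratio_threshold: float) -> bool:
    """Judge private mode from the input-to-ambient level ratio (S/N ratio).

    Levels are assumed to be non-negative linear amplitudes.
    """
    if ambient_level <= 0.0:
        # Assumption: with no measurable ambient sound the ratio is
        # undefined, so this sketch stays in normal mode.
        return False
    return (input_level / ambient_level) < ratio_threshold


# A quiet voice in a noisy place gives a small ratio.
print(is_private_by_ratio(0.1, 0.5, ratio_threshold=1.0))
# A clear voice in a quiet place gives a large ratio.
print(is_private_by_ratio(0.8, 0.2, ratio_threshold=1.0))
```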

An input/output method according to an embodiment of the present invention is an input/output method for an input/output device that outputs a response from voice recognition means to a spoken input voice. The method includes a voice level comparison step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means that collects the input voice, and comparing that input voice level with a predetermined voice level, and a control step of changing the output of the response so that the response of the voice recognition means becomes difficult to recognize from the surroundings when the input voice level compared in the voice level comparison step is smaller than the predetermined voice level. With this method, when the voice level of the input voice is low, it is judged that the user does not want the voice recognition response to be heard or seen by those nearby, and the output of the response can be changed. The response to an input can therefore be changed according to the surrounding situation and then output.

The input/output method described above may be configured as an input/output program executed by a computer. In this way, when the voice level of the input voice is low, a computer can be used to judge that the user does not want the voice recognition response to be heard or seen by those nearby, and the output of the response can be changed. The response to an input can therefore be changed according to the surrounding situation and then output.

The voice recognition program described above may be stored on a computer-readable recording medium. In this way, the program can be distributed on its own as well as built into a device, and version upgrades and the like become easy.

An input/output method according to an embodiment of the present invention may also be an input/output method for an input/output device that outputs a response from voice recognition means to a spoken input voice, the method including a ratio calculation step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means that collects the input voice, detecting an ambient sound level, which is the sound level of the ambient sound collected by second sound collecting means that collects ambient sound other than the input voice, and calculating the ratio between the input voice level and the ambient sound level, and a control step of changing the output of the response so that the response of the voice recognition means becomes difficult to recognize from the surroundings when the ratio from the ratio calculation step is smaller than a predetermined value. With this method, the situation around the speaker can be judged from the ratio between the input voice and the ambient sound. That is, when the ratio of the spoken input voice level to the ambient sound level (S/N ratio) is small, it can be judged that many people are nearby and that the speaker is speaking quietly, so the user can be treated as not wanting the voice recognition response to be heard or seen by those nearby and the output of the output means can be changed.

The input/output method described above may be configured as an input/output program executed by a computer. In this way, when the S/N ratio is small, a computer can be used to judge that the user does not want the voice recognition response to be heard or seen by those nearby, and the output of the response can be changed. The response to an input can therefore be changed according to the surrounding situation and then output.

A voice recognition device having an input/output device according to a first embodiment of the present invention is described with reference to FIGS. 1 and 2. As shown in FIG. 1, the voice recognition device 1 includes a microphone 2, a control device 3, and an external output device 4.

The microphone 2, which serves as the first sound collecting means, collects the voice uttered by the user (input voice), converts it into an electrical signal, and outputs it to the control device 3 as a voice signal.

The control device 3 includes a level check unit 31, a voice recognition engine unit 32, and a use case determination unit 33. The control device 3 is implemented with, for example, a microcomputer, a digital signal processor (DSP), or an ASIC (Application Specific Integrated Circuit).

The level check unit 31, which serves as the first output means and the voice level comparison means, outputs the voice signal input from the microphone 2 to the voice recognition engine unit 32. That is, it outputs the input voice collected by the first sound collecting means to the voice recognition means. The level check unit 31 also detects the level of the voice signal input from the microphone 2 and outputs it to the use case determination unit 33 as the input voice level. That is, it detects the input voice level, which is the voice level of the input voice collected by the first sound collecting means. In this specification, the level of a voice signal indicates the loudness of the sound in question, for example the maximum or average value of the amplitude of the voice signal.
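The level definition above (maximum or average amplitude) can be sketched as follows. This is a minimal illustration that assumes the voice signal is available as a sequence of float samples; the function name `detect_level` and its `mode` parameter are hypothetical and not part of the specification.

```python
def detect_level(samples, mode="peak"):
    """Return the level of a block of audio samples.

    mode="peak": maximum absolute amplitude in the block
    mode="mean": average absolute amplitude in the block
    """
    if not samples:
        return 0.0  # assumption: an empty block is treated as silence
    magnitudes = [abs(s) for s in samples]
    if mode == "peak":
        return max(magnitudes)
    if mode == "mean":
        return sum(magnitudes) / len(magnitudes)
    raise ValueError(f"unknown mode: {mode}")


block = [0.1, -0.5, 0.3]
print(detect_level(block, "peak"))
print(detect_level(block, "mean"))
```

The peak value reacts to a single loud sample, while the mean is steadier over a block; which one suits a given device is a design choice the text leaves open.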

The voice recognition engine unit 32 converts the voice signal input from the level check unit 31 into a digital signal and performs voice recognition processing (the conversion to a digital signal may instead be performed by the level check unit 31). The voice recognition processing may use any known method, such as a statistical method, dynamic time warping, or a hidden Markov model, and is not particularly limited. The voice recognition engine unit 32 outputs a response relating to the result of the voice recognition processing to the external output device 4. A response relating to the result of the voice recognition processing is not limited to voice information or display information that answers the spoken content. It also includes voice information or display information indicating that the voice was recognized, voice information or display information indicating that the voice could not be recognized, and voice information or display information prompting the input of the next command or the like.

If the result of voice recognition is a command for another processing device (not shown), the voice recognition engine unit 32 outputs the command to that processing device. The other processing device is not limited to one configured integrally with the voice recognition device 1; it may be detachable, or it may communicate wirelessly or by wire via a network or the like. In the configuration shown in FIG. 1, the control device 3 includes the voice recognition engine unit 32, so the voice recognition engine unit 32 serves both as the voice recognition means and as the response acquisition means that acquires a response from the voice recognition means.

The use case determination unit 33, which serves as the voice level comparison means and the control means, determines that the device is in private mode when the input voice level detected by the level check unit 31 is smaller than a predetermined voice signal level (predetermined voice level). Private mode is a mode indicating a situation in which the user does not want the voice recognition response to be heard or seen by those nearby. The use case determination unit 33 then outputs a control signal to the external output device 4 so that its output changes to one corresponding to private mode. That is, the input voice level is compared with a predetermined voice level, and when the input voice level compared by the voice level comparison means is smaller than the predetermined voice level, the output of the second output means is changed so that the response becomes difficult to recognize from the surroundings.

Because a low input voice level may lower the recognition rate of the voice recognition engine unit 32, the predetermined voice signal level is preferably set within a range in which the recognition rate of the voice recognition engine unit 32 does not drop. Alternatively, the voice recognition processing may be performed after processing that reduces the influence of ambient noise, such as the processing described in Patent Document 1.

In FIG. 1, the level check unit 31, the voice recognition engine unit 32, and the use case determination unit 33 are configured integrally in the control device 3, but the configuration is not limited to this. For example, each may be implemented as a separate component (microcomputer, DSP, ASIC, or the like).

The external output device 4, which serves as the second output means, includes a voice output unit 41 as voice output means and a display unit 42 as display means. The voice output unit 41 includes a speaker that outputs as sound those responses, among the responses relating to the result of the voice recognition processing output from the voice recognition engine unit 32, that are input as voice information, and an amplifier or the like that controls the volume output to the speaker. The display unit 42 includes a display device, such as a liquid crystal display or an organic EL (Electro Luminescence) display, that displays as an image (including text-only information) those responses, among the responses relating to the result of the voice recognition processing output from the voice recognition engine unit 32, that are input as display information, and a driver circuit or the like that controls the display of the display device. That is, the external output device 4 outputs the response acquired by the response acquisition means.

When the use case determination unit 33 determines private mode and a control signal for changing the output is input, the voice output unit 41 has the amplifier or the like change the amplification factor so that the sound output from the speaker becomes quieter. That is, when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level, the sound output from the voice output means is reduced. In the display unit 42, the driver circuit performs control to lower the luminance of the display device. That is, when the comparison by the voice level comparison means shows that the input voice level is smaller than the predetermined voice level, the display of the display means is changed so that the image becomes difficult to recognize from the surroundings.

As is clear from the above description, the microphone 2, the level check unit 31, the use case determination unit 33, and the external output device 4 constitute the input/output device 10 according to the first embodiment of the present invention.

Next, the operation of the input/output device 10 configured as described above is described with reference to the flowchart of FIG. 2. The flowchart shown in FIG. 2 is executed by the control device 3.

First, in step S11, the audio signal of the input voice is input from the microphone 2 to the level check unit 31, and the process proceeds to step S12.

Next, in step S12, the level check unit 31 detects the input voice level of the audio signal input from the microphone 2 and outputs it to the use case determination unit 33, and the process proceeds to step S13.
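The description does not say how the level check unit 31 measures the input voice level. As a minimal sketch only, assuming the level is taken as the RMS of one frame of normalized samples and expressed in dB relative to full scale, the detection in step S12 could look like the following. The function name `rms_level_db` and the `floor_db` parameter are hypothetical and do not appear in the source.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[float], floor_db: float = -120.0) -> float:
    """Return the RMS level of one audio frame in dB relative to full scale.

    Assumption (not from the source): samples are normalized to [-1.0, 1.0],
    so a full-scale square wave measures 0 dB. Silent or empty frames are
    clamped to floor_db instead of returning negative infinity.
    """
    if not samples:
        return floor_db
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


if __name__ == "__main__":
    quiet_frame = [0.01, -0.01, 0.01, -0.01]
    loud_frame = [0.5, -0.5, 0.5, -0.5]
    print(rms_level_db(quiet_frame), rms_level_db(loud_frame))
```

Any other level measure (peak value, averaged envelope, and so on) would serve equally well for the comparison that follows, as long as the same measure is used for the predetermined threshold.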

Next, in step S13, the use case determination unit 33 compares the input voice level detected by the level check unit 31 with a predetermined audio signal level. If the input voice level is lower than the predetermined audio signal level (YES), the process proceeds to step S14; if it is equal to or higher than the predetermined audio signal level (NO), the process proceeds to step S15. In other words, steps S12 and S13 together function as the voice level comparison step.

Next, in step S14, because step S13 determined that the input voice level is lower than the predetermined audio signal level, the use case determination unit 33 selects private mode and changes the output of the external output device 4 so that it is difficult to recognize from the surroundings (output control). Specifically, as described above, the audio output unit 41 has the amplifier or the like change its amplification factor so that the sound output from the speaker is quieter than the default volume, and the display unit 42 has the driver circuit lower the luminance of the display device below the default luminance. In other words, this step functions as the control step. Here, the default volume and luminance are the volume and luminance of the speech recognition device 1 in its initial state.

In step S15, on the other hand, because step S13 determined that the input voice level is equal to or higher than the predetermined level, the use case determination unit 33 selects normal mode and applies the default volume and luminance. That is, if the volume and luminance were already at their defaults before this step, they are left unchanged; if they had been lowered below the defaults, they are restored to the defaults.
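The branch in steps S13 to S15 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the type `OutputSettings`, the function `select_output`, the threshold of -30 dB, and the concrete volume and luminance values are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSettings:
    volume: float      # speaker volume, 0.0 (muted) to 1.0 (maximum)
    luminance: float   # display luminance, 0.0 (dark) to 1.0 (maximum)


# Hypothetical values: the source only says that private mode is lower
# than the default (initial-state) volume and luminance.
DEFAULT_SETTINGS = OutputSettings(volume=0.8, luminance=1.0)
PRIVATE_SETTINGS = OutputSettings(volume=0.2, luminance=0.3)


def select_output(input_level_db: float, threshold_db: float = -30.0) -> OutputSettings:
    """Mirror steps S13 to S15 of FIG. 2.

    S13: compare the detected input voice level with the predetermined level.
    S14: lower than the threshold, so use private mode (quieter and dimmer).
    S15: equal to or higher, so use (or restore) the default settings.
    """
    if input_level_db < threshold_db:
        return PRIVATE_SETTINGS
    return DEFAULT_SETTINGS


if __name__ == "__main__":
    print(select_output(-45.0))  # whispered command
    print(select_output(-12.0))  # command spoken at normal volume
```

Because the function always returns a complete settings object, the "leave unchanged" and "restore to default" cases of step S15 collapse into the same return value.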

According to this embodiment, in the speech recognition device 1, the level check unit 31 detects the input voice level of the signal output from the microphone 2, and the use case determination unit 33 determines whether the detected input voice level is lower than the predetermined audio signal level. If it is, the sound output from the speaker is reduced and the luminance of the display device is lowered. Thus, when the input voice level is low, the device judges that the user does not want the speech recognition response to be heard or seen by people nearby, and it can reduce the volume or lower the luminance accordingly. The response to an input can therefore be changed according to the surrounding conditions before it is output.

Next, the speech recognition device 1 according to a second embodiment of the present invention is described with reference to FIGS. 3 and 4. Parts identical to those of the first embodiment are given the same reference signs, and their description is omitted.

In the input/output device 10 according to this embodiment, a microphone 5 is added to the speech recognition device 1 shown in FIG. 1. The microphone 5, serving as the second sound collecting means, collects not the voice uttered by the user but the sound around the speech recognition device 1 (ambient sound). In other words, it collects ambient sound other than the spoken input voice.

The level check unit 31 detects the level of the ambient sound collected by the microphone 5 and outputs the level of that audio signal (the ambient sound level) to the use case determination unit 33. In other words, the level check unit 31 functions as ambient sound level detection means for detecting the ambient sound level, which is the sound level of the ambient sound collected by the second sound collecting means.

The use case determination unit 33 calculates the ratio (S/N ratio) between the input voice level of the sound collected by the microphone 2 and the ambient sound level, both detected by the level check unit 31. In this embodiment, the S/N ratio is the input voice level divided by the ambient sound level (input voice level / ambient sound level). If the calculated S/N ratio is smaller than a predetermined value, the use case determination unit 33 determines that private mode applies and outputs a control signal instructing the external output device 4 to change to the output corresponding to private mode. In other words, the use case determination unit 33 functions as ratio calculation means.

That is, a small S/N ratio means that the ambient sound is loud relative to the user's utterance, so it can be inferred that the user is speaking in a low voice with many people nearby. Therefore, when the S/N ratio is small, the device judges that the user does not want the response of the speech recognition engine unit 32 to be heard or seen by people nearby, and it operates in private mode. The operation of the external output device 4 in private mode is the same as in the first embodiment: the sound output from the speaker is reduced, and the luminance is lowered so that the image displayed on the display device is difficult to recognize from the surroundings.

Next, the operation of the speech recognition device 1 in this embodiment is described with reference to the flowchart of FIG. 4. The flowchart shown in FIG. 4 is executed by the control device 3.

First, in step S21, audio signals are input from the microphone 2 and the microphone 5 to the level check unit 31, and the process proceeds to step S22.

Next, in step S22, the level check unit 31 detects the input voice level of the audio signal input from the microphone 2 and the ambient sound level of the audio signal input from the microphone 5, outputs both to the use case determination unit 33, and the process proceeds to step S23.

Next, in step S23, the use case determination unit 33 calculates the ratio (S/N ratio) between the input voice level and the ambient sound level detected by the level check unit 31. If the S/N ratio is smaller than the predetermined value (YES), the process proceeds to step S24; if it is equal to or greater than the predetermined value (NO), the process proceeds to step S25. In other words, steps S22 and S23 together function as the ratio calculation step.

Steps S24 and S25 are the same as steps S14 and S15 in FIG. 2, respectively.
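The decision in step S23 can be sketched as below. The source defines the S/N ratio as input voice level divided by ambient sound level but gives no units or threshold, so this sketch assumes linear (non-dB) levels and a hypothetical threshold of 2.0. The function name `is_private_mode` and the handling of a zero ambient level are also assumptions.

```python
def is_private_mode(input_level: float,
                    ambient_level: float,
                    ratio_threshold: float = 2.0) -> bool:
    """Return True when step S23 would branch to private mode (S24).

    The S/N ratio is input_level / ambient_level, as defined for the second
    embodiment. Both levels are assumed to be non-negative linear amplitudes.
    """
    if ambient_level <= 0.0:
        # Assumption: with no measurable ambient sound the ratio is treated
        # as arbitrarily large, which corresponds to normal mode (S25).
        return False
    sn_ratio = input_level / ambient_level
    return sn_ratio < ratio_threshold


if __name__ == "__main__":
    # Quiet speech in a noisy place: small ratio, private mode.
    print(is_private_mode(input_level=0.05, ambient_level=0.10))
    # Normal speech in a quiet room: large ratio, normal mode.
    print(is_private_mode(input_level=0.50, ambient_level=0.02))
```

Compared with the fixed threshold of the first embodiment, the ratio makes the decision depend on how loud the user is relative to the surroundings rather than on the absolute speaking volume.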

According to this embodiment, in the speech recognition device 1, the level check unit 31 detects the input voice level and the level of the ambient sound output from the microphone 5 (the ambient sound level), and the use case determination unit 33 determines whether the ratio between the input voice level and the ambient sound level (S/N ratio) is smaller than a predetermined value. If it is, the device, for example, reduces the sound output from the speaker and lowers the luminance of the display device. Thus, when the S/N ratio is small, the device judges that the user does not want the speech recognition response to be heard or seen by people nearby, and it can reduce the volume or lower the luminance accordingly. The response to an input can therefore be changed according to the surrounding conditions.

In the two embodiments described above, the image is made difficult to recognize from the surroundings by lowering the luminance of the display device of the display unit 42. The invention is not limited to this; for example, the viewing angle of the display device may be narrowed instead. In that case, a filter or the like that changes the polarization direction, for example by applying a voltage to a liquid crystal element to change the alignment state of the liquid crystal, may be provided on the surface of the display device.

Also, in the two embodiments described above, the control of both the audio output unit 41 and the display unit 42 is changed, but only one of them may be changed.

Also, when both a speaker (audio output unit 41) and a display device (display unit 42) are provided, as in the two embodiments described above, a private mode determination may cause the display of the display device to be stopped (the screen turned off) and the sound output from the speaker to be reduced. Conversely, the sound output from the speaker may be stopped and the luminance of the display device lowered or its viewing angle narrowed. In other words, when both audio output means and display means are provided, stopping the operation of one of them is also included in changing the output so that the response is difficult to recognize from the surroundings.

The speech recognition engine unit 32 need not be included in the control device 3 as shown in FIGS. 1 and 3; for example, it may be provided in an external server or the like that communicates wirelessly or by wire over a network. FIG. 5 shows an example. In FIG. 5, the control device 3 is provided with a communication unit 34. The communication unit 34 outputs the audio signal input from the level check unit 31 to a speech recognition engine unit 21 provided in a server 20 connected to the Internet 30. The communication unit 34 then outputs the response received from the speech recognition engine unit 21 to the external output device 4, other processing devices, and the like. In the configuration shown in FIG. 5, the communication unit 34 functions as the first output means and the response acquisition means.

As shown in FIG. 6, the device may also have an output interface 43, such as a terminal for connecting external audio output means 6 such as earphones or headphones, or a circuit and antenna for wireless communication with the external audio output means 6 via Bluetooth (registered trademark) or the like.

A changeover switch 44 switches between the output interface 43 shown in FIG. 6 and the audio output unit 41. That is, when earphones or headphones are connected, the changeover switch 44 is switched to the output interface 43 side so that no sound is output from the speaker of the audio output unit 41.

When the output interface 43 shown in FIG. 6 is provided, a private mode determination may cause the display of the display device to be stopped and the sound (audio signal) of the response from the speech recognition engine unit 32 to be output only from the output interface 43. In this way, when the user does not want the speech recognition response to be seen by people nearby, only sound is output, from external audio output means such as earphones or headphones.
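The routing described for FIG. 6 can be summarized as a small decision function. This is only a sketch under stated assumptions: the names `Routing` and `route_response` are hypothetical, and the case without connected earphones simply keeps both the display and the speaker active, leaving the volume and luminance reduction of the earlier embodiments to separate level control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Routing:
    display_on: bool          # display unit 42 shows the response
    speaker_on: bool          # speaker of audio output unit 41 is used
    external_audio_on: bool   # output interface 43 (earphones etc.) is used


def route_response(private_mode: bool, external_connected: bool) -> Routing:
    """Choose where the speech recognition response is output.

    With earphones or headphones connected, the changeover switch 44 selects
    the output interface 43, so the speaker stays silent. In private mode the
    display is additionally stopped, leaving sound from the interface only.
    """
    if external_connected:
        return Routing(display_on=not private_mode,
                       speaker_on=False,
                       external_audio_on=True)
    # No external audio device: both built-in outputs remain in use
    # (in private mode they would be attenuated rather than switched off).
    return Routing(display_on=True, speaker_on=True, external_audio_on=False)


if __name__ == "__main__":
    print(route_response(private_mode=True, external_connected=True))
    print(route_response(private_mode=False, external_connected=True))
```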

If the level check unit 31 and the use case determination unit 33 are implemented by a computer such as a microcomputer, and the flowcharts shown in FIGS. 2 and 4 are implemented as a computer program, the invention can be configured as an input/output program.

The present invention is not limited to the above embodiments. That is, those skilled in the art can make various modifications in accordance with conventionally known knowledge without departing from the gist of the present invention. Such modifications are, of course, included in the scope of the present invention as long as they still have the configuration of the input/output device of the present invention.

2 Microphone (first sound collecting means)
31 Level check unit (first output means, voice level comparison means, ambient sound level detection means)
32 Speech recognition engine unit (response acquisition means)
33 Use case determination unit (voice level comparison means, control means, ratio calculation means)
4 External output device (second output means)
41 Audio output unit (second output means, audio output means)
42 Display unit (second output means, display means)
5 Microphone (second sound collecting means)
6 External audio output means
10 Input/output device
S12 Level check (voice level comparison step)
S13 Lower than predetermined audio signal level (voice level comparison step)
S14 Private mode (control step)
S22 Level check (ratio calculation step)
S23 Smaller than predetermined value (ratio calculation step)
S24 Private mode (control step)

Claims

An input/output device comprising:
first sound collecting means for collecting spoken input voice;
first output means for outputting the input voice collected by the first sound collecting means to speech recognition means;
response acquisition means for acquiring a response from the speech recognition means;
second output means for outputting the response acquired by the response acquisition means;
voice level comparison means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and comparing the input voice level with a predetermined voice level; and
control means for changing the output of the second output means so that the response is difficult to recognize from the surroundings when the input voice level compared by the voice level comparison means is lower than the predetermined voice level.

The input/output device according to claim 1, wherein the second output means has audio output means for outputting the response as sound, and
the control means reduces the sound output from the audio output means when the result of the comparison by the voice level comparison means is lower than the predetermined voice level.

The input/output device according to claim 2, wherein the second output means further includes display means for displaying the response as an image, and
the control means stops the display of the display means and reduces the sound output from the audio output means when the result of the comparison by the voice level comparison means is lower than the predetermined voice level.

The input/output device according to claim 3, wherein the second output means further includes an output interface for outputting the response as sound from external audio output means, and
the control means causes only the output interface to output the response when the result of the comparison by the voice level comparison means is lower than the predetermined voice level.

The input/output device according to claim 1, wherein the second output means has display means for displaying the response as an image, and
the control means changes the display of the display means so that the image is difficult to recognize from the surroundings when the result of the comparison by the voice level comparison means is lower than the predetermined voice level.

The input/output device according to claim 5, wherein the second output means further includes audio output means for outputting the response as sound, and
the control means, when the result of the comparison by the voice level comparison means is lower than the predetermined voice level, stops the output of the audio output means and changes the display of the display means so that the image is difficult to recognize from the surroundings.

An input/output device comprising:
first sound collecting means for collecting spoken input voice;
second sound collecting means for collecting ambient sound other than the input voice;
first output means for outputting the input voice collected by the first sound collecting means to speech recognition means;
ambient sound level detection means for detecting an ambient sound level, which is the sound level of the ambient sound collected by the second sound collecting means;
response acquisition means for acquiring a response from the speech recognition means;
second output means for outputting the response acquired by the response acquisition means;
ratio calculation means for detecting an input voice level, which is the voice level of the input voice collected by the first sound collecting means, and calculating a ratio between the input voice level and the ambient sound level detected by the ambient sound level detection means; and
control means for changing the output of the second output means so that the response is difficult to recognize from the surroundings when the ratio calculated by the ratio calculation means is smaller than a predetermined value.

An input/output method in an input/output device that outputs a response from speech recognition means to spoken input voice, the method comprising:
a voice level comparison step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means for collecting the input voice, and comparing the input voice level with a predetermined voice level; and
a control step of changing the output of the response so that the response of the speech recognition means is difficult to recognize from the surroundings when the input voice level compared in the voice level comparison step is lower than the predetermined voice level.

An input/output program that causes a computer to execute the input/output method according to claim 8.

A computer-readable recording medium storing the input/output program according to claim 9.

An input/output method in an input/output device that outputs a response from speech recognition means to spoken input voice, the method comprising:
a ratio calculation step of detecting an input voice level, which is the voice level of the voice collected by first sound collecting means for collecting the input voice, detecting an ambient sound level, which is the sound level of ambient sound collected by second sound collecting means for collecting ambient sound other than the input voice, and calculating a ratio between the input voice level and the ambient sound level; and
a control step of changing the output of the response so that the response of the speech recognition means is difficult to recognize from the surroundings when the ratio calculated in the ratio calculation step is smaller than a predetermined value.

An input/output program that causes a computer to execute the input/output method according to claim 11.

A computer-readable recording medium storing the input/output program according to claim 12.