JP2003044075A

JP2003044075A - Electronic equipment with speech recognizing function

Info

Publication number: JP2003044075A
Application number: JP2001230450A
Authority: JP
Inventors: Sunako Asayama; 砂子朝山; Yoshihiro Kojima; 良宏小島; Katsumi Fujisaki; 克巳藤▲さき▼
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-07-30
Filing date: 2001-07-30
Publication date: 2003-02-14

Abstract

PROBLEM TO BE SOLVED: When an electronic device that outputs audio is operated by voice, conventional devices lack responsiveness because they carry out time-consuming recognition processing even when the input voice signal is obviously unsuitable for recognition. SOLUTION: The electronic device is provided with a voice input level detection unit 104, a voice section detection unit 106, and a volume control unit 111. When the voice input level detection unit 104 judges that the input level is too high, or the voice section detection unit 106 judges that the voice section is too long, the volume control unit 111 is notified so that the volume of the audio signal output from the electronic device is reduced before a word recognition unit 107 performs recognition processing.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an electronic device having a voice recognition function, and in particular to an electronic device that outputs audio from its main body.

[0002]

[Description of the Related Art] In recent years, voice interfaces that allow electronic devices to be operated by voice have been realized.

[0003] This is because, as electronic devices have become more complex and multifunctional and the switches and buttons used to operate them have diversified, operating procedures have become complicated and several button presses are often needed to complete a desired operation. Voice recognition technology has therefore been used to improve the user interface, for example by allowing such operations to be carried out with a single spoken voice command.

[0004] However, when an electronic device with a voice interface itself outputs an audio signal, as a television or audio equipment does, that output audio is picked up together with the voice command uttered by the speaker and acts as noise during recognition processing. This causes recognition failures and degrades recognition performance.

[0005] As a countermeasure for such cases, that is, cases where recognition performance is degraded by audio signals other than the speaker's utterance (referred to here as the surrounding environment of the utterance), devices that perform processing to improve this surrounding environment have been disclosed.

[0006] For example, Japanese Laid-Open Patent Publication No. H11-126092 discloses a voice recognition device that improves the surrounding environment when recognition of the speaker's voice fails because of that environment.

[0007] FIG. 3 is a block diagram of the conventional voice recognition device.

[0008] In FIG. 3, reference numeral 201 denotes a voice input unit, to which the voice uttered by the speaker is input. Reference numeral 202 denotes a voice recognition unit, which performs recognition processing on the voice data input from the voice input unit 201 and outputs a recognition result. Reference numeral 203 denotes a device control unit, which controls the audio device 204 and the air conditioner 205 based on the recognition result obtained by the voice recognition unit 202.

[0009] The following describes how the conventional voice recognition device configured as above improves the surrounding environment when it fails to recognize the speaker's voice because of that environment.

[0010] When voice from the speaker is input, the voice input unit 201 converts the input voice data into a digital signal and outputs it to the voice recognition unit 202. The voice recognition unit 202 performs recognition processing that compares the voice data input from the voice input unit 201 with standard voice data. If the recognition processing fails, the voice recognition unit 202 sends the device control unit 203 a request to improve the surrounding environment of the voice input unit 201. On receiving this request, the device control unit 203 outputs control signals to the devices under its control, namely the audio device 204 and the air conditioner 205, and operates them. For example, if the device causing the problem is the audio device 204, the volume of the audio device 204 is lowered. This improves the surrounding environment of the voice input unit 201, so that recognition performance improves for the speaker's subsequent utterances.

[0011]

[Problems to Be Solved by the Invention] However, with the above configuration, even when the voice level or voice section of the voice data uttered by the speaker is clearly unsuitable for recognition processing, that is, even when the voice recognition unit is unlikely to perform recognition correctly, the voice data is still input to the voice recognition unit and time-consuming recognition processing is performed. The surrounding environment of the utterance is improved only after recognition has failed, so the speaker cannot immediately repeat the utterance, and the operability of the device suffers.

[0012] For this reason, when the input voice data is clearly unsuitable for recognition processing, the audio output from the main body of the electronic device must be controlled promptly, without performing time-consuming recognition processing.

[0013] The present invention has been made to solve the above problem. Its object is to improve the operability of an electronic device that outputs audio and is operated by voice: when the input voice data is clearly unsuitable for recognition processing, the volume of the audio signal output from the main body of the device is controlled promptly, so that the surrounding environment of the speaker's utterance is improved quickly without performing time-consuming recognition processing.

[0014]

[Means for Solving the Problems] The first invention (corresponding to claim 1) is an electronic device comprising voice recognition means, voice output means for outputting audio, and volume control means for controlling the volume of the voice output means, the electronic device further comprising: voice detection means for detecting the start of a speaker's voice command; voice collection means for collecting voice; voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on that level; and recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0015] The second invention (corresponding to claim 2) is the electronic device of the first invention, wherein the voice input level detection means holds a level parameter predetermined so as to correspond to the level of a voice command in the input voice and, if the voice level is higher than the level parameter, outputs to the volume control means a volume control signal that lowers the volume of the audio.

[0016] The third invention (corresponding to claim 3) is an electronic device comprising voice recognition means, voice output means for outputting audio, and volume control means for controlling the volume of the voice output means, the electronic device further comprising: voice detection means for detecting the start of a speaker's voice command; voice collection means for collecting voice; voice section detection means for detecting the voice section of the collected voice and controlling the volume control means based on the length of that voice section; and recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0017] The fourth invention (corresponding to claim 4) is the electronic device of the third invention, wherein the voice section detection means holds a section detection parameter predetermined so as to correspond to the voice section of a voice command in the input voice and, if the voice section is longer than the section detection parameter, outputs to the volume control means a volume control signal that lowers the volume of the audio.

[0018] The fifth invention (corresponding to claim 5) is the electronic device of the first or third invention, wherein the voice detection means is a sensor that detects that a person has entered a predetermined area defined relative to the voice collection means, or a switch or sensor with which the speaker can signal, by a method other than voice, that a voice command is about to begin.

[0019] The sixth invention (corresponding to claim 6) is the electronic device of the first or third invention, further comprising volume setting value detection means for detecting the set value of the volume output by the device, and parameter changing means for indicating that the level parameter or the section detection parameter is to be changed based on the set value detected by the volume setting value detection means.

[0020] The seventh invention (corresponding to claim 7) is a program for causing a computer to function as all or part of the following means of the electronic device of the first invention: the voice detection means for detecting the start of a speaker's voice command; the voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on that level; and the recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0021] The eighth invention (corresponding to claim 8) is a program for causing a computer to function as all or part of the following means of the electronic device of the third invention: the voice detection means for detecting the start of a speaker's voice command; the voice section detection means for detecting the voice section of the collected voice and controlling the volume control means based on the length of that voice section; and the recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0022] The ninth invention (corresponding to claim 9) is a computer-processable medium carrying a program for causing a computer to function as all or part of the following means of the electronic device of the first invention: the voice detection means for detecting the start of a speaker's voice command; the voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on that level; and the recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0023] The tenth invention (corresponding to claim 10) is a computer-processable medium carrying a program for causing a computer to function as all or part of the following means of the electronic device of the third invention: the voice detection means for detecting the start of a speaker's voice command; the voice section detection means for detecting the voice section of the collected voice and controlling the volume control means based on the length of that voice section; and the recognition result determination means for outputting, based on the recognition result obtained from the voice recognition means, at least a voice output control signal that causes the speaker's voice command to be executed.

[0024]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings.

[0025] The configuration of the electronic device of this embodiment will be described with reference to FIG. 1.

[0026] As shown in FIG. 1, the electronic device comprises a voice input unit 101, a voice recognition unit 105, a recognition result determination unit 108, a voice output control unit 109, a voice output unit 113, a video output control unit 114, and a video output unit 115.

[0027] The voice input unit 101 comprises a voice detection unit 102, a microphone 103, and a voice input level detection unit 104.

[0028] The voice detection unit 102 is means for detecting, by a predetermined method, the start of a voice command that the speaker utters to operate the electronic device.

[0029] The voice detection unit 102 is, for example, a human presence sensor: when a person enters the detection area of the sensor, the sensor is triggered by the heat (infrared radiation) the person emits, and this detects that the person has approached the voice input unit 101. Alternatively, it may be a touch sensor or the like, which is triggered and detects the speaker when the speaker, before starting to speak, touches part of the voice input unit or part of the electronic device other than the voice input unit 101. The voice detection unit 102 may also be a talk switch, which detects the start of the utterance when the speaker presses the switch before speaking. Talk switches include the press-talk switch, which is held down while speaking, and the push-talk switch, which is pressed before speaking begins; either type may be used.

[0030] The microphone 103 is a voice collection device, to which the voice uttered by the speaker is input.

[0031] The voice input level detection unit 104 detects the voice level of the voice input to the microphone 103. If the voice level of the input voice is higher than a predetermined level parameter (denoted α, where α is a real number), it outputs a volume control signal to the volume control unit 111. If the voice level is lower than α, it outputs the input voice as voice data to the voice section detection unit 106. In other words, when the level is high, the unit judges that a human voice or similar sound output from the electronic device is overlapping the speaker's voice, and it issues a volume control signal that lowers the voice output of the electronic device.
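
The level check in paragraph [0031] can be illustrated with a minimal Python sketch. The text does not say how the voice level is measured, so the RMS level in dBFS used here is an assumption, as are the function names `rms_level_db` and `check_input_level` and the string return values. The text also leaves the case of a level exactly equal to α unspecified; the sketch passes such input on.

```python
import math


def rms_level_db(samples):
    """Return the RMS level in dBFS of PCM samples given as floats in [-1, 1].

    Assumption: the patent only speaks of a "voice level"; RMS in dBFS is
    one common way to measure it.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def check_input_level(samples, alpha_db):
    """Decide what unit 104 does with the input, given level parameter alpha.

    Returns "reduce_volume" when the level exceeds alpha (a volume control
    signal would go to the volume control unit 111), otherwise "pass_on"
    (the voice data would go to the voice section detection unit 106).
    """
    if rms_level_db(samples) > alpha_db:
        return "reduce_volume"
    return "pass_on"
```

With α set to -10 dBFS, for instance, a constant signal of amplitude 0.5 (about -6 dBFS) would trigger the volume reduction, while one of amplitude 0.1 (-20 dBFS) would be passed on.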

[0032] In addition, when a parameter change notification signal is input from the parameter change notification unit 112, the voice input level detection unit 104 changes the value of the level parameter α based on the content of that signal.

[0033] The voice recognition unit 105 comprises a voice section detection unit 106 and a word recognition unit 107.

[0034] The voice section detection unit 106 detects the voice section using the voice level of the voice data output from the voice input level detection unit 104 and the result of frequency analysis of that voice data. If the voice section of the voice data is longer than a predetermined section detection parameter (denoted t, where t is a real number), it outputs a volume control signal to the volume control unit 111. If the voice section is shorter than the section detection parameter t, it outputs the voice data to the word recognition unit 107. The reasoning is as follows: as a rule of thumb, the length of a spoken command does not exceed a certain value, so a voice section longer than that value suggests that audio output from the electronic device is entering the voice input means together with the sound uttered by the speaker. The voice output of the electronic device is therefore lowered.
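
As a rough illustration of the length check in paragraph [0034], the following Python sketch finds a voice section from per-frame levels and compares its duration with t. It is a simplification under stated assumptions: the patent uses both the voice level and a frequency analysis, whereas this sketch uses only a level threshold; the 20 ms frame length, the names `detect_voice_section` and `check_section_length`, the `speech_threshold_db` parameter, and the "no_speech" result are all hypothetical.

```python
def detect_voice_section(frame_levels_db, speech_threshold_db):
    """Return (first_frame, end_frame) of the voice section, or None.

    Assumption: a frame counts as speech when its level exceeds a threshold.
    The section runs from the first such frame to one past the last.
    """
    active = [i for i, level in enumerate(frame_levels_db)
              if level > speech_threshold_db]
    if not active:
        return None
    return active[0], active[-1] + 1


def check_section_length(section, t_sec, frame_sec=0.02):
    """Decide what unit 106 does, given section detection parameter t.

    Returns "reduce_volume" when the section is longer than t (a volume
    control signal would go to the volume control unit 111), "pass_on" when
    it is not (the voice data would go to the word recognition unit 107),
    and "no_speech" (hypothetical, not in the patent) when nothing was found.
    """
    if section is None:
        return "no_speech"
    first, end = section
    duration_sec = (end - first) * frame_sec
    return "reduce_volume" if duration_sec > t_sec else "pass_on"
```

For example, 50 active frames of 20 ms give a 1-second section, which passes a t of 2 seconds, whereas 150 active frames (3 seconds) would trigger the volume reduction.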

[0035] In addition, when a parameter change notification signal is input from the parameter change notification unit 112, the voice section detection unit 106 changes the value of the section detection parameter t based on the content of that signal.

[0036] The word recognition unit 107 holds a recognition dictionary (not shown) in which the available voice commands are stored as standard voice data, and uses the dictionary to perform recognition processing by comparing the voice data with the standard voice data. The recognition processing can follow, for example, the method described by C. Schmandt in "Voice Communication with Computers: Toward Future Computing" (Saiensu-sha). First, the word recognition unit 107 applies digital signal processing to the voice data input from the voice section detection unit 106 and extracts frames of LPC coefficients, for example one every 20 milliseconds. Next, it uses a matching algorithm to compare the extracted LPC coefficient frames with the LPC coefficient data of each voice command held in the recognition dictionary as standard voice data. The matching algorithm, for example, calculates for each LPC coefficient vector the distance to the corresponding vector of the standard voice data and takes the sum of these distances as the distance between the input voice data and the standard voice data. The distance is then normalized so that the smallest possible distance between the input voice data and the standard voice data gives the highest score (for example, 100), and this score is taken as the reliability of the recognized word. The standard voice data with the highest reliability are found, and each word is paired with its reliability; a predetermined number of such pairs is output as the recognition result.

[0037] The recognition result determination unit 108 determines whether recognition succeeded or failed based on the recognized word output from the word recognition unit 107 and the reliability of that word. If the reliability of the word is higher than a predetermined reject parameter (denoted φ, where φ is a real number), it determines that recognition succeeded, analyzes the content of the recognized word, and outputs either a voice output control signal to the voice output control unit 109 or a video output control signal to the video output control unit 114. If the reliability of the recognized word is lower than the reject parameter φ, it determines that recognition failed and rejects the recognized word. It then presumes that the failure occurred because audio was being output from the electronic device at the same time, and outputs a volume control signal to the volume control unit 111 so that this audio is lowered.
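
The accept/reject decision of paragraph [0037] might look like the Python sketch below. The patent does not describe how the word content is analyzed to choose between the voice output control unit 109 and the video output control unit 114, so the `video_commands` set is a hypothetical stand-in for that analysis; the function name and the returned tuples are likewise illustrative. A reliability exactly equal to φ is not covered by the text and is treated here as a failure.

```python
def judge_recognition(word, reliability, phi, video_commands=frozenset()):
    """Return (destination, payload) for one recognized word.

    Success (reliability > phi): route the word to the voice output control
    unit 109 or the video output control unit 114.
    Failure: reject the word and ask the volume control unit 111 to lower
    the device's own audio output.
    """
    if reliability > phi:
        if word in video_commands:
            return ("video_output_control_114", word)
        return ("voice_output_control_109", word)
    # Recognition failed: the device's own audio is presumed to have interfered.
    return ("volume_control_111", "reduce_volume")
```

With φ = 60, for instance, a word with reliability 85 is accepted and routed, while one with reliability 40 is rejected and results in a volume reduction request.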

[0038] The voice output control unit 109 comprises a channel switching unit 110, a volume control unit 111, and a parameter change notification unit 112.

[0039] The channel switching unit 110 switches, for example in the case of a television, the channel of the television audio. When a voice output control signal is input as a result of successful recognition, it switches the channel of the audio source output to the voice output unit 113 based on the content of that signal.

[0040] When the volume control signals described above are input from units 104, 106, 108, and so on, the volume control unit 111 controls the volume of the audio signal output from the voice output unit 113 in accordance with the content of those signals.

[0041] When the set value of the volume of the audio signal output from the voice output unit 113 is changed, the parameter change notification unit 112 outputs to the voice input level detection unit 104 a parameter change notification signal indicating that the value of the level parameter α is to be changed based on the volume setting. Alternatively, it outputs to the voice section detection unit 106 a parameter change notification signal indicating that the value of the section detection parameter t is to be changed based on the volume setting. The parameter change notification unit 112 changes both or one of the level parameter α and the section detection parameter t, based on the volume setting held in the volume control unit 111, before volume control is performed, that is, before a volume control signal is output and the volume control unit 111 controls the volume of the audio output from the voice output unit 113. Examples of such times are when the electronic device is powered on and when the channel switching unit 110 changes the channel of the audio source output to the voice output unit 113. If the level parameter α and the section detection parameter t are already set to appropriate values, they need not be changed.

[0042] In other words, the parameters are not changed in the middle of recognition. They are changed at times such as when the electronic device is powered on, or when recognition has succeeded and the resulting operation has substantially changed the volume output by the device, because continuing to use the previous parameters in such cases would prevent accurate judgment.

[0043] The voice output unit 113 is an audio output device such as a speaker, and outputs the audio signal obtained from the voice output control unit 109.

[0044] The video output control unit 114 switches, for example in the case of a television, the channel of the television video. When a video output control signal is input, it switches the channel of the video source output to the video output unit 115 based on the content of that signal.

[0045] The video output unit 115 is a video display device such as a display, and outputs the video signal obtained from the video output control unit 114.

[0046] The operation of the electronic device configured as above will now be described with reference to the flowchart of FIG. 2.

[0047] In the voice input waiting state S300, the electronic device is waiting for voice input from the speaker. At this point, the level parameter α and the section detection parameter t are assumed to have been set to appropriate values in accordance with the parameter change notification signal that the parameter change notification unit 112 outputs based on the set value of the volume of the audio signal output from the voice output unit 113.

[0048] In step S301, the voice detection unit 102 detects the start of the speaker's utterance. If it determines that the speaker has started uttering a voice command to operate the electronic device, the speaker's voice is input to the microphone 103 and the process proceeds to step S302.

[0049] As one way for the voice detection unit 102 to detect the start of the speaker's utterance, a push-talk switch may be used, for example: when the speaker presses the push-talk switch, the start of the utterance is detected and the voice command uttered by the speaker is input to the microphone. If the start of the utterance cannot be detected, that is, if a push-talk switch is used and the switch is not pressed, the voice input waiting state S300 continues.

[0050] In step S302, the microphone 103 outputs the input voice as voice data to the voice input level detection unit 104, which compares the voice level of that data with α. If the voice level is lower than α, the voice input level detection unit 104 outputs the voice data to the voice section detection unit 106 and the process proceeds to step S303. If the voice level is higher than α, it outputs to the volume control unit 111 a volume control signal instructing that the volume of the audio signal output from the voice output unit 113 be reduced, and the process proceeds to step S311.
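The level check of step S302 can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the publication: the use of an RMS value as the "voice level", the names `rms_level`, `check_input_level`, and `Next`, and the choice to let a level exactly equal to α pass are all assumptions, since the text only specifies the "lower than α" and "higher than α" cases.

```python
from enum import Enum


class Next(Enum):
    """Step reached after the level check (step names taken from the text)."""
    SECTION_DETECTION = "S303"
    LOWER_VOLUME = "S311"


def rms_level(samples):
    """Root-mean-square level of PCM samples (an assumed level measure)."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def check_input_level(samples, alpha):
    """Step S302: compare the input level with the level parameter alpha.

    A level above alpha is treated as unsuitable for recognition, so the
    flow goes straight to volume reduction (S311) without recognition.
    """
    if rms_level(samples) > alpha:
        return Next.LOWER_VOLUME
    return Next.SECTION_DETECTION
```

With `alpha = 0.5`, the quiet input `[0.1, -0.1]` continues to section detection, while `[1.0, -1.0]` is routed to volume reduction.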

[0051] In step S303, the voice section detection unit 106 detects, from the voice data output by the voice input level detection unit 104, the voice section in which the speaker is considered to have spoken. If the voice section is shorter than t, the voice section detection unit 106 outputs the voice data to the word recognition unit 107 and the process proceeds to step S304. If the voice section is longer than t, it outputs to the volume control unit 111 a volume control signal instructing that the volume of the audio signal output from the voice output unit 113 be reduced, and the process proceeds to step S311.
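A rough Python sketch of the section-length check in step S303 follows. The publication does not say how the voice section is detected, so the simple energy-based rule here (the span from the first to the last frame whose level exceeds a threshold) is an assumption, as are the names `voice_section_seconds` and `check_voice_section`, the default `speech_threshold`, and the 10 ms frame length.

```python
def voice_section_seconds(frame_levels, speech_threshold, frame_seconds):
    """Length in seconds of the span from the first to the last active frame.

    A frame counts as active when its level exceeds speech_threshold
    (an assumed, simplistic voice activity rule).
    """
    active = [i for i, level in enumerate(frame_levels) if level > speech_threshold]
    if not active:
        return 0.0
    return (active[-1] - active[0] + 1) * frame_seconds


def check_voice_section(frame_levels, t, speech_threshold=0.1, frame_seconds=0.01):
    """Step S303: return "S304" if the section is short enough, else "S311"."""
    length = voice_section_seconds(frame_levels, speech_threshold, frame_seconds)
    return "S311" if length > t else "S304"
```

A section that runs far longer than a plausible command, for example three seconds against `t = 1.0`, is rejected before any recognition work is done.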

[0052] In step S304, the word recognition unit 107 performs voice recognition on the voice data output from the voice section detection unit 106 by using a recognition dictionary. For example, if the electronic device is a television, the recognition dictionary stores, as standard voice data, voice commands that express television operations, such as "XX channel", "stereo output", and "input switching". The word recognition unit 107 compares each item of standard voice data with the voice data output from the voice section detection unit 106, finds the standard voice data closest to the speaker's voice data, and outputs that word and its reliability to the recognition result determination unit 108.
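The "closest standard voice data" search of step S304 can be illustrated with a nearest-template lookup in Python. Everything concrete here is assumed for illustration: the publication does not specify the features, the distance measure, or how reliability is computed, so fixed-length feature vectors, Euclidean distance, and the mapping `1 / (1 + distance)` are placeholders, and `recognize` is a hypothetical name.

```python
import math


def recognize(features, dictionary):
    """Return (word, reliability) for the template closest to `features`.

    `dictionary` maps each command word to a reference feature vector
    (standing in for the "standard voice data"). Reliability is an assumed
    monotone mapping of distance into the range (0, 1].
    """
    best_word, best_dist = None, math.inf
    for word, template in dictionary.items():
        dist = math.dist(features, template)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    reliability = 1.0 / (1.0 + best_dist)
    return best_word, reliability


# Hypothetical two-dimensional templates for two of the example commands.
TEMPLATES = {
    "stereo output": [1.0, 0.0],
    "input switching": [0.0, 1.0],
}
```

Calling `recognize([0.9, 0.1], TEMPLATES)` selects "stereo output"; an exact match such as `[1.0, 0.0]` yields a reliability of 1.0.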

[0053] In step S305, the recognition result determination unit 108 compares the reliability of the recognized word output from the word recognition unit 107 with φ. If the reliability is greater than φ, it determines that recognition succeeded and the process proceeds to step S306. If the reliability is smaller than φ, it determines that recognition failed, rejects the recognized word, outputs to the volume control unit 111 a volume control signal instructing that the volume of the audio signal output from the voice output unit 113 be reduced, and the process proceeds to step S311.

[0054] In step S306, the recognition result determination unit 108 analyzes the command expressed by the recognized word. If the command controls the video output of the electronic device, it outputs a video output control signal to the video output control unit 114 and the process proceeds to step S307. If the command controls the audio output of the electronic device, it outputs a voice output control signal to the voice output control unit 109 and the process proceeds to step S309.
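The decisions in steps S305 and S306 amount to a threshold test followed by a dispatch on command type, which the following Python sketch illustrates. The command table, the function name `judge_and_route`, the handling of a reliability exactly equal to φ (treated as failure), and the rejection of words missing from the table are assumptions made for this example.

```python
VIDEO, AUDIO = "video", "audio"

# Hypothetical classification of commands; the publication does not state
# which example commands control video and which control audio.
COMMAND_KIND = {
    "channel 1": VIDEO,
    "input switching": VIDEO,
    "stereo output": AUDIO,
}


def judge_and_route(word, reliability, phi, command_kind=COMMAND_KIND):
    """Return the next step for a recognition result.

    S305: reliability must exceed phi, otherwise the word is rejected and
          the flow goes to volume reduction (S311).
    S306: video commands go to S307, audio commands go to S309.
    """
    if reliability <= phi or word not in command_kind:
        return "S311"
    return "S307" if command_kind[word] == VIDEO else "S309"
```

With `phi = 0.6`, a confident "channel 1" is routed to S307, a confident "stereo output" to S309, and any low-reliability result to S311.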

[0055] In step S307, when the video output control signal is input from the recognition result determination unit 108, the video output control unit 114 switches the channel of the video source output from the video output unit 115 according to the content of that signal.

[0056] In step S308, the video output unit 115 displays on the screen the video source switched by the video output control unit 114, and the process returns to the voice input waiting state S300.

[0057] In step S309, when the voice output control signal is input from the recognition result determination unit 108, the channel switching unit 110 switches the channel of the audio source output from the voice output unit 113 according to the content of that signal.

[0058] In step S310, the voice output unit 113 outputs the audio source switched by the channel switching unit 110, and the process returns to the voice input waiting state S300.

[0059] In step S311, when a volume control signal is input from the voice input level detection unit 104, the voice section detection unit 106, or the recognition result determination unit 108, the volume control unit 111 lowers the volume of the audio signal output from the voice output unit 113, and the process returns to the voice input waiting state S300.

[0060] As described above, according to the embodiment of the present invention, when an electronic device that outputs sound and has a voice recognition function is operated by voice, and the voice input level detection unit 104 determines in step S302 that the voice level is too high, the voice input level detection unit 104 outputs to the volume control unit 111 a volume control signal instructing that the volume of the audio signal output from the voice output unit 113 be reduced. When the volume control signal is input from the voice input level detection unit 104, the volume control unit 111 reduces the volume of the audio signal output from the voice output unit 113. In other words, the volume of the audio signal output from the electronic device body can be controlled quickly without performing the processing from step S303 onward, which improves the operability of the electronic device.

[0061] Likewise, when the voice section detection unit 106 determines in step S303 that the voice section is too long, the voice section detection unit 106 outputs to the volume control unit 111 a volume control signal instructing that the volume of the audio signal output from the voice output unit 113 be reduced. When the volume control signal is input from the voice section detection unit 106, the volume control unit 111 reduces the volume of the audio signal output from the voice output unit 113. In other words, the volume of the audio signal output from the electronic device body can be controlled quickly without performing the processing from step S304 onward, which improves the operability of the electronic device.

[0062] Because the volume of the electronic device is lowered in this way without performing the processing from step S303 or from step S304 onward, the electronic device can be operated more quickly.
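The early-exit behaviour summarized above can be made concrete with a short Python sketch of one pass through steps S302 to S305. It is only an outline under stated assumptions: `process_utterance` is a hypothetical name, the level and section length are passed in as already-measured numbers, and `recognizer` is any callable returning `(word, reliability)`, standing in for the time-consuming word recognition of step S304.

```python
def process_utterance(level, section_seconds, recognizer, alpha, t, phi):
    """Return (next_step, word) for one utterance.

    The recognizer is called only if both cheap checks pass, so an input
    that is clearly unsuitable never pays the cost of recognition.
    """
    if level > alpha:              # S302: input level too high
        return "S311", None
    if section_seconds > t:        # S303: voice section too long
        return "S311", None
    word, reliability = recognizer()   # S304: costly recognition
    if reliability > phi:          # S305: recognition succeeded
        return "S306", word
    return "S311", None            # S305: recognition failed, word rejected
```

Passing a recognizer that records its calls shows the effect: for a too-loud or too-long input it is never invoked, and the flow reaches S311 immediately.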

[0063] In addition, when the electronic device is powered on, or when the audio source output from the voice output unit 113 is switched in step S310, the parameter change notification unit 112 outputs a parameter change notification signal based on the volume setting held by the volume control unit 111. The voice input level detection unit 104 changes the level parameter appropriately based on the parameter change notification signal. Similarly, the voice section detection unit 106 changes the section detection parameter appropriately based on the parameter change notification signal.
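One way to picture the parameter update is as a function from the current volume setting to a pair (α, t). The publication only says the parameters are changed "appropriately", so the linear rule below, the 0 to 100 volume scale, the default coefficients, and the names `DetectionParameters` and `parameters_for_volume` are purely illustrative assumptions in Python.

```python
from dataclasses import dataclass


@dataclass
class DetectionParameters:
    alpha: float  # level parameter used in step S302
    t: float      # section detection parameter used in step S303


def parameters_for_volume(volume, base_alpha=0.2, alpha_per_step=0.01,
                          base_t=2.0, t_per_step=0.0):
    """Map a volume setting (assumed scale 0..100) to detection parameters.

    The linear mapping is a placeholder: a real device would calibrate how
    much of its own output reaches the microphone at each volume setting.
    """
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    return DetectionParameters(
        alpha=base_alpha + alpha_per_step * volume,
        t=base_t + t_per_step * volume,
    )
```

In this sketch the returned values play the role of the content of the parameter change notification signal, recomputed at power-on and whenever the audio source is switched.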

[0064] As a result, even when the volume of the audio signal output from the electronic device body changes, and the volume of the audio signal entering the voice input unit 101 changes with it, a voice signal that is clearly unsuitable for recognition can be detected accurately. Because the time-consuming recognition processing does not have to be performed in that case, the operability of the device is improved.

[0065] The present invention is also a program for causing a computer to execute the functions of all or part of the means (or devices, elements, circuits, units, etc.) of the electronic device of the present invention described above, the program operating in cooperation with the computer.

[0066] The present invention is also a medium carrying a program for causing a computer to execute all or part of the functions of all or part of the means of the electronic device of the present invention described above, the medium being readable by a computer, and the program, once read, executing those functions in cooperation with the computer.

[0067] Here, "part of the means (or devices, elements, circuits, units, etc.)" of the present invention means some of those plural means, or some of the functions of a single means.

[0068] Likewise, "part of the devices (or elements, circuits, units, etc.)" of the present invention means some of those plural devices, or some of the means (or elements, circuits, units, etc.) within a single device, or some of the functions of a single means.

[0069] A computer-readable recording medium on which the program of the present invention is recorded is also included in the present invention.

[0070] One form of use of the program of the present invention may be a mode in which the program is recorded on a computer-readable recording medium and operates in cooperation with a computer.

[0071] Another form of use of the program of the present invention may be a mode in which the program is transmitted through a transmission medium, read by a computer, and operates in cooperation with the computer.

[0072] The data structures of the present invention include databases, data formats, data tables, data lists, data types, and the like.

[0073] Recording media include ROM and the like, and transmission media include transmission media such as the Internet, as well as light, radio waves, sound waves, and the like.

[0074] The computer of the present invention described above is not limited to pure hardware such as a CPU, and may include firmware, an OS, and peripheral devices.

[0075] As described above, the configuration of the present invention may be implemented in software or in hardware.

[0076]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0077] First, the volume of the audio signal output from the voice output means is reduced as soon as the voice level is found to be higher than a predetermined value, so the volume of the output sound can be reduced without performing the subsequent recognition processing. That is, when the voice level is higher than the predetermined value, the output sound can be controlled quickly.

[0078] Second, the volume of the audio signal output from the voice output means is reduced as soon as the voice section is found to be longer than a predetermined value, so the volume of the output sound can be reduced without performing the subsequent recognition processing. That is, when the voice section is longer than the predetermined value, the output sound can be controlled quickly.

[0079] Third, when the volume of the output sound is changed, one or both of the level parameter and the section detection parameter are changed appropriately based on the parameter change notification signal output from the parameter change notification means, so the voice section and the voice input level can be detected accurately even after the change. As a result, even when the volume of the output sound is changed, voice that is clearly unsuitable for recognition can be detected accurately and the output sound can be controlled quickly.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing procedure of the embodiment of the present invention.

FIG. 3 is a block diagram showing a conventional electronic device.

[Explanation of symbols]

101 voice input unit
102 voice detection unit
103 microphone
104 voice input level detection unit
105 voice recognition unit
106 voice section detection unit
107 word recognition unit
108 recognition result determination unit
109 voice output control unit
110 channel switching unit
111 volume control unit
112 parameter change notification unit
113 voice output unit
114 video output control unit
115 video output unit

Continuation of front page: (51) Int. Cl.7 G10L 15/10, 15/28. (72) Inventor: Katsumi Fujisaki, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka. F-term (reference): 5D015 AA04 BB01 DD02 KK01 LL12.

Claims

1. An electronic device provided with voice recognition means, voice output means for outputting voice, and volume control means for controlling the volume of the voice output means, the electronic device comprising at least: voice detection means for detecting the start of a voice command of a speaker; voice collection means for collecting voice; voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on the level; and recognition result determination means for outputting a voice output control signal for executing the voice command of the speaker based on a recognition result obtained from the voice recognition means.

2. The electronic device according to claim 1, wherein the voice input level detection means holds a level parameter predetermined so as to correspond to the voice level of a voice command in the input voice and, if the voice level is higher than the level parameter, outputs to the volume control means a volume control signal for reducing the volume of the voice.

3. An electronic device provided with voice recognition means, voice output means for outputting voice, and volume control means for controlling the volume of the voice output means, the electronic device comprising at least: voice detection means for detecting the start of a voice command of a speaker; voice collection means for collecting voice; voice section detection means for detecting a voice section of the collected voice and controlling the volume control means based on the length of the voice section; and recognition result determination means for outputting a voice output control signal for executing the voice command of the speaker based on a recognition result obtained from the voice recognition means.

4. The electronic device according to claim 3, wherein the voice section detection means holds a section detection parameter predetermined so as to correspond to the voice section of a voice command in the input voice and, when the voice section is longer than the section detection parameter, outputs to the volume control means a volume control signal for reducing the volume of the voice.

5. The electronic device according to claim 1 or 3, wherein the voice detection means comprises a sensor that detects the presence of a person within a predetermined area defined with reference to the voice collection means, or a switch or sensor by which the speaker can give notice, by a method other than voice, that a voice command is starting.

6. The electronic device according to claim 1, further comprising: volume setting value detection means for detecting the setting value of the volume output by the device; and parameter changing means for indicating that the level parameter or the section detection parameter is to be changed based on the setting value detected by the volume setting value detection means.

7. A program for causing a computer to function as all or part of the following means of the electronic device according to claim 1: the voice detection means for detecting the start of a voice command of a speaker; the voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on the level; and the recognition result determination means for outputting at least a voice output control signal for executing the voice command of the speaker based on the recognition result obtained from the voice recognition means.

8. A program for causing a computer to function as all or part of the following means of the electronic device according to claim 3: the voice detection means for detecting the start of a voice command of a speaker; the voice section detection means for detecting a voice section of the collected voice and controlling the volume control means based on the length of the voice section; and the recognition result determination means for outputting at least a voice output control signal for executing the voice command of the speaker based on the recognition result obtained from the voice recognition means.

9. A medium carrying a program for causing a computer to function as all or part of the following means of the electronic device according to claim 1, the medium being processable by a computer: the voice detection means for detecting the start of a voice command of a speaker; the voice input level detection means for detecting the level of the collected voice and controlling the volume control means based on the level; and the recognition result determination means for outputting at least a voice output control signal for executing the voice command of the speaker based on the recognition result obtained from the voice recognition means.

10. A medium carrying a program for causing a computer to function as all or part of the following means of the electronic device according to claim 3, the medium being processable by a computer: the voice detection means for detecting the start of a voice command of a speaker; the voice section detection means for detecting a voice section of the collected voice and controlling the volume control means based on the length of the voice section; and the recognition result determination means for outputting at least a voice output control signal for executing the voice command of the speaker based on the recognition result obtained from the voice recognition means.