JPH11231896A

JPH11231896A - Speech startup system

Info

Publication number: JPH11231896A
Application number: JP10037374A
Authority: JP
Inventors: Masahiro Kamiya; 昌宏神谷; Kazuhiro Sakiyama; 和広崎山; Hideki Kitao; 英樹北尾
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1998-02-19
Filing date: 1998-02-19
Publication date: 1999-08-27
Anticipated expiration: 2018-02-19
Also published as: JP3524370B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech startup system that is unlikely to cause erroneous recognition when the startup of a speech control device is controlled by a speech keyword. SOLUTION: The speech startup system has a keyword judgement means 7 for judging whether an entered speech is a registered speech keyword, and controls the startup of a speech control device 8 on the basis of the result of the judgement. The keyword judgement means 7 is constituted so that a similarity value between the entered speech and the speech keyword is measured, and the entered speech is judged to be the speech keyword when the measured similarity value is equal to or smaller than a specific value.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field to which the invention belongs] The present invention relates to a voice activation system, and more particularly to a voice activation system for a so-called voice control device, which controls the operation of various devices by voice recognition.

[0002]

[Prior art] Device control by voice recognition requires neither the hands (hands-free) nor movement of the line of sight (eyes-free), so devices can be operated by voice even while the user is doing something else. For example, while driving a vehicle, the driver can operate devices without taking his or her eyes off the road ahead or hands off the steering wheel, which makes it easier to ensure safety while driving.

[0003] A voice control device of this kind normally switches from a normal control mode (a state in which operation control by voice recognition is not possible) to a voice control mode (a state in which operation control by voice recognition is possible). The usual switching method is a voice recognition method that detects whether or not a voice keyword has been uttered.

[0004]

[Problem to be solved by the invention] As described above, the advantages of a voice control device (hands-free, eyes-free) make it very effective for controlling vehicle equipment. Inside a vehicle, however, surrounding noises, conversation, and the like are easily mistaken for the registered voice keyword, so the system tends to switch to the voice control mode when it is not needed.

[0005] Furthermore, in a system in which an in-vehicle hands-free telephone and the voice control device share the voice input means, that is, a microphone or the like, switching to the voice control mode during a call is extremely difficult, because speech during the call is very likely to be mistaken for the registered voice keyword.

[0006] The present invention has been made in view of the above problems, and its object is to provide a voice activation system that is unlikely to cause erroneous recognition even though activation of the voice control device is controlled by a voice keyword.

[0007]

[Means for solving the problem and effects thereof] To achieve the above object, a voice activation system (1) according to the present invention comprises keyword determination means for determining whether or not an input voice is a registered voice keyword and performs activation control of a voice control device based on the determination result, and is characterized in that the keyword determination means is configured to measure a similarity value between the input voice and the voice keyword and to determine that the input voice is the voice keyword if the measured similarity value is equal to or less than a predetermined value.

[0008] According to the voice activation system (1), the voice control device is not activated if the degree of similarity between the input voice and the registered voice keyword is low (that is, if the similarity value exceeds the predetermined value). This reduces the possibility that the voice control device is activated by mistake, that is, switched to the voice control mode, by surrounding noises, conversation, and the like. Therefore, even in a system in which an in-vehicle hands-free telephone and the voice control device share a microphone or the like, where erroneous recognition during a call has conventionally been highly likely, adopting the voice activation system (1) greatly reduces the possibility of such erroneous recognition.

[0009] A voice activation system (2) according to the present invention is characterized in that, in the voice activation system (1), the similarity value is determined based on an inter-pattern distance measured by pattern matching processing between the input voice pattern and phoneme standard patterns stored in advance.

[0010] According to the voice activation system (2), a value determined based on the inter-pattern distance is adopted as the similarity value, so the reliability of the system can be improved.

[0011] A voice activation system (3) according to the present invention is characterized in that, in the voice activation system (1) or (2), it comprises first setting means for setting the predetermined value based on the past maximum similarity value among the cases in which an input voice was determined to be the voice keyword.

[0012] According to the voice activation system (3), the predetermined value is set based on past determination results (the past maximum similarity value), so the possibility of erroneous recognition can be reduced still further. For example, if the similarity values obtained when the input voice was determined to be the voice keyword (the past results) are 40, 20, 30, 50, 20, and 10, the past results show that a similarity value of 50 (the maximum similarity value) or less is sufficient for determining that an input voice is the voice keyword. If the predetermined value set at that time is 200, the range from 50 to 200 is unnecessary and in fact increases the possibility of erroneous recognition. To reduce that possibility, it is therefore effective to reset the predetermined value based on the maximum similarity value, for example to 70 {= 50 (maximum similarity value) + 20 (width)}. For a specific user, this almost entirely eliminates erroneous switching to the voice control mode caused by ordinary conversation, surrounding noises, and the like.

[0013] A voice activation system (4) according to the present invention is characterized in that, in the voice activation system (3), it comprises second setting means for resetting the predetermined value from the value set by the first setting means to a value set in advance, and first input means for operating the second setting means.

[0014] A predetermined value set to be optimal for a specific user may cause problems for other users, but adopting the voice activation system (4) solves this problem.

[0015] A voice activation system (5) according to the present invention is characterized in that, in any one of the voice activation systems (1) to (4), it comprises third setting means for resetting the predetermined value to a larger value, and second input means for operating the third setting means.

[0016] According to the voice activation system (5), the user can set the predetermined value to a larger value by using the second input means, which is very effective when the system does not switch to the voice control mode no matter how many times the voice keyword is uttered.

[0017] A voice activation system (6) according to the present invention is characterized in that, in any one of the voice activation systems (1) to (5), it comprises first silent state determination means for determining whether or not a predetermined time before and after the input of an input voice whose similarity value with the voice keyword is equal to or less than the predetermined value was silent, and the keyword determination means is configured to determine that the input voice is not the voice keyword if the predetermined time was not silent.

[0018] According to the voice activation system (6), even if the similarity value between the input voice and the registered voice keyword is equal to or less than the predetermined value, the input voice is regarded as speech in ordinary conversation or the like unless the predetermined time before and after its input was silent, and the system does not switch to the voice control mode. This almost entirely eliminates erroneous switching to the voice control mode caused by utterances other than the voice keyword.

[0019] A voice activation system (7) according to the present invention is characterized in that, in any one of the voice activation systems (1) to (6), it comprises second silent state determination means for determining whether or not a silent state continued for a predetermined time following activation of the voice control device, and release control means for releasing the activation of the voice control device if the predetermined time was continuously silent.

[0020] According to the voice activation system (7), even if the voice control device has been activated, that is, the system has switched to the voice control mode, the activation is regarded as the result of erroneous recognition and is released if the predetermined time following the switch (activation) is continuously silent. Therefore, even if an utterance other than the voice keyword causes an erroneous switch to the voice control mode, the system can return automatically to the normal control mode.

[0021] A voice activation system (8) according to the present invention is characterized in that, in any one of the voice activation systems (1) to (7), it comprises comparison means for comparing the input voice with the voice from the other party of a call and, if they are determined to be the same, preventing the input voice from being output to the keyword determination means.

[0022] According to the voice activation system (8), erroneous switching to the voice control mode caused by the voice from the other party of a call can be eliminated.

[0023]

[Embodiments of the invention] Embodiments of the voice activation system according to the present invention are described below with reference to the drawings. FIG. 1 is a block diagram schematically showing the main part of a voice activation system (1) according to an embodiment. In the figure, reference numeral 1 denotes an antenna, which is connected to a telephone main body 2. A speaker 4 and a microphone 5, the latter shared by the in-vehicle hands-free telephone and a voice control device 9, are each connected via an amplifier 3 to the telephone main body 2 and to voice recognition means 6.

[0024] The voice recognition means 6, which includes keyword determination means 7, is connected to activation control means 8 that controls activation of the voice control device 9, and the activation control means 8 is connected to the voice control device 9.

[0025] The operation of the keyword determination means 7 in the voice activation system (1) according to the embodiment is described with reference to the flowchart shown in FIG. 2. First, in step 1, the voice input from the microphone 5 is taken in. Next, in step 2, the similarity value d between the input voice and the registered voice keyword is measured, and the process proceeds to step 3.

[0026] One method of measuring the similarity value d is as follows: when a voice is input, the inter-pattern distance (degree of similarity) between the input voice pattern (hereinafter, the input pattern) and phoneme standard patterns stored in advance is measured by pattern matching such as the DP matching method, the stored standard pattern to which the input pattern belongs is determined, and the inter-pattern distance at that time is taken as the similarity value d. Examples of the inter-pattern distance include the Euclidean distance.
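
The measurement described above can be sketched as follows. This is only an illustrative sketch, assuming that DP matching is realized as classic dynamic time warping over per-frame feature vectors with the Euclidean distance as the local distance. The function names (`euclidean`, `dp_matching_distance`, `similarity_value`) and the dictionary of standard patterns are hypothetical and do not come from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_matching_distance(input_pattern, standard_pattern):
    """Accumulated local distance along the best alignment path (classic DTW)."""
    n, m = len(input_pattern), len(standard_pattern)
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning the first i input frames
    # with the first j standard frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(input_pattern[i - 1], standard_pattern[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def similarity_value(input_pattern, standard_patterns):
    """Return (label, d): the closest stored pattern and its distance d."""
    return min(
        (
            (label, dp_matching_distance(input_pattern, pattern))
            for label, pattern in standard_patterns.items()
        ),
        key=lambda item: item[1],
    )
```

Because d is a distance, a smaller value means that the input pattern is closer to the stored pattern.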

[0027] In step 3, it is determined whether or not the similarity value d is equal to or less than the predetermined value t. If it is, the process proceeds to step 4, where a signal is output to the activation control means 8 so as to activate the voice control device 9. If it is not, the process returns to step 1.
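
Steps 1 to 4 can be sketched as a simple loop. This is a minimal sketch under assumptions: `utterances` stands in for successive inputs from the microphone 5, `measure_similarity` for the measurement of step 2, and `activate` for the signal to the activation control means 8. All three names are hypothetical.

```python
def keyword_judgment_loop(utterances, measure_similarity, t, activate):
    """Steps 1 to 4: keep taking input until one utterance is judged to be the keyword."""
    for voice in utterances:             # step 1: take in the input voice
        d = measure_similarity(voice)    # step 2: measure similarity value d (a distance)
        if d <= t:                       # step 3: is d at or below predetermined value t?
            activate()                   # step 4: signal the activation control means
            return d
        # d exceeds t: return to step 1 and wait for the next input
    return None                          # input ended without a keyword
```

The comparison is `d <= t` rather than `d >= t` because the similarity value is a distance: the keyword is accepted only when the input lies close enough to the registered pattern.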

[0028] According to the voice activation system (1) of the above embodiment, the voice control device 9 is not activated if the degree of similarity between the input voice and the registered voice keyword is low (that is, if the similarity value d exceeds the predetermined value t). This reduces the possibility that the voice control device 9 is activated by mistake, that is, switched to the voice control mode, by surrounding noises, conversation, and the like. Therefore, even in a system in which the in-vehicle hands-free telephone and the voice control device 9 share the microphone 5, where erroneous recognition during a call has conventionally been highly likely, the possibility of such erroneous recognition can be greatly reduced.

[0029] FIG. 3 is a block diagram schematically showing the main part of a voice activation system (2) according to an embodiment. Description of components that are the same as in the voice activation system shown in FIG. 1 is omitted here.

[0030] First setting means 10, second setting means 11, and third setting means 13, each of which resets the predetermined value t, are connected to the keyword determination means 7. First input means 12 and second input means 14 are connected to the second setting means 11 and the third setting means 13, respectively.

[0031] The operation of the first setting means 10 in the voice activation system (2) according to the embodiment is described with reference to the flowchart shown in FIG. 4. First, in step 11, as initial settings, the maximum similarity value d_MAX is set to 0 and the counter n is set to 0. Next, in step 12, the similarity value d calculated by the keyword determination means 7 (see step 2 in FIG. 2) is taken in, and the process proceeds to step 13.

[0032] In step 13, it is determined whether or not the similarity value d is greater than the maximum similarity value d_MAX. If it is, the process moves to step 14, where the maximum similarity value d_MAX is set to the similarity value d, and then proceeds to step 15. If it is not, step 14 is skipped and the process proceeds directly to step 15.

[0033] In step 15, 1 is added to the counter n, and the process proceeds to step 16, where it is determined whether or not the counter n is equal to or greater than a predetermined count N. If it is, the process moves to step 17, where the value t₁ is set to the maximum similarity value d_MAX plus a predetermined width α, and then proceeds to step 18. If the counter n is less than N, the process returns to step 12. In step 18, the predetermined value t set in the keyword determination means 7 is set to the value t₁, and the process returns to step 12.
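
Steps 11 to 18 can be sketched as a small stateful object. This is a sketch under assumptions: the class name `FirstSettingMeans`, the method `observe`, and the parameter names are hypothetical, and the caller is assumed to pass in only similarity values of inputs that were judged to be the keyword. The sample values 40, 20, 30, 50, 20, 10 and the width 20 are the ones used in the description; N = 6 is chosen here only so that the sample fills the counter.

```python
class FirstSettingMeans:
    """Tracks d_MAX over past keyword detections and proposes t1 = d_MAX + alpha."""

    def __init__(self, n_required, alpha):
        self.n_required = n_required  # predetermined count N
        self.alpha = alpha            # predetermined width alpha
        self.d_max = 0                # step 11: d_MAX starts at 0
        self.n = 0                    # step 11: counter n starts at 0

    def observe(self, d):
        """Steps 12 to 17 for one similarity value d; returns t1, or None if n < N."""
        if d > self.d_max:                  # step 13
            self.d_max = d                  # step 14
        self.n += 1                         # step 15
        if self.n >= self.n_required:       # step 16
            return self.d_max + self.alpha  # step 17; the caller applies it as t (step 18)
        return None                         # too few results so far: back to step 12


setter = FirstSettingMeans(n_required=6, alpha=20)
proposals = [setter.observe(d) for d in (40, 20, 30, 50, 20, 10)]
```

With these inputs the first five calls return `None` and the sixth returns 70, which matches the 50 + 20 example in the description.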

[0034] Next, the second setting means 11 is described. The second setting means 11 resets the predetermined value t₁ set by the first setting means 10 to a predetermined value t₀ set in advance, and is configured to operate based on a signal from the first input means 12.

[0035] Next, the third setting means 13 is described. The third setting means 13 sets the predetermined value t to a larger value, for example to 1.5 times the predetermined value t₀, and is configured to operate based on a signal from the second input means 14.
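
How the three setting means act on the same predetermined value t can be sketched as below. The class `ThresholdSettings` and its method names are hypothetical; the factor 1.5 is the example given in the description, and the values 200 and 70 are taken from the description's numerical example.

```python
class ThresholdSettings:
    """Holds the preset value t0 and the value t currently used for the judgment."""

    def __init__(self, t0):
        self.t0 = t0  # predetermined value set in advance
        self.t = t0   # value currently used by the keyword determination means

    def apply_learned(self, t1):
        """First setting means: adopt the learned value t1 (step 18)."""
        self.t = t1

    def reset_to_preset(self):
        """Second setting means: go back to t0 on a signal from the first input means."""
        self.t = self.t0

    def widen(self, factor=1.5):
        """Third setting means: use factor * t0 on a signal from the second input means."""
        self.t = self.t0 * factor


settings = ThresholdSettings(t0=200)
settings.apply_learned(70)   # tuned for one specific user
settings.reset_to_preset()   # another user restores the preset value
```

A larger t accepts inputs that lie farther from the registered pattern, which is why widening helps a user whose keyword is repeatedly rejected.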

[0036] According to the voice activation system (2) of the above embodiment, the predetermined value t is set based on past determination results (the past maximum similarity value d_MAX), so the possibility of erroneous recognition can be reduced still further. For example, if the similarity values obtained when the input voice was determined to be the voice keyword are 40, 20, 30, 50, 20, and 10, the past results show that a similarity value of 50 (the maximum similarity value d_MAX) or less is sufficient for determining that an input voice is the voice keyword. If the predetermined value t₀ set at that time is 200, the range from 50 to 200 is unnecessary and in fact increases the possibility of erroneous recognition. To reduce that possibility, it is therefore effective to reset the predetermined value t₀ based on the maximum similarity value d_MAX, for example by setting the predetermined value t to 70 {= 50 (maximum similarity value d_MAX) + 20 (width α)}. For a specific user, this almost entirely eliminates erroneous switching to the voice control mode caused by ordinary conversation, surrounding noises, and the like.

[0037] A predetermined value t set to be optimal for a specific user can cause problems for other users, but this problem is solved by operating the second setting means 11.

[0038] Furthermore, the user can set the predetermined value t to a larger value by using the second input means 14, which is very effective when the voice control device 9 is not activated, that is, the system does not switch to the voice control mode, no matter how many times the voice keyword is uttered.

[0039] The process of determining whether or not the counter n is equal to or greater than the predetermined count N (step 16) is performed because it is difficult to set an appropriate predetermined value t from fewer than N past results.

[0040] FIG. 5 is a block diagram schematically showing the main part of a voice activation system (3) according to an embodiment. Description of components that are the same as in the voice activation system shown in FIG. 1 is omitted here.

[0041] First silent state determination means 15 is connected to the keyword determination means 7. It takes in the voice information before and after the input of the voice being judged by the keyword determination means 7, determines whether or not a predetermined time before and after the input of that voice was silent, and outputs the determination result to the keyword determination means 7.

[0042] The operation of the keyword determination means 7 in the voice activation system (3) according to the embodiment is described with reference to the flowchart shown in FIG. 6.

[0043] First, in step 21, the voice input from the microphone 5 is taken in. Next, in step 22, the similarity value d between the input voice and the registered voice keyword is calculated, and the process proceeds to step 23. In step 23, it is determined whether or not the similarity value d is equal to or less than the predetermined value t. If it is, the process proceeds to step 24. If it is not, the process returns to step 21.

[0044] In step 24, the determination result of the first silent state determination means 15 is taken in, and the process proceeds to step 25. If the result indicates that the periods before and after the input of the input voice were silent, the process moves to step 26, where a signal is output to the activation control means 8 so as to activate the voice control device 9. If the result does not indicate a silent state, the process returns to step 21.
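
The combined judgment of steps 23 to 25 can be sketched as follows. The description does not say how silence is detected, so this sketch assumes a simple amplitude limit: a window counts as silent when no sample exceeds `silence_level` in magnitude. The function names, the sample windows `before` and `after`, and `silence_level` are all hypothetical.

```python
def is_silent(samples, silence_level):
    """True if every sample magnitude stays at or below silence_level (assumed criterion)."""
    return all(abs(s) <= silence_level for s in samples)


def judge_with_silence_check(d, t, before, after, silence_level):
    """Steps 23 to 25: accept only if d <= t and both surrounding windows are silent."""
    if d > t:  # step 23: similarity value above the predetermined value
        return False
    # steps 24 and 25: result of the first silent state determination
    return is_silent(before, silence_level) and is_silent(after, silence_level)
```

A keyword spoken in the middle of a sentence has speech on at least one side, so it is rejected even when its distance d is small.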

[0045] According to the voice activation system (3) of the above embodiment, even if the similarity value d between the input voice and the registered voice keyword is equal to or less than the predetermined value t, the input voice is regarded as speech in ordinary conversation or the like unless the predetermined time before and after its input was silent, and the voice control device 9 is not activated, that is, the system does not switch to the voice control mode. This almost entirely eliminates erroneous switching to the voice control mode caused by utterances other than the voice keyword.

[0046] FIG. 7 is a block diagram schematically showing the main part of a voice activation system (4) according to an embodiment. Description of components that are the same as in the voice activation system shown in FIG. 1 is omitted here. In the figure, reference numeral 16 denotes second silent state determination means, which is connected to the voice recognition means 6 and the activation control means 8.

[0047] The operation of the second silent state determination means 16 in the voice activation system (4) according to the embodiment is described with reference to the flowchart shown in FIG. 8. First, in step 31, of the voice input from the microphone 5, the voice information for a predetermined time following activation of the voice control device 9 is taken in. Next, in step 32, it is determined whether or not that predetermined time was silent. If it was, the process moves to step 33, where a signal is output to the activation control means 8 so as to release the activation of the voice control device 9. If it was not, the operation ends.
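
Steps 31 to 33 can be sketched as below. As an assumption, silence is again approximated by an amplitude limit, and `window` holds the samples captured during the predetermined time after activation. `ActivationControl` is a hypothetical stand-in for the activation control means 8, not an interface defined in the patent.

```python
class ActivationControl:
    """Hypothetical stand-in for the activation control means 8."""

    def __init__(self):
        self.voice_control_mode = False

    def activate(self):
        self.voice_control_mode = True

    def release(self):
        self.voice_control_mode = False


def second_silence_check(control, window, silence_level):
    """Steps 31 to 33: release the activation if the whole window was silent."""
    if all(abs(s) <= silence_level for s in window):  # step 32
        control.release()                             # step 33
        return True
    return False                                      # speech followed: the operation ends
```

The idea is that a user who really spoke the keyword will go on to give a command, so continuous silence after activation suggests a false trigger.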

[0048] According to the voice activation system (4) of the above embodiment, even if the voice control device 9 has been activated, that is, the system has switched to the voice control mode, the activation is regarded as the result of erroneous recognition and the activation of the voice control device 9 is released if the predetermined time following the switch (activation) is continuously silent. Therefore, even if an utterance other than the voice keyword causes an erroneous switch to the voice control mode, the system can return automatically to the normal control mode.

[0049] FIG. 9 is a block diagram schematically showing the main part of a voice activation system (5) according to an embodiment. Description of components that are the same as in the voice activation system shown in FIG. 1 is omitted here. In the figure, reference numeral 17 denotes comparison means. The telephone main body 2 and the amplifier 3 are connected to the comparison means 17, and the comparison means 17 is connected to the voice recognition means 6.

[0050] The comparison means 17 calculates the correlation coefficient between the voice from the other party of the call and the voice input from the microphone 5. If the correlation coefficient is equal to or greater than a predetermined value, the voice input from the microphone 5 is regarded as the other party's voice output from the speaker 4, and the input voice is not output to the voice recognition means 6. An echo canceling device or the like can be used as the comparison means 17.
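
The gating performed by the comparison means can be sketched as follows. This sketch assumes that the correlation coefficient is the Pearson coefficient computed over two equal-length, time-aligned sample frames; a real system would also have to handle the acoustic delay between the speaker and the microphone. The function names and the parameter `limit` are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation of two equal-length sample frames (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pass_to_recognizer(mic_frame, far_end_frame, limit):
    """Return the microphone frame, or None when it is judged to be the far-end voice."""
    if correlation_coefficient(mic_frame, far_end_frame) >= limit:
        return None  # treated as the other party's voice coming from the speaker
    return mic_frame
```

A frame that is just an attenuated copy of the far-end signal correlates strongly with it and is withheld, while unrelated speech from the driver is passed on to the voice recognition means.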

[0051] According to the voice activation system (5) of the above embodiment, erroneous switching to the voice control mode caused by the voice from the other party of a call can be eliminated.

[Brief description of the drawings]

FIG. 1 is a block diagram schematically showing the main part of a voice activation system (1) according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the keyword determination means in the voice activation system (1) according to the embodiment.

FIG. 3 is a block diagram schematically showing the main part of a voice activation system (2) according to the embodiment.

FIG. 4 is a flowchart showing the operation of the first setting means in the voice activation system (2) according to the embodiment.

FIG. 5 is a block diagram schematically showing the main part of a voice activation system (3) according to the embodiment.

FIG. 6 is a flowchart showing the operation of the keyword determination means in the voice activation system (3) according to the embodiment.

FIG. 7 is a block diagram schematically showing the main part of a voice activation system (4) according to the embodiment.

FIG. 8 is a flowchart showing the operation of the second silent state determination means in the voice activation system (4) according to the embodiment.

FIG. 9 is a block diagram schematically showing the main part of a voice activation system (5) according to the embodiment.

[Explanation of symbols]

1 Antenna
2 Telephone main unit
3 Amplifier
4 Speaker
5 Microphone
6 Voice recognition means
8 Activation control means

Claims

[Claims]

1. A voice activation system comprising keyword determination means for determining whether an input voice is a registered voice keyword, the system performing activation control of a voice control device based on the determination result, wherein the keyword determination means is configured to measure a similarity value between the input voice and the voice keyword, and to determine that the input voice is the voice keyword if the measured similarity value is equal to or less than a predetermined value.

2. The voice activation system according to claim 1, wherein the similarity value is determined based on an inter-pattern distance measured by pattern matching between an input voice pattern and a phoneme standard pattern stored in advance.

3. The voice activation system according to claim 1 or 2, further comprising first setting means for setting the predetermined value based on the maximum of the past similarity values obtained when the input voice was determined to be the voice keyword.

4. The voice activation system according to claim 3, further comprising: second setting means for resetting the predetermined value to a preset value instead of the value set by the first setting means; and first input means for operating the second setting means.

5. The voice activation system according to any one of claims 4 to 4, further comprising: third setting means for resetting the predetermined value to a larger value; and second input means for operating the third setting means.

6. The voice activation system according to any one of claims 1 to 5, further comprising first silent state determination means for determining whether predetermined periods before and after the input of an input voice whose similarity value with the voice keyword is equal to or less than the predetermined value are silent, wherein the keyword determination means is configured to determine that the input voice is not the voice keyword if those periods are not silent.

7. The voice activation system according to any one of claims 1 to 6, further comprising: second silent state determination means for determining whether a silent state has continued for a predetermined time after activation of the voice control device; and release control means for releasing the activation of the voice control device if the silent state has continued for the predetermined time.

8. The voice activation system according to claim 1, further comprising comparison means for comparing the input voice with the voice from the called party and, if the two are determined to be the same, preventing the input voice from being output to the keyword determination means.