JPH09212192A

JPH09212192A - Speech recognition device

Info

Publication number: JPH09212192A
Application number: JP8045528A
Authority: JP
Inventors: Keiichi Miyamoto; 恵一宮本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1996-02-07
Filing date: 1996-02-07
Publication date: 1997-08-15
Anticipated expiration: 2016-02-07
Also published as: JP3446857B2

Abstract

PROBLEM TO BE SOLVED: To obtain the necessary speech recognition accuracy while reducing power consumption, by switching the kind of speech recognition process according to the recognition accuracy that an application requires and then performing the process at a clock frequency corresponding to the throughput required by the process actually used. SOLUTION: Under specified recognition conditions, a process control part 16 is operated so that only the necessary parts of a speech recognition process part 15 are selectively operated, at the clock frequency and source voltage needed to secure the specified speech recognition accuracy, thereby recognizing speech input to a microphone 7. A motor control part 10 is then operated according to the recognition result to perform an operation corresponding to the contents of the speech.

Description

Detailed Description of the Invention

[0001]

[Technical field of the invention] The present invention relates to a speech recognition device that greatly reduces the power consumed by speech recognition processing while securing the necessary recognition accuracy.

[0002]

[Prior art] A speech recognition device usually has three main components: a feature extraction unit that extracts features of the voice; a similarity calculation unit that finds which reference pattern (standard pattern) the feature pattern obtained by the feature extraction unit resembles; and a determination unit that determines, from the similarities obtained by the similarity calculation unit, which word the voice being recognized represents. The feature extraction unit is commonly implemented in one of two ways. In the analog approach, band-pass filters or the like extract feature information, such as the magnitude of each frequency component of the audio signal, to generate a feature pattern. In the digital approach, an A/D converter converts the audio signal (analog signal) into voice data (digital signal), and a DSP (digital signal processor) or the like then processes the voice data digitally to generate a feature pattern.
The similarity calculation unit and the determination unit are likewise commonly implemented in one of two ways: either a microprocessor capable of general data processing executes a similarity calculation algorithm to compute the similarities and select the closest word, or a logic circuit built from dedicated hardware performs the same similarity calculation and determination processing.

[0003]

[Problems to be solved by the invention] However, in equipment that uses such a conventional speech recognition device, particularly equipment that emphasizes portability or that is designed to run on batteries, such as toys, the part involved in speech recognition consumes a large amount of power, and if it is kept running at all times the battery is soon exhausted. To address this problem, the techniques disclosed in JP-A-62-245296 and JP-A-3-202899 and the technique disclosed in JP-A-58-55991 have been proposed. Of these, the techniques of JP-A-62-245296 and JP-A-3-202899 detect whether voice is being input and, when no voice has been input for a preset period or longer, stop the power supply to each part of the circuit to eliminate wasted power. The technique of JP-A-3-202899 likewise stops the power supply to each part of the circuit when no voice has been input for a preset period or longer, and in addition keeps a voice detection unit operating at all times. When voice input is detected, power supply to the speech recognition unit, including the feature extraction unit, is started and speech recognition processing resumes. In this way wasted power is eliminated, yet the speech recognition unit is started promptly when voice is input so that the voice can be recognized.

[0004] However, the techniques disclosed in JP-A-62-245296 and JP-A-3-202899 and the technique disclosed in JP-A-58-55991 have the following problems. With the techniques of JP-A-62-245296 and JP-A-3-202899, once the power has been cut because no voice was input for a certain time, the user must operate a power switch or the like every time speech recognition is wanted, which makes operation cumbersome. The technique of JP-A-58-55991 appears at first glance to avoid this problem, but because its low-power voice detection unit is built from a very simple integrating circuit, noise input causes the voice detection unit to malfunction. Furthermore, a certain amount of time elapses between the moment voice is input, its detection by the voice detection unit, and the start of recognition processing in the speech recognition unit. As a result, the beginning of the utterance has already been lost when the speech recognition unit starts processing, and recognition accuracy drops considerably. One conceivable remedy is to insert an analog delay element such as a BBD (bucket-brigade device) ahead of the speech recognition unit, so that the audio signal reaches the speech recognition unit only after the voice detection unit has detected the voice input, but such a configuration can no longer be called low-power.

[0005] Applying these conventional techniques to actual equipment raises further problems. When they are used in an application such as a toy, it is often desirable for the device to respond to voice by some means even while the speech recognition unit is powered off, but the techniques of JP-A-62-245296 and JP-A-3-202899 cannot accommodate this. With the technique of JP-A-58-55991, the speech recognition unit cannot achieve the same high recognition accuracy when operating at low power as it does when operating at normal power. Moreover, an application in actual equipment does not always need the same quality of recognition result.
For example, when a quiz answer (such as "maru" for correct or "batsu" for incorrect) is given by voice, a very high recognition rate is required, so the extraction of recognition feature information and the matching computation must be thorough enough to meet that requirement. At other times it is preferable for the recognition result to include an element of chance, and relatively simple feature information and algorithms are then sufficient. And if the device only judges the volume of the input voice to decide between states such as "energetic" and "dejected", even less processing is needed. The accuracy required of the recognition result thus differs from one device to another, yet no speech recognition device that dynamically switches its operation according to the required recognition accuracy has been developed so far.

[0006] In view of the above, the object of claim 1 is to provide a speech recognition device that switches the kind of speech recognition processing according to the recognition accuracy required by the application and performs the processing at a clock frequency matched to the processing load of the recognition processing actually used, thereby achieving both the necessary recognition accuracy and low power consumption. The object of claim 2 is to provide a speech recognition device that switches the kind of speech recognition processing in the same way and performs the processing at a clock frequency and power supply voltage matched to the processing load of the recognition processing actually used, thereby achieving both the necessary recognition accuracy and low power consumption. The object of claim 3 is to provide a speech recognition device that switches the kind of speech recognition processing in the same way, performs the processing at a clock frequency and power supply voltage matched to the processing load of the recognition processing actually used, and in addition stops the power supply to the parts not involved in the recognition processing, thereby achieving both the necessary recognition accuracy and low power consumption.

[0007]

[Means for solving the problems] To achieve the above objects, the speech recognition device of claim 1 comprises: a microphone that converts input voice into an audio signal; first to N-th feature extraction units that extract feature information from the audio signal output by the microphone using mutually different extraction methods; a clock control unit that supplies a clock signal to the first to N-th feature extraction units and makes them operate at a designated speed; and a feature amount designation unit that determines a feature amount according to the required speech recognition accuracy and changes the frequency of the clock signal output by the clock control unit according to that feature amount. The device of claim 2 is the speech recognition device of claim 1 further comprising a voltage control unit that supplies a power supply voltage of a designated value to the first to N-th feature extraction units, wherein the value of the power supply voltage output by the voltage control unit is changed according to the feature amount obtained by the feature amount designation unit. The device of claim 3 is the speech recognition device of claim 1 or 2 wherein the clock control unit and the voltage control unit selectively stop the supply of the clock signal and the power supply voltage to those parts of the first to N-th feature extraction units that, for the feature amount obtained by the feature amount designation unit, are not involved in extracting feature information.

[0008] With this configuration, in the speech recognition device of claim 1 the feature amount designation unit determines the feature amount needed for recognition according to the required recognition accuracy and changes the frequency of the clock signal output by the clock control unit accordingly. This changes the operating speed of the first to N-th feature extraction units, which extract feature information by mutually different methods, as they extract feature information from the voice data output by the microphone. Speech recognition is thus performed at an operating speed matched to the processing load of the recognition processing that the application actually uses, achieving both the necessary recognition accuracy and low power consumption. In claim 2, the value of the power supply voltage supplied from the voltage control unit to the first to N-th feature extraction units of the device of claim 1 is changed according to the feature amount obtained by the feature amount designation unit, so that speech recognition is performed at an operating speed and power supply voltage matched to the processing load of the recognition processing actually used, again achieving both the necessary recognition accuracy and low power consumption.
In claim 3, the clock control unit and the voltage control unit of the device of claim 1 or 2 selectively stop the supply of the clock signal and the power supply voltage to those parts of the first to N-th feature extraction units that, for the feature amount obtained by the feature amount designation unit, are not involved in extracting feature information. Only the parts involved in extracting feature information then operate, at an operating speed and power supply voltage matched to the processing load of the recognition processing actually used, achieving both the necessary recognition accuracy and low power consumption.
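The idea of matching the clock to the processing load can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `pick_clock_hz`, the selectable clock frequencies, and the per-mode processing loads are all hypothetical, and the text does not say how the feature amount designation unit actually derives the clock frequency.

```python
def pick_clock_hz(required_ops_per_second, available_clocks_hz, ops_per_cycle=1.0):
    """Return the lowest available clock that still meets the required throughput."""
    for clock in sorted(available_clocks_hz):
        if clock * ops_per_cycle >= required_ops_per_second:
            return clock
    raise ValueError("no available clock meets the required throughput")


# Hypothetical processing loads (operations per second) for three kinds of recognition.
LOADS = {"spectrum": 1_800_000, "silent_sections": 300_000, "volume": 40_000}

# Hypothetical clock frequencies that the clock control unit could select.
CLOCKS_HZ = [100_000, 500_000, 2_000_000]

selected = {mode: pick_clock_hz(load, CLOCKS_HZ) for mode, load in LOADS.items()}
```

Choosing the lowest sufficient clock, rather than a fixed fast clock, is what lets the lighter kinds of recognition run at lower power.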

[0009]

[Embodiments of the invention] The present invention is described in detail below with reference to the embodiment shown in the drawings. FIG. 1 is a front view of an example robot toy that uses one embodiment of the speech recognition device according to the invention. The robot toy 1 shown in the figure comprises: a cylindrical body 2; two legs 3 movably attached to the lower part of the body 2; two arms 4 movably attached to the upper sides of the body 2; a neck 5 fixed to the top of the body 2; a head 6 fixed on the neck 5; a microphone 7 fixed to the side of the head 6; a battery 8 housed in the body 2; a speech recognition device 9, housed in the body 2, that uses power from the battery 8 to recognize voice input to the microphone 7; a motor control unit 10, housed in the body 2, that generates a drive voltage according to the control signal (drive signal) output by the speech recognition device 9; a plurality of motors 11, housed in the body 2, that generate driving force according to the drive voltage output by the motor control unit 10; and a plurality of actuators 12, housed in the body 2 at the base of each leg 3 and each arm 4, that move the legs 3 and the arms 4 using the driving force of the motors 11.

[0010] When the user of the robot toy 1 utters one of the preset words, the toy recognizes the voice while supplying power only to the parts needed for the recognition processing, and performs the action designated by the voice. As shown in FIG. 2, the speech recognition device 9 comprises a speech recognition processing unit 15 and a processing control unit 16. The processing control unit 16 operates under the designated recognition conditions and selectively operates only the necessary parts of the speech recognition processing unit 15, at the clock frequency and power supply voltage needed to secure the designated recognition accuracy, to recognize the voice input to the microphone 7. The motor control unit 10 is then operated according to the result so that the toy performs the action corresponding to the content of the voice. The speech recognition processing unit 15 comprises a voice input unit 17, a voice section detection unit 18, a spectrum extraction unit 19, a volume detection unit 20, a similarity calculation unit 21, a voice spectrum dictionary 22, and a result output unit 23. Operating on the clock signal and power supply voltage supplied by the processing control unit 16, it digitizes the audio signal output by the microphone 7 into voice data and extracts voice section information from that data. If recognition by spectrum is designated, it extracts the spectrum of the voice data based on the voice section information and identifies the word represented by the voice data through similarity calculation using the voice spectrum dictionary 22. If recognition by voice section information is designated, it identifies the word from the voice section information of the voice data. If recognition by volume information is designated, it identifies the word from the volume information of the voice data. It then generates a drive signal corresponding to the identified word and supplies it to the motor control unit 10.

[0011] The voice input unit 17 includes an amplifier circuit that amplifies the audio signal output by the microphone 7, a filter circuit that filters the amplified signal to remove noise and the like, and an A/D conversion circuit that converts the filtered signal into voice data. It takes in the audio signal from the microphone 7, amplifies it, filters out the noise, converts it to voice data, and supplies the voice data to the volume detection unit 20, the voice section detection unit 18, and the spectrum extraction unit 19. The volume detection unit 20 detects the volume of the voice data output by the voice input unit 17, generates volume information indicating that volume, and supplies it to the voice section detection unit 18 and the result output unit 23.
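As an illustration of what the volume detection unit 20 might compute, the following Python sketch measures the level of a block of digitized samples. It assumes a root-mean-square (RMS) measure; the text does not specify how the volume is obtained, and the function name `volume_rms` is hypothetical.

```python
import math


def volume_rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# A block that alternates between +3 and -3 has an RMS level of 3.
level = volume_rms([3, -3, 3, -3])
```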

[0012] Based on the output of the voice input unit 17 and the output of the volume detection unit 20, the voice section detection unit 18 detects when voice is input to the microphone 7 and generates voice detection information. It also detects silent sections in the voice data, for example the boundary between one utterance and the next, or a geminate consonant such as the small "tsu" between "suto" and "pu" in "sutoppu" ("stop"), and outputs the results as silent section information. It supplies the voice detection information and the silent section information to the spectrum extraction unit 19 and the result output unit 23. Based on the voice detection information and silent section information output by the voice section detection unit 18, the spectrum extraction unit 19 divides the voice data output by the voice input unit 17 into voice sections and performs frequency analysis on each, extracting feature information such as the time-spectrum pattern, LPC cepstrum coefficients, zero-crossing count, and power value. From this feature information it generates a feature pattern representing the features of the voice data and supplies it to the similarity calculation unit 21 and the result output unit 23.
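The detection of silent sections inside an utterance can be sketched as follows. This is a minimal Python sketch that assumes the voice data has already been reduced to one volume level per frame and that a frame counts as silent when its level falls below a threshold. The function name `count_silent_sections`, the framing, and the threshold are hypothetical, since the text does not say how the voice section detection unit 18 detects silence.

```python
def count_silent_sections(frame_levels, threshold):
    """Count runs of below-threshold frames that lie between voiced frames."""
    voiced = [level >= threshold for level in frame_levels]
    if True not in voiced:
        return 0  # nothing but silence: no utterance, so no interior gaps

    # Ignore leading and trailing silence; only gaps inside the utterance count.
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)

    count = 0
    in_silence = False
    for is_voiced in voiced[first:last + 1]:
        if not is_voiced and not in_silence:
            count += 1  # a new silent run starts here
        in_silence = not is_voiced
    return count


# A word with one geminate consonant, such as "sutoppu", has one interior gap.
gaps = count_silent_sections([0, 4, 5, 0, 6, 0], threshold=1)
```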

[0013] The voice spectrum dictionary 22 stores standard patterns representing the features of the words to be recognized, for example "mezameyo" ("wake up"), "buster", "susume" ("advance"), and "yattsukero" ("get them"). When the similarity calculation unit 21 issues a read instruction, the dictionary reads out the standard pattern of the designated word and supplies it to the similarity calculation unit 21.
When the spectrum extraction unit 19 outputs a feature pattern representing the features of the voice data, the similarity calculation unit 21 performs pattern matching between this feature pattern and each standard pattern stored in the voice spectrum dictionary 22 to compute their similarity. If there is a standard pattern whose distance from the feature pattern is the smallest and is also smaller than a preset reference value, the unit judges the word of that standard pattern to be the word corresponding to the voice data and supplies the result to the result output unit 23.
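The decision rule of the similarity calculation unit 21 (take the nearest standard pattern, but only if its distance is below a reference value) can be sketched in Python. The sketch assumes fixed-length feature vectors and Euclidean distance, neither of which is stated in the text; the function name `recognize` and the sample vectors are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def recognize(feature, standard_patterns, max_distance):
    """Return the word whose standard pattern is nearest, or None if none is close enough."""
    best_word, best_distance = None, math.inf
    for word, pattern in standard_patterns.items():
        distance = math.dist(feature, pattern)  # Euclidean distance (an assumption)
        if distance < best_distance:
            best_word, best_distance = word, distance
    # Reject the match when even the nearest pattern exceeds the reference value.
    return best_word if best_distance < max_distance else None


# Hypothetical two-dimensional standard patterns for two of the words.
DICTIONARY = {"susume": [1.0, 0.0], "buster": [0.0, 1.0]}

word = recognize([0.9, 0.1], DICTIONARY, max_distance=0.5)
```

The rejection threshold matters for a toy: without it, every noise burst would be mapped to whichever word happens to be nearest.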

[0014] Based on the speech recognition instruction output by the processing control unit 16, the result output unit 23 puts the necessary ones of the volume detection unit 20, the voice section detection unit 18, the spectrum extraction unit 19, the similarity calculation unit 21, and the voice spectrum dictionary 22 into operation to carry out speech recognition. It takes in the recognition result, decides the word corresponding to the voice data, generates a drive signal according to that decision, and supplies it to the motor control unit 10.

[0015] If the speech recognition instruction output by the processing control unit 16 designates recognition using spectrum extraction, all of the volume detection unit 20, the voice section detection unit 18, the spectrum extraction unit 19, the similarity calculation unit 21, and the voice spectrum dictionary 22 are put into operation and, as shown in FIG. 3, the voice data is judged to be one of "mezameyo", "buster", "susume", or "yattsukero". If the instruction designates recognition using the number of silent sections, only the volume detection unit 20 and the voice section detection unit 18 are put into operation, and the voice data is judged to be either "taa" or "pappappa". If the instruction designates recognition using volume, only the volume detection unit 20 is put into operation, and the voice data is judged to be either a "loud voice" or a "quiet voice".
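The unit selection described in this paragraph amounts to a small lookup table. The Python sketch below records which units operate in each mode, as stated in the text, and derives the units that can be left idle. The identifier names and the helper `idle_units` are hypothetical.

```python
# Units of the speech recognition processing unit 15 that the mode selection affects.
ALL_UNITS = {
    "volume_detection_20",
    "voice_section_detection_18",
    "spectrum_extraction_19",
    "similarity_calculation_21",
    "voice_spectrum_dictionary_22",
}

# Units put into operation for each kind of recognition, per the description.
ACTIVE_UNITS = {
    "spectrum": set(ALL_UNITS),
    "silent_sections": {"volume_detection_20", "voice_section_detection_18"},
    "volume": {"volume_detection_20"},
}


def idle_units(mode):
    """Units that are not needed, and so need not be clocked or powered, in a mode."""
    return ALL_UNITS - ACTIVE_UNITS[mode]


idle_in_volume_mode = idle_units("volume")
```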

[0016] The processing control unit 16 comprises: a feature amount designation unit 24 that issues a speech recognition instruction according to the kind of recognition designated by the user of the robot toy 1 and generates a clock frequency instruction signal and a voltage value instruction signal according to that instruction; a clock control unit 25 that generates a clock signal at the frequency indicated by the clock frequency instruction signal output by the feature amount designation unit 24; and a voltage control unit 26 that generates a power supply voltage of the value indicated by the voltage value instruction signal output by the feature amount designation unit 24. When the user of the robot toy 1 has designated recognition using spectrum extraction, the processing control unit 16 issues to the speech recognition processing unit 15 a speech recognition instruction indicating that recognition is to be performed using spectrum extraction, generates the clock signal with the highest frequency and the power supply voltage with the highest value, and supplies them to the speech recognition processing unit 15.

【００１７】[0017]

When the user of the robot toy 1 designates voice recognition using the silent section, the processing control unit 16 issues to the voice recognition processing unit 15 a voice recognition instruction indicating that voice recognition processing is to be performed using the silent section, generates a clock signal with the normal frequency and a power supply voltage with the normal voltage value, and supplies them to the voice recognition processing unit 15. As a result, the power consumed by the elements of the voice recognition device 9 that draw power whenever the clock signal switches, such as CMOS circuits, is kept low, and the power (current × voltage) used by each element is also kept low, so that power consumption is reduced. When the user of the robot toy 1 designates voice recognition using volume, the processing control unit 16 issues to the voice recognition processing unit 15 a voice recognition instruction indicating that voice recognition processing is to be performed using volume, generates a clock signal with the lowest frequency and a power supply voltage with the lowest voltage value, and supplies them to the voice recognition processing unit 15. As a result, the power consumed by the elements that draw power whenever the clock signal switches, such as CMOS circuits, and the power (current × voltage) used by each element are both kept lower still, so that power consumption is reduced further.
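The correspondence between the designated recognition type and the clock, voltage, and active units described above can be pictured as a small lookup table. The following is only an illustrative sketch, not the circuit of the embodiment: the names `Mode`, `Setting`, `designate`, and `gated_units` are hypothetical, the levels are the relative labels used in the text rather than real frequencies or voltages, and the set of units active for spectrum extraction is an assumption (all five), since only the silent-section and volume cases are spelled out here.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Recognition types the user can designate (hypothetical names)."""
    SPECTRUM = "spectrum extraction"
    SILENT_SECTION = "silent section"
    VOLUME = "volume"


@dataclass(frozen=True)
class Setting:
    clock: str               # relative clock frequency level
    voltage: str             # relative power supply voltage level
    active_units: frozenset  # reference numerals of the units left running


# Reference numerals: 18 voice section detection, 19 spectrum extraction,
# 20 volume detection, 21 similarity calculation, 22 voice spectrum dictionary.
ALL_UNITS = frozenset({18, 19, 20, 21, 22})

# Hypothetical table standing in for the feature amount designation unit 24.
SETTINGS = {
    # Assumption: spectrum-based recognition keeps every unit active.
    Mode.SPECTRUM: Setting("highest", "highest", ALL_UNITS),
    Mode.SILENT_SECTION: Setting("normal", "normal", frozenset({18, 20})),
    Mode.VOLUME: Setting("lowest", "lowest", frozenset({20})),
}


def designate(mode: Mode) -> Setting:
    """Return the clock level, voltage level, and active units for a mode."""
    return SETTINGS[mode]


def gated_units(mode: Mode) -> frozenset:
    """Units that take no part in the designated processing and can be stopped."""
    return ALL_UNITS - designate(mode).active_units
```

For example, `gated_units(Mode.VOLUME)` yields units 18, 19, 21, and 22, which is the set whose clock and power supply would be stopped in the volume-only case.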

【００１８】[0018]

As described above, in this embodiment the processing control unit 16 is operated under the designated recognition condition, only the necessary portions of the voice recognition processing unit 15 are selectively operated at the clock frequency and power supply voltage needed to secure the designated voice recognition accuracy, the voice input to the microphone 7 is recognized, and the motor control unit 10 is operated based on the processing result so that an operation corresponding to the content of the voice is performed. The type of voice recognition processing is thus switched according to the recognition accuracy required by the application, the voice recognition processing is performed at a clock frequency and power supply voltage matched to the processing amount required by the voice recognition processing actually used, and the power supply to portions not involved in the recognition processing is stopped. This achieves both the necessary voice recognition accuracy and low power consumption.

【００１９】[0019]

In the embodiment described above, the clock frequency, the power supply voltage, and the circuits to which power is supplied are all switched according to the type of voice recognition processing. However, substantially the same effect can be obtained by switching only some of these, for example only the clock frequency, or only the clock frequency and the power supply voltage. Also, in the embodiment described above the voice recognition device 9 is mounted on the robot toy 1, but it may instead be mounted on an electronic device for which portability is important, such as an electronic organizer or a handy terminal. Mounting the voice recognition device 9 on such a device allows voice recognition to be performed at the lowest power consumption needed to secure the required recognition accuracy, which minimizes battery drain and allows the device to operate for a long time.
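Why switching only the clock frequency, or only the clock frequency and the power supply voltage, still reduces power can be seen from the standard first-order approximation for CMOS switching power, P ≈ α·C·V²·f. This relation is general background and is not stated in the specification. The sketch below computes power relative to a baseline under that approximation; the function name `relative_dynamic_power` and the example ratios (half the frequency, 80% of the voltage) are purely hypothetical and do not come from the embodiment.

```python
def relative_dynamic_power(freq_ratio: float, volt_ratio: float = 1.0) -> float:
    """Switching power relative to a baseline, assuming P is proportional to V^2 * f.

    freq_ratio: new clock frequency divided by the baseline frequency.
    volt_ratio: new supply voltage divided by the baseline voltage.
    Activity factor and load capacitance are assumed unchanged.
    """
    if freq_ratio < 0 or volt_ratio < 0:
        raise ValueError("ratios must be non-negative")
    return (volt_ratio ** 2) * freq_ratio


# Hypothetical example values, chosen only for illustration.
freq_only = relative_dynamic_power(0.5)            # clock halved, voltage unchanged
freq_and_volt = relative_dynamic_power(0.5, 0.8)   # clock halved, voltage at 80%
```

Under these assumed ratios, lowering the frequency alone already halves the switching power, and lowering the voltage as well cuts it further because the voltage enters quadratically.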

【００２０】[0020]

[Effects of the Invention]

As described above, according to the present invention, in claim 1 the type of voice recognition processing is switched according to the recognition accuracy required by the application, and the voice recognition processing is performed at a clock frequency matched to the processing amount required by the voice recognition processing actually used, so that both the necessary voice recognition accuracy and low power consumption can be achieved. In claim 2, the type of voice recognition processing is switched according to the recognition accuracy required by the application, and the voice recognition processing is performed at a clock frequency and power supply voltage matched to the processing amount required by the voice recognition processing actually used, so that both the necessary voice recognition accuracy and low power consumption can be achieved. In claim 3, the type of voice recognition processing is switched according to the recognition accuracy required by the application, the voice recognition processing is performed at a clock frequency and power supply voltage matched to the processing amount required by the voice recognition processing actually used, and the power supply to portions not involved in the recognition processing is stopped, so that both the necessary voice recognition accuracy and low power consumption can be achieved.

[Brief description of drawings]

FIG. 1 is a front view showing an example of a robot toy that uses an embodiment of the voice recognition device according to the present invention.

FIG. 2 is a block diagram showing a detailed circuit configuration example of the voice recognition device shown in FIG. 1.

FIG. 3 is a table showing a processing example of the voice recognition device shown in FIG. 2.

[Explanation of symbols]

1 ... robot toy, 2 ... body, 3 ... leg, 4 ... arm, 5 ... neck, 6 ... head, 7 ... microphone, 8 ... battery, 9 ... voice recognition device, 10 ... motor control unit, 11 ... motor, 12 ... actuator, 15 ... voice recognition processing unit, 16 ... processing control unit, 17 ... voice input unit, 18 ... voice section detection unit (first to Nth feature extraction units), 19 ... spectrum extraction unit (first to Nth feature extraction units), 20 ... volume detection unit (first to Nth feature extraction units), 21 ... similarity calculation unit (first to Nth feature extraction units), 22 ... voice spectrum dictionary (first to Nth feature extraction units), 23 ... result output unit, 24 ... feature amount designation unit, 25 ... clock control unit, 26 ... voltage control unit

Claims

[Claims]

1. A voice recognition device comprising: a microphone that converts input voice into a voice signal; first to Nth feature extraction units that extract feature information of the voice signal output from the microphone by mutually different extraction methods; a clock control unit that supplies a clock signal to the first to Nth feature extraction units so that they operate at a designated speed; and a feature amount designation unit that obtains a feature amount according to the accuracy of voice recognition and, according to this feature amount, changes the frequency of the clock signal output from the clock control unit.

2. The voice recognition device according to claim 1, further comprising a voltage control unit that supplies a power supply voltage having a designated voltage value to the first to Nth feature extraction units, wherein the voltage value of the power supply voltage output from the voltage control unit is changed according to the feature amount obtained by the feature amount designation unit.

3. The voice recognition device according to claim 1, wherein the clock control unit and the voltage control unit selectively stop the supply of the clock signal and the supply of the power supply voltage to those portions of the first to Nth feature extraction units that, for the feature amount obtained by the feature amount designation unit, are not involved in the extraction of the feature information.