JP2000284794A - Device and method for voice recognition and voice recognition system

Info

Publication number: JP2000284794A
Application number: JP11093490A
Authority: JP
Inventors: Kenichiro Nakagawa; 賢一郎中川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1999-03-31
Filing date: 1999-03-31
Publication date: 2000-10-13

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition device, method, and system that make analog-style command input possible using voice input alone. SOLUTION: In a voice recognition device 101, if the recognition result of the input voice indicates a direction, a voice parameter obtaining section 108 obtains elements of the input voice other than its vocabulary content, such as length, volume, and pitch, as parameters, and an analog amount corresponding to the parameters is determined. An extended voice command generating section 110 then adds the analog amount to the voice command to generate an extended voice command. In a camera section 112, the vocabulary content and the analog amount of the input extended voice command are extracted, and an imaging section 115 is controlled, based on the command, by the analog amount.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a voice recognition device and method and to a voice recognition system, for example ones that accept command input by voice.

[0002]

[Description of the Related Art] In general, analog-style input to an apparatus or system is best made through a device that supports analog input, such as a dial, a slider bar, or a mouse. In some cases, however, manual operation by the user cannot be expected, for example when the target apparatus is so small that manual input is difficult.

[0003] To address this problem, voice recognition devices are known that are equipped with a voice input device such as a microphone, take in and recognize the voice uttered by the user, and use the recognition result as a command.

[0004]

[Problem to Be Solved by the Invention] However, when command input to such a conventional voice recognition device is made by voice, it has been difficult to perform fine, analog-style operations by voice input alone.

[0005] Consider, for example, inputting left and right direction commands to an imaging device that supports voice recognition. One conceivable way to allow analog-style operation is to utter the basic commands "right" and "left" with an adverb attached, as in "slightly right" or "far left".

[0006] With this method, however, only the amounts of change that correspond to pre-registered adverbs can be controlled. Nor can the user tell which adverbs are accepted as voice commands.

[0007] The present invention has been made to solve the problems described above, and its object is to provide a voice recognition device and method and a voice recognition system that enable analog-style command input by voice input alone.

[0008] A further object is to provide a voice recognition device and method and a voice recognition system that enable analog-style voice command input while taking each user's utterance characteristics into account.

[0009]

[Means for Solving the Problem] As one means of achieving the above objects, the voice recognition device of the present invention has the following configuration.

[0010] That is, it comprises input means for inputting a voice signal, voice recognition means for recognizing the vocabulary content indicated by the voice signal, feature amount obtaining means for obtaining a feature amount of the voice signal, and command creating means for creating a command based on the vocabulary content and the feature amount of the voice signal.

[0011] For example, the command creating means creates the command so that the feature amount indicates the degree of the operation indicated by the vocabulary content.

[0012] For example, the feature amount indicates an analog amount of change in the voice signal.

[0013]

[Embodiments of the Invention] An embodiment of the present invention is described in detail below with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of the camera system according to the present embodiment. In this camera system, the pan and tilt of the camera can be controlled by voice input from the user.

[0015] In FIG. 1, reference numeral 101 denotes a voice recognition device, which is the characteristic part of the present embodiment, and 112 denotes a camera unit that performs the actual image capture in accordance with voice commands recognized by the voice recognition device 101. Voice is input to the voice recognition device 101 as a voice waveform from a voice input device such as the microphone 104. Alternatively, voice may be input from the telephone 102, the portable terminal 103, or the like via the general public line 105.

[0016] In the voice recognition device 101, reference numeral 10 denotes a control unit comprising a CPU 10a, a ROM 10b, a RAM 10c, and the like, which controls the operation and processing of each component described below. In particular, the CPU 10a, following a program stored in advance in the ROM 10b and using the RAM 10c as work memory, executes control of the multiplexing and related processing described later.

[0017] Voice input to the voice recognition device 101 is first captured by the voice capturing unit 106, and its waveform is sent to the voice recognition unit 107 and the voice parameter acquisition unit 108.

[0018] The voice recognition unit 107 performs voice recognition on the input waveform and determines which voice command the input voice corresponds to. The recognition result is sent as a voice command to the extended voice command creation unit 110 and also to the voice parameter acquisition unit 108.

[0019] When the recognition result of the voice recognition unit 107 indicates a direction, the voice parameter acquisition unit 108 acquires, as parameters, elements of the voice other than its lexical content (analog elements), such as its length, volume, and pitch. The acquired parameters are sent to the analog amount determination unit 109, which determines an analog amount commensurate with them. The determined analog amount is sent to the extended voice command creation unit 110.

[0020] The extended voice command creation unit 110 adds, to the voice command that is the recognition result of the voice recognition unit 107, the analog amount that the analog amount determination unit 109 determined from the voice parameters. A voice command to which an analog amount has been added is hereinafter called an extended voice command. The extended voice command is sent to the extended voice command output unit 111 and is output to the camera unit 112 as the recognition result of the voice recognition device 101.
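As a rough illustration of what an extended voice command might carry, the following Python sketch pairs the recognized word with an optional analog amount. The class name, the field names, and the set of direction words are assumptions made for this example; the source names only words such as "right", "left", and "up" and does not specify a data format.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of direction words; "down" is added here for symmetry only.
DIRECTION_WORDS = {"right", "left", "up", "down"}


@dataclass(frozen=True)
class VoiceCommand:
    word: str                              # lexical content from the recognizer
    analog_amount: Optional[float] = None  # None for an ordinary voice command

    @property
    def is_extended(self) -> bool:
        # A command counts as "extended" once an analog amount is attached.
        return self.analog_amount is not None


def make_command(word: str, analog_amount: Optional[float]) -> VoiceCommand:
    """Attach the analog amount only when the word indicates a direction."""
    if word in DIRECTION_WORDS and analog_amount is not None:
        return VoiceCommand(word, analog_amount)
    return VoiceCommand(word)
```

Keeping the analog amount optional lets ordinary commands such as "shutter" travel through the same type without a dummy value.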

[0021] The extended voice command input unit 113 of the camera unit 112 receives ordinary voice commands such as "exposure", "shoot", "shutter", and "say cheese", and also receives commands indicating a direction, such as "right" and "up", as extended voice commands. Extended voice commands indicating a direction are sent to the direction parameter control unit 114, where each is decomposed into the lexical content of the command and its analog amount. The lexical content indicates the operation, and the analog amount indicates the degree of the operation.

[0022] These parameters are sent to the imaging unit 115, which, by controlling an actuator or the like (not shown), changes for example its pan or tilt angle by the input analog amount.

[0023] In other words, according to the present embodiment, when a voice command indicating a direction, for example "right", is input to the voice recognition device 101, the imaging unit 115 of the camera unit 112 can be made to move further to the right if, for example, the utterance is longer, louder, or higher in pitch.

[0024] FIG. 2 is a flowchart showing the operation of the voice recognition device 101 in the present embodiment. The control program for this processing is stored, for example, in the ROM 10b and is read out onto the RAM 10c and executed by the CPU 10a. The extended voice command output processing of the present embodiment is described in detail below with reference to FIG. 2.

[0025] First, when the voice recognition processing is started, it enters a voice detection loop (S201). When voice is input, the voice recognition unit 107 performs voice recognition using the voice waveform (S202). A recognition result may not be produced, for example because the utterance was ambiguous. It is therefore determined whether the voice recognition unit 107 has produced a recognition result (S203), and if not, the recognition processing is aborted.

[0026] If a voice recognition result is output, the voice parameter acquisition unit 108 determines whether the result is a word indicating a direction (S204). If it is, voice parameters are acquired from the input voice waveform (S205). Here, analog elements such as the utterance time (or utterance speed), volume, and pitch of the voice are acquired as voice parameters. Next, the analog amount determination unit 109 determines the analog amount of the voice command based on the values of these voice parameters (S206).
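The parameter acquisition of step S205 could be sketched as follows for two of the named parameters, utterance time and volume. This is only an illustration: the function name, the use of RMS amplitude as the volume measure, and the representation of the waveform as a list of float samples are assumptions, and pitch estimation is left out because the source does not say how it is obtained.

```python
import math
from typing import Sequence, Tuple


def voice_parameters(samples: Sequence[float], sample_rate: int) -> Tuple[float, float]:
    """Return (utterance time in seconds, RMS volume) for one utterance."""
    if not samples:
        return 0.0, 0.0
    # Utterance time follows directly from the sample count.
    duration = len(samples) / sample_rate
    # Root-mean-square amplitude is used here as a simple stand-in for volume.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return duration, rms
```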

[0027] One conceivable way to determine the analog amount is to prepare in advance a table, such as the one shown in FIG. 4, in which voice parameters correspond directly to analog amounts. In the example of FIG. 4, the analog amount is determined according to the state of each voice parameter: utterance time, volume, and pitch. The table is of course not limited to the example of FIG. 4. For example, the analog amount may be set according to a combination of parameters, and the analog amount may take multiple levels rather than only two (large and small).
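A table-based determination of this kind might look like the sketch below. FIG. 4 is not reproduced in the text, so every threshold and amount here is a made-up placeholder, as are the parameter keys and the function name. The sketch implements only the two-level (small or large) case.

```python
# Hypothetical table: parameter name -> (threshold, small amount, large amount).
# All numbers are placeholders and are not taken from FIG. 4.
ANALOG_TABLE = {
    "duration_s": (0.5, 5.0, 20.0),
    "volume_db": (60.0, 5.0, 20.0),
    "pitch_hz": (200.0, 5.0, 20.0),
}


def lookup_analog_amount(parameter: str, value: float) -> float:
    """Return the large amount at or above the threshold, else the small one."""
    threshold, small, large = ANALOG_TABLE[parameter]
    return large if value >= threshold else small
```

Extending each row to a sorted list of (threshold, amount) pairs would give the multi-level variant the paragraph mentions.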

[0028] As another way to determine the analog amount, it can be calculated directly from a voice parameter using the following expression.

[0029] Analog amount = α × (voice parameter) + b, where α and b are constants.
The analog amount obtained in step S206 in this way is combined with the voice recognition result in the extended voice command creation unit 110 to create an extended voice command (S207). The extended voice command is output to the external camera unit 112 via the extended voice command output unit 111 (S208).
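The linear expression can be written directly as a small function. The function name and the sample values (a 1.5-second utterance, 10 degrees per second, a 2-degree offset) are illustrative assumptions; the source does not give values or units for the constants.

```python
def linear_analog_amount(voice_parameter: float, alpha: float, b: float) -> float:
    """Analog amount = alpha * voice parameter + b, with alpha and b constant."""
    return alpha * voice_parameter + b


# Illustrative values only: a 1.5 s utterance scaled at 10 degrees per second
# with a 2 degree offset.
example_amount = linear_analog_amount(1.5, alpha=10.0, b=2.0)
```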

[0030] If, on the other hand, the voice recognition result in step S204 is not a word indicating a direction, the extended voice command creation unit 110 creates a voice command based on the recognition result alone (S209) and outputs it to the camera unit 112 (S210).
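The flow of FIG. 2 from S202 to S210 can be summarized in a short sketch. The recognizer, the parameter extractor, and the analog conversion are passed in as callables because the source does not specify them. The function name, the tuple representation of a command, and the direction word set are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

DIRECTION_WORDS = {"right", "left", "up", "down"}  # assumed word set

# (recognized word, analog amount or None for an ordinary command)
Command = Tuple[str, Optional[float]]


def handle_utterance(
    waveform: Sequence[float],
    recognize: Callable[[Sequence[float]], Optional[str]],
    get_parameter: Callable[[Sequence[float]], float],
    to_analog: Callable[[float], float],
) -> Optional[Command]:
    word = recognize(waveform)               # S202: voice recognition
    if word is None:                         # S203: no result, abort
        return None
    if word in DIRECTION_WORDS:              # S204: direction word?
        parameter = get_parameter(waveform)  # S205: acquire voice parameter
        amount = to_analog(parameter)        # S206: determine analog amount
        return (word, amount)                # S207/S208: extended command
    return (word, None)                      # S209/S210: ordinary command
```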

[0031] FIG. 3 is a flowchart showing how the camera unit 112 handles an extended voice command created as described above. When an extended voice command or a voice command output from the voice recognition device 101 according to the flowchart of FIG. 2 is input to the camera unit 112, the camera unit 112 leaves its voice command waiting loop (S211). It then determines whether the input command is an ordinary voice command or an extended voice command (S212). If it is an ordinary voice command, the imaging unit 115 is controlled as is (S214).

[0032] If, on the other hand, the command input in step S212 is an extended voice command, the direction parameter control unit 114 obtains the direction indicated by the command and its analog amount (S213), and the imaging unit 115 is then controlled using those values (S214).
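On the camera side, steps S212 to S214 amount to splitting the command into a direction and an amount and applying them to the pan and tilt angles. In the sketch below, the class, the axis mapping, the sign conventions, and the handling of "shutter" are all assumptions, since the source leaves the actuator control unspecified.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from direction word to (pan sign, tilt sign).
DIRECTION_AXES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


@dataclass
class ImagingUnit:
    pan_deg: float = 0.0
    tilt_deg: float = 0.0
    shots: int = 0


def apply_command(unit: ImagingUnit, word: str, analog_amount: Optional[float]) -> None:
    if analog_amount is not None and word in DIRECTION_AXES:  # S212: extended?
        pan_sign, tilt_sign = DIRECTION_AXES[word]            # S213: direction
        unit.pan_deg += pan_sign * analog_amount              # S214: control
        unit.tilt_deg += tilt_sign * analog_amount
    elif word == "shutter":                                   # ordinary command
        unit.shots += 1
```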

[0033] As described above, according to the present embodiment, analog elements in the user's utterance are detected, and a corresponding analog amount is added to the command and transferred, so that analog-style operation becomes possible by voice input alone.

[0034] <Modification 1> The present embodiment can also be applied to a system that performs user authentication. In this case, the user's default utterance must be registered in advance at user registration. Specifically, the user's default utterance is analyzed and held, for example in the RAM 10c, in the form of voice parameters such as utterance time, volume, and pitch.

[0035] The system first authenticates the user when its service starts. In subsequent user operations, it creates extended voice commands by comparing the user's utterance with that user's pre-registered default utterance.

[0036] In this case, the analog amount is determined based on the difference between the voice parameter of the current utterance and the default voice parameter. For example, the analog amount is determined according to the following expression.

[0037] Analog amount = α × (default voice parameter − current voice parameter) + b, where α and b are constants.
This allows the analog amount to be determined with each user's utterance characteristics taken into account. Therefore, even if utterance habits differ from user to user, for example a user who normally speaks in a high voice or a user who speaks quickly, an appropriate extended voice command can be set for each user, and the system can be operated appropriately.
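A per-user version of the calculation might be sketched as follows. The user identifier, the stored default value, and the constants are hypothetical. Note that with the operand order as printed (default minus current), a longer or louder current utterance makes the difference negative, so a negative α is used in the example to make the amount grow; the source does not state the sign of α.

```python
# Hypothetical registry: user id -> default utterance time in seconds,
# recorded at user registration.
DEFAULT_PARAMETERS = {"alice": 1.0}


def analog_from_default(user: str, current: float, alpha: float, b: float) -> float:
    """Analog amount = alpha * (default parameter - current parameter) + b."""
    return alpha * (DEFAULT_PARAMETERS[user] - current) + b


# A 1.5 s utterance from a user whose default is 1.0 s, with alpha = -10, b = 5.
example_amount = analog_from_default("alice", 1.5, alpha=-10.0, b=5.0)
```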

[0038] <Modification 2> Even when the present embodiment is applied to a system that does not perform user authentication, extended voice command input that takes each user's utterance characteristics into account is possible.

[0039] For example, the voice parameters of the user's immediately preceding utterance may be held and compared with the current utterance to determine the analog amount of the extended voice command.

[0040] In this case, the analog amount is determined based on the difference between the voice parameter of the current utterance and the voice parameter of the immediately preceding utterance. For example, the analog amount is determined according to the following expression.

[0041] Analog amount = α × (voice parameter of the preceding utterance − current voice parameter) + b, where α and b are constants.
In this way, even without registering each user's default utterance in advance, the analog amount can be determined with each user's utterance characteristics taken into account, as in Modification 1.
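The previous-utterance variant needs only one remembered value. In the sketch below, the class name and the choice to return None for the very first utterance are assumptions; the source does not say what happens when no preceding utterance exists.

```python
from typing import Optional


class PreviousUtteranceTracker:
    """Derive the analog amount from the change since the preceding utterance."""

    def __init__(self, alpha: float, b: float) -> None:
        self.alpha = alpha
        self.b = b
        self.previous: Optional[float] = None

    def update(self, current: float) -> Optional[float]:
        # Remember the current parameter for the next call.
        previous, self.previous = self.previous, current
        if previous is None:
            return None  # assumption: no amount for the first utterance
        # Analog amount = alpha * (preceding parameter - current parameter) + b
        return self.alpha * (previous - current) + self.b
```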

[0042]

[Other Embodiments] The present invention may be applied to a system made up of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

[0043] Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium (or recording medium) on which is recorded the program code of software that realizes the functions of the embodiment described above, and by having a computer (or a CPU or MPU) of that system or apparatus read out and execute the program code stored on the storage medium. In this case, the program code read from the storage medium itself realizes the functions of the embodiment, and the storage medium storing that program code constitutes the present invention. Needless to say, this covers not only the case where the functions of the embodiment are realized by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the embodiment are realized by that processing.

[0044] Needless to say, this also covers the case where the program code read from the storage medium is written into a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that card or in that unit performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the embodiment are realized by that processing.

[0045] When the present invention is applied to such a storage medium, the storage medium stores program code corresponding to the flowcharts described above (shown in FIGS. 2 and 3).

[0046]

[Effects of the Invention] As described above, according to the present invention, analog-style command input becomes possible by voice input alone.

[0047] In addition, analog-style voice command input that takes each user's utterance characteristics into account becomes possible.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a camera system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the extended voice command output processing in the embodiment.

FIG. 3 is a flowchart showing the operation performed in response to an extended voice command in the embodiment.

FIG. 4 is a diagram showing an example of the analog amount determination table in the embodiment.

[Explanation of symbols]

101 Voice recognition device
102 Mobile terminal
103 Telephone
104 Microphone
105 General public line
106 Voice capturing unit
107 Voice recognition unit
108 Voice parameter acquisition unit
109 Analog amount determination unit
110 Extended voice command creation unit
111 Extended voice command output unit
112 Camera unit
113 Extended voice command input unit
114 Direction parameter control unit
115 Imaging unit

Claims

[Claims]

1. A speech recognition apparatus comprising: input means for inputting a voice signal; voice recognition means for recognizing vocabulary content indicated by the voice signal; feature amount obtaining means for obtaining a feature amount of the voice signal; and command creating means for creating a command based on the vocabulary content and the feature amount of the voice signal.

2. A speech recognition apparatus according to claim 1, wherein said command creating means creates a command such that a degree of operation indicated by said vocabulary content is indicated by said characteristic amount.

3. The speech recognition apparatus according to claim 2, wherein the feature quantity indicates an analog variation in the speech signal.

4. The speech recognition apparatus according to claim 3, wherein the feature quantity indicates utterance speed information.

5. The speech recognition apparatus according to claim 3, wherein the feature quantity indicates volume information.

6. The speech recognition apparatus according to claim 3, wherein the feature quantity indicates pitch information.

7. The speech recognition apparatus according to claim 3, wherein the feature quantity acquiring unit acquires the feature quantity as a difference from a previously input speech signal.

8. The speech recognition apparatus according to claim 7, wherein said feature quantity acquiring means acquires the feature quantity as a difference from a speech signal input before one utterance.

9. The speech recognition apparatus according to claim 3, wherein said feature quantity acquiring means acquires said feature quantity as a difference from a predetermined speech signal.

10. The voice recognition device according to claim 9, wherein the predetermined voice signal is a default voice signal by a user who has uttered the voice signal input by the input unit.

11. The speech recognition apparatus according to claim 10, further comprising holding means for holding the default voice signals of a plurality of users.

12. The speech recognition apparatus according to claim 1, wherein the feature quantity acquiring means acquires the feature quantity when the vocabulary content recognized by the speech recognition means is a predetermined content.

13. The speech recognition apparatus according to claim 1, wherein said input means inputs said speech signal via a telephone line.

14. The speech recognition apparatus according to claim 1, wherein said input means is a microphone.

15. A speech recognition method comprising: an inputting step of inputting a voice signal; a voice recognizing step of recognizing vocabulary content indicated by the voice signal; a feature obtaining step of obtaining a feature amount of the voice signal; and a command creation step of creating a command based on the vocabulary content and the feature amount of the voice signal.

16. A speech recognition system in which a speech recognition device and an operation device are connected, wherein the speech recognition device comprises: speech input means for inputting a speech signal; speech recognition means for recognizing vocabulary content indicated by the speech signal; feature amount obtaining means for obtaining a feature amount of the speech signal; command creating means for creating a command based on the vocabulary content and the feature amount of the speech signal; and command output means for outputting the command to the operation device, and wherein the operation device comprises: command input means for inputting the command; operation content acquisition means for acquiring, based on the command, operation content and degree information of the operation; and control means for controlling an operation unit based on the operation content and the operation degree information.

17. A recording medium on which program code for speech recognition processing is recorded, wherein the program code includes at least: code for an input step of inputting a speech signal; code for a speech recognition step of recognizing vocabulary content indicated by the speech signal; code for a feature amount acquiring step of acquiring a feature amount of the speech signal; and code for a command creating step of creating a command based on the vocabulary content and the feature amount of the speech signal.