JP2006023773A - Voice processing system

Info

Publication number: JP2006023773A
Application number: JP2005248484A
Authority: JP
Inventors: Shinichi Tanaka; 信一田中; Yoichi Takebayashi; 洋一竹林; Hiroshi Kanazawa; 博史金澤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-08-29
Filing date: 2005-08-29
Publication date: 2006-01-26

Abstract

PROBLEM TO BE SOLVED: To provide a voice processing system that is simple, consumes little power, and recognizes the wearer's voice without hindering the wearer's actions.

SOLUTION: The voice processing system includes a headset with a wireless communication function and an external device capable of wireless communication with the headset. The headset is provided with a microphone that detects the voice of the headset wearer and generates a voice signal, voice recognition means that recognizes the voice signal and generates an identification signal corresponding to the content of the recognized voice signal, and recognition result transmission means that sends the identification signal generated by the voice recognition means to the external device by wireless communication. The external device performs an action corresponding to the received identification signal.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

本発明は、音声処理システムに関し、特に音声認識機能や音声伝送機能を搭載しつつ、これら機能の操作の簡便化と消費電力の低減を実現できる無線通信機能付きヘッドセットと、このようなヘッドセットと音声認識機能を搭載した機器との間で必要とされる音声処理技術に関する。 The present invention relates to a voice processing system, and in particular to a headset with a wireless communication function that carries a voice recognition function and a voice transmission function while simplifying the operation of these functions and reducing power consumption, and to the voice processing technology required between such a headset and a device equipped with a voice recognition function.

従来、機器を操作するには、スイッチやキーボード等の操作を当然に必要としていた。機器の操作が複雑になるほど、スイッチの個数が増える、操作シーケンスが複雑になるなど、操作性の低下を引き起こすという問題があった。また、両手がふさがっている場合に、スイッチやキーボードの操作ができないという不便もあった。 Conventionally, in order to operate a device, it has been naturally necessary to operate a switch, a keyboard, and the like. As the operation of the device becomes more complicated, the number of switches increases and the operation sequence becomes complicated. In addition, there is an inconvenience that the switch and keyboard cannot be operated when both hands are occupied.

近年、これらの問題を解決するための有力な手段として、音声認識技術が利用され始めている。 In recent years, speech recognition technology has begun to be used as an effective means for solving these problems.

音声認識技術を用いた機器は、機器のユーザが発した音声の内容に呼応して機器の動作を制御できるため、機器の操作を大幅に簡略化できる。さらには、音声により、離れた位置にある家電機器や機械、ロボットなどを制御することが、いつでもどこでも可能になり、機械的（物理的）スイッチを低減できるので、その経済的効果が大きく、ユビキタス時代の重要技術として注目されてきた。 A device using voice recognition technology can control its operation in response to the content of the voice uttered by its user, which greatly simplifies operation of the device. Furthermore, home appliances, machines, robots, and the like at remote locations can be controlled by voice anytime and anywhere, and mechanical (physical) switches can be reduced, so the economic benefit is large. The technology has therefore attracted attention as an important technology of the ubiquitous era.

一般に、入力音声を認識する音声認識機能を搭載した機器では、機器に備え付けられたマイクや、ケーブルで接続されたマイクを用いて、ユーザの音声を採取する。機器には、その機器で認識対象となる語彙(認識語彙)の読みが保持されており、その読みに基づいて対応する認識語彙を構成する単語音声モデルをあらかじめ作成し、入力音声の認識のために記憶しておく。この種の音声認識装置での入力音声の認識は、次のように行われる。 Generally, a device equipped with a voice recognition function for recognizing input speech collects the user's voice using a microphone built into the device or a microphone connected by a cable. The device holds the readings of the vocabulary it is to recognize (the recognition vocabulary). Based on those readings, the word speech models that make up the recognition vocabulary are created in advance and stored for use in recognizing input speech. Recognition of input speech by this type of speech recognition apparatus is performed as follows.

まずマイクで検出した音声信号を音響分析して、特徴パラメータ系列を求める。次に、求めた音声信号の特徴パラメータ系列を、あらかじめ作成しておいた各認識語彙を構成する単語音声モデルと照合して、入力音声を認識する。 First, the sound signal detected by the microphone is acoustically analyzed to obtain a feature parameter series. Next, the input speech is recognized by comparing the feature parameter series of the obtained speech signal with a word speech model that forms each recognized vocabulary.
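
The matching step described above can be illustrated with a minimal sketch. The text does not specify the matching method; dynamic time warping (DTW) is used here only as one well-known way to compare a feature parameter sequence with stored word models. The function names and the toy two-dimensional templates are hypothetical and chosen for illustration.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two feature parameter sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(features, word_models):
    """Return the word whose stored model is closest to the input features."""
    return min(word_models, key=lambda w: dtw_distance(features, word_models[w]))

# Hypothetical word speech models prepared in advance (toy values).
WORD_MODELS = {
    "on": [[0.0, 1.0], [0.5, 1.0], [1.0, 0.5]],
    "off": [[1.0, 0.0], [1.0, 0.5], [0.0, 0.0]],
}
```

For example, `recognize([[0.0, 1.0], [0.1, 1.0], [0.5, 1.0], [1.0, 0.5]], WORD_MODELS)` selects the "on" model even though the input has one more frame than the template, because DTW absorbs the difference in duration.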

音声認識装置において、機器自体にマイクが設置されている場合、ユーザが機器から離れたままで発声すると、マイクで検出した音声信号に雑音が重畳し、認識性能が低下してしまう。したがって、高精度で認識させるためには、ユーザは機器に近づいて発声しなければならない。マイクがケーブルで機器に接続されている場合も、ユーザから離れた場所にマイクが設置されている場合は、結局マイクロホンまで近づいて発声しなければならない。 In the speech recognition apparatus, when a microphone is installed in the device itself, if the user utters while leaving the device, noise is superimposed on the speech signal detected by the microphone, and the recognition performance deteriorates. Therefore, in order to recognize with high accuracy, the user must approach the device and speak. Even when the microphone is connected to the device via a cable, if the microphone is installed at a location away from the user, the microphone must be approached to speak after all.

機器に接続したマイクが、ユーザの口近くに配置される接話型マイクもあるが、機器とマイクを接続するケーブルがユーザの行動範囲を狭めてしまうという問題がある。ワイヤレス型の接話マイクを使用した場合には、ユーザの行動は制限されないが、マイクロホンで検出した音声信号に電気的ノイズが重畳してしまい、音声認識性能が低下する。 Although there is a close-talking microphone in which a microphone connected to the device is arranged near the user's mouth, there is a problem that a cable connecting the device and the microphone narrows the user's action range. When a wireless close-talking microphone is used, the user's behavior is not limited, but electrical noise is superimposed on the voice signal detected by the microphone, and voice recognition performance is degraded.

通常、音声認識技術では、多量の信号処理と照合処理を行った後に、認識結果が出力される。これらの処理をほぼリアルタイムで行わなければ、機器はユーザの発声完了後に速やかに対応の動作を行うことができない。このため、音声認識技術を搭載した機器は十分な計算能力を持っている必要があり、安価な機器や小型化が必要な機器には搭載しにくいという問題もある。 Usually, in the speech recognition technique, a recognition result is output after performing a large amount of signal processing and collation processing. If these processes are not performed almost in real time, the device cannot perform a corresponding operation promptly after the user's utterance is completed. For this reason, it is necessary for a device equipped with a speech recognition technology to have sufficient calculation capability, and there is a problem that it is difficult to install in a device that is inexpensive or needs to be downsized.

近年、携帯型電子録音装置が利用され始めている。これは、装置が内蔵する音声信号を装置内の記憶領域に保存し、保存した音声を再生するものであり、メモ代わりに音声を記録する用途等に用いられている。また保存した音声を、パーソナルコンピュータ等の機器にケーブルを介して転送して、パーソナルコンピュータに搭載された大容量のハードディスクに音声データを蓄積することができる。 In recent years, portable electronic recording devices have begun to be used. This is for storing an audio signal built in the apparatus in a storage area in the apparatus and reproducing the stored audio, and is used for recording audio instead of a memo. Further, the stored voice can be transferred to a device such as a personal computer via a cable, and the voice data can be stored in a large-capacity hard disk mounted on the personal computer.

パーソナルコンピュータに音声認識機能が搭載されている場合には、蓄積した音声データを音声認識技術で認識して、テキストファイルに変換できる。 When the personal computer has a voice recognition function, the stored voice data can be recognized by voice recognition technology and converted into a text file.

音声メモにおいて、発声された文章の音声認識は、上述した通常の音声認識技術で行われる。すなわち、あらかじめ文章で使用される可能性のある単語を選択しておき、これらの単語を認識語彙とする。このような単語として、数万〜１０万単語程度を選択することが多いが、話題が限定される場合は、これより少なくても構わない。認識語彙の読みから対応する単語の音声モデルをあらかじめ作成しておき、入力音声の認識のために記憶しておく。さらに、これらの単語間のつながりやすさをあらわす言語モデルをあらかじめ作成しておき、入力音声の認識のために記憶しておく。 In a voice memo, voice recognition of a spoken sentence is performed by the normal voice recognition technique described above. That is, words that may be used in a sentence are selected in advance, and these words are used as a recognition vocabulary. In many cases, tens of thousands to 100,000 words are selected as such words, but if the topic is limited, the number may be smaller. A speech model of the corresponding word is created in advance from the reading of the recognized vocabulary and stored for the recognition of the input speech. Furthermore, a language model representing the ease of connection between these words is created in advance and stored for recognition of the input speech.
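
As a rough sketch of a language model that expresses how readily words follow one another, the following uses bigram counts with add-one smoothing. This is an assumption made for illustration; the text does not state which kind of language model is used, and the function names and the toy training sentences are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count word pairs in tokenized sentences; '<s>' marks the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab = {w for s in sentences for w in s}
    return unigrams, bigrams, vocab

def bigram_prob(prev, cur, model):
    """Add-one smoothed probability that `cur` follows `prev`."""
    unigrams, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))

def sentence_score(words, model):
    """Product of bigram probabilities over the word sequence."""
    score = 1.0
    padded = ["<s>"] + words
    for prev, cur in zip(padded, padded[1:]):
        score *= bigram_prob(prev, cur, model)
    return score
```

A recognizer could combine such a score with the acoustic match of each candidate word sequence, so that word orders seen in training are preferred over unlikely ones.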

音声認識は、蓄積された音声データを音響分析して特徴パラメータ系列を求める。次に、求めた音声の特徴パラメータ系列をあらかじめ作成しておいた各認識単語の単語音声モデル及び言語モデルと照合して、入力音声を認識する。 In speech recognition, the accumulated speech data is acoustically analyzed to obtain a feature parameter series. Next, the input speech is recognized by comparing the obtained speech feature parameter series with the word speech model and language model of each recognition word created in advance.

しかし、携帯型電子録音装置では、携帯性を高めるために、内部の記憶領域は半導体メモリで構成されていることが多く、内部に保存できる音声の量は制限される。また、保存された音声をパーソナルコンピュータ等に転送する際には、ケーブルで接続するか、取り外し可能な記録メディアを経由する必要があり、リアルタイムで他機器に音声情報を転送することはできない。 However, in the portable electronic recording device, in order to improve portability, the internal storage area is often configured by a semiconductor memory, and the amount of audio that can be stored inside is limited. In addition, when the stored voice is transferred to a personal computer or the like, it is necessary to connect with a cable or via a removable recording medium, and voice information cannot be transferred to other devices in real time.

また、手がふさがった状態で装置を使用する場合には、ヘッドセット型マイクロホンやクリップ付きマイクロホンを、ケーブルで携帯型電子録音装置に接続する必要がある。ケーブルは行動の妨げになるうえに、その都度の接続が面倒である。 Further, when the device is used while the user's hands are occupied, a headset-type microphone or a clip-on microphone must be connected to the portable electronic recording device with a cable. The cable hinders the user's movement, and connecting it each time is troublesome.

このように、従来の音声認識技術を用いた機器では、正確に音声を認識するために、常にユーザとマイクの位置関係に注意して使用し、必要に応じてマイクに近寄って発声する必要があった。 As described above, with a device using conventional voice recognition technology, the user always had to pay attention to the positional relationship between the user and the microphone, and to move closer to the microphone to speak when necessary, in order for speech to be recognized accurately.

また、ヘッドセット型マイクロホンを使用する場合には、マイクロホンと機器を接続するケーブルで行動が妨げられるという問題があった。音声認識技術が必要とする計算容量を持たないヘッドセットでは、音声による操作そのものが不可能である。 In addition, when using a headset type microphone, there is a problem that the action is hindered by a cable connecting the microphone and the device. With a headset that does not have the computational capacity required by voice recognition technology, voice operation itself is not possible.

また、携帯型の電子録音装置では、内部に保存できる音声データの量が制限され、保存したデータをリアルタイムで他機器に転送できない。また、マイクをケーブルで接続する必要があり、ケーブルが行動の妨げになる、接続が面倒であるなどの問題があった。 Moreover, in the portable electronic recording device, the amount of audio data that can be stored inside is limited, and the stored data cannot be transferred to other devices in real time. In addition, it is necessary to connect the microphone with a cable, which causes problems such as the cable obstructing the action and the connection is troublesome.

本発明は、上述した問題を克服するために、ユーザの行動を妨げることなく高精度な音声認識技術を実現することのできる音声処理システムを提供する。 The present invention provides a speech processing system capable of realizing a highly accurate speech recognition technique without interfering with user behavior in order to overcome the above-described problems.

また、音声データをリアルタイムで他機器に転送することのできる無線通信機能付きヘッドセットを含む音声処理システムを提供する。 Also provided is a voice processing system including a headset with a wireless communication function that can transfer voice data to other devices in real time.

さらに、機能選択手段によって不要なときに音声認識機能や音声伝達機能を停止する手段を設け、消費電力を低減することのできる無線通信機能付きヘッドセットを含む音声処理システムを提供する。 Furthermore, a voice processing system including a headset with a wireless communication function capable of reducing power consumption by providing means for stopping the voice recognition function and voice transmission function when unnecessary by the function selection means is provided.

さらに、ヘッドセットから音声データをリアルタイムで第２の装置に転送して、第２の装置でその音声を認識することのできる音声処理システムを提供する。さらに第２の装置から第３の装置へと音声認識結果を無線送信することによって、第３の装置の動作を制御する音声処理システムを提供する。 Furthermore, the present invention provides an audio processing system capable of transferring audio data from a headset to a second device in real time and recognizing the audio by the second device. Furthermore, a speech processing system for controlling the operation of the third device by wirelessly transmitting the speech recognition result from the second device to the third device is provided.

上記課題を達成するために、本発明の第１の側面では、無線機能付きヘッドセットは、
（ａ）音声を検出して音声信号を生成するマイクロホン
（ｂ）生成された音声信号を認識する音声認識手段
（ｃ）音声認識手段による認識結果を、無線通信により外部の機器へ送出する認識結果伝送手段
（ｄ）生成された音声信号を音声認識手段で処理するか否かを切り替える機能選択手段
を備える。 In order to achieve the above object, according to the first aspect of the present invention, a headset with a wireless function includes:
(a) a microphone that detects voice and generates a voice signal;
(b) voice recognition means for recognizing the generated voice signal;
(c) recognition result transmission means for sending the recognition result of the voice recognition means to an external device by wireless communication; and
(d) function selection means for switching whether or not the generated voice signal is processed by the voice recognition means.

ヘッドセットと他の機器とをケーブル等で接続する必要がないので、ユーザの行動が制限されることはない。また、ユーザは機能選択手段により、任意で音声認識処理を選択することができる。音声認識処理が選択された場合は、無線通信機能付きヘッドセット内で、簡便かつ低消費電力で認識処理を行う。ヘッドセットと無線通信できる外部の機器に音声認識技術を搭載しなくとも、これらの機器をたとえば音声コマンドにより操作することが可能となる。また、ヘッドセット内部において、簡単な話者認識、文認識、対話理解等を行うことが可能になる。 Since there is no need to connect the headset to another device with a cable or the like, the user's action is not limited. Further, the user can arbitrarily select the voice recognition process by the function selection means. When the voice recognition process is selected, the recognition process is performed easily and with low power consumption in the headset with the wireless communication function. Even if voice recognition technology is not installed in an external device that can communicate wirelessly with the headset, these devices can be operated by voice commands, for example. In addition, simple speaker recognition, sentence recognition, dialogue understanding, and the like can be performed inside the headset.

本発明の第２の側面では、無線通信機能付きヘッドセットは、
（ａ）音声を検出して音声信号を生成するマイクロホン
（ｂ）生成された音声信号を認識する音声認識手段
（ｃ）音声認識手段による認識結果を無線通信により外部の機器へ送出する認識結果伝送手段
（ｄ）生成された音声信号を、無線通信により外部の機器へ送信する音声伝送手段
（ｅ）音声信号を、音声認識手段と音声伝送手段のいずれで処理するかを選択する機能選択手段
を備える。 In the second aspect of the present invention, a headset with a wireless communication function includes:
(a) a microphone that detects voice and generates a voice signal;
(b) voice recognition means for recognizing the generated voice signal;
(c) recognition result transmission means for sending the recognition result of the voice recognition means to an external device by wireless communication;
(d) voice transmission means for transmitting the generated voice signal to an external device by wireless communication; and
(e) function selection means for selecting whether the voice signal is processed by the voice recognition means or by the voice transmission means.

好ましくは、機能選択手段は、音声信号を、音声認識手段と音声伝送手段のいずれでも処理しないモードと、音声認識手段と音声伝送手段の双方で処理するモードの少なくとも一方をさらに有する。 Preferably, the function selection unit further includes at least one of a mode in which the voice signal is not processed by either the voice recognition unit or the voice transmission unit, and a mode in which both the voice recognition unit and the voice transmission unit are processed.

ユーザは、機能選択手段を操作することによって、音声認識処理と音声伝送処理を任意で選択することができる。音声認識を選択した場合は、第１の側面で説明したヘッドセットと同様に、ヘッドセット内で少ない演算量で簡便に音声を認識し、たとえば認識した音声コマンドによって遠隔の機器を操作する、音声を文章として認識する、等を行うことができる。一方、音声伝送を選択した場合は、マイクロホンで検出した音声信号を無線伝送した後に、伝送先の機器において詳細な音声認識を行うことができる。この場合、より正確な文認識や、意図理解、話者認識、対話理解を行うことができる。また、音声データの送信先の機器が大容量の記憶装置を有する場合、長時間にわたる音声データを常時蓄積し、それを再生することができ、有用性が増す。 The user can arbitrarily select voice recognition processing and voice transmission processing by operating the function selection means. When voice recognition is selected, the voice is simply recognized with a small amount of calculation in the headset, for example, operating the remote device by the recognized voice command, as in the headset described in the first aspect. Can be recognized as a sentence. On the other hand, when audio transmission is selected, detailed audio recognition can be performed in the transmission destination device after the audio signal detected by the microphone is wirelessly transmitted. In this case, more accurate sentence recognition, intent understanding, speaker recognition, and dialogue understanding can be performed. In addition, when the audio data transmission destination device has a large-capacity storage device, the audio data over a long period of time can be constantly stored and played back, increasing usefulness.
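
The selection among recognition, transmission, neither, and both can be sketched as follows. This is a minimal illustration under stated assumptions: the `Mode` names and the `route` function are hypothetical, and the `recognize` and `transmit` callables stand in for the voice recognition means and the voice transmission means.

```python
from enum import Flag, auto

class Mode(Flag):
    """Hypothetical encoding of the modes offered by the function selection means."""
    NONE = 0                      # processed by neither means
    RECOGNIZE = auto()            # processed by the voice recognition means
    TRANSMIT = auto()             # processed by the voice transmission means
    BOTH = RECOGNIZE | TRANSMIT   # processed by both means

def route(mode, voice_signal, recognize, transmit):
    """Pass the voice signal only to the processing paths that the mode enables."""
    results = {}
    if Mode.RECOGNIZE in mode:
        results["recognized"] = recognize(voice_signal)
    if Mode.TRANSMIT in mode:
        results["transmitted"] = transmit(voice_signal)
    return results
```

With `Mode.NONE`, neither callable is invoked, which corresponds to the case in which both functions are stopped to save power.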

本発明の第３の側面では、無線通信機能付きヘッドセットと、このヘッドセットと無線通信可能な外部装置とを含む音声処理システムを提供する。このシステムを構成する無線通信機能付きヘッドセットは、ヘッドセット装着者の音声を検出して音声信号を生成するマイクロホンと、生成された音声信号を認識し、認識した音声信号の内容に対応する識別信号を生成する音声認識手段と、音声認識手段によって生成された識別信号を無線通信により前記外部装置へ送出する認識結果伝送手段とを備える。一方、外部装置は、ヘッドセットから識別信号を受信したときに、この識別信号に対応する動作を開始する。 According to a third aspect of the present invention, there is provided a voice processing system including a headset with a wireless communication function and an external device capable of wireless communication with the headset. The headset with a wireless communication function constituting this system includes a microphone that detects the voice of the headset wearer and generates a voice signal; voice recognition means for recognizing the generated voice signal and generating an identification signal corresponding to the content of the recognized voice signal; and recognition result transmission means for sending the identification signal generated by the voice recognition means to the external device by wireless communication. The external device, for its part, starts an operation corresponding to the identification signal when it receives the identification signal from the headset.

外部装置は、例えば、複数の識別信号と、これらの識別信号のそれぞれに対応する動作とを関連づけて格納するテーブルを有し、このテーブルを検索することによって、所望の動作を開始する。 The external device has, for example, a table that stores a plurality of identification signals and operations corresponding to the respective identification signals in association with each other, and starts a desired operation by searching this table.
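
The table lookup performed by the external device can be sketched as below. The numeric identification signals and the operation names are hypothetical; the text states only that identification signals and their corresponding operations are stored in association with each other.

```python
# Hypothetical correspondence table: identification signal -> operation name.
ACTION_TABLE = {
    0x01: "power_on",
    0x02: "power_off",
    0x03: "volume_up",
}

def handle_identification_signal(signal_id, table=ACTION_TABLE):
    """Look up the operation for a received identification signal.

    Returns None when the signal is not registered in the table.
    """
    return table.get(signal_id)
```

Because the external device only needs this table, it can be operated by voice without carrying any voice recognition function of its own.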

この音声処理システムにより、ヘッドセットと無線通信可能な外部装置は、対応テーブルを格納するだけでよく、構成的な変更をほとんど要さない。ヘッドセットを装着したユーザは、音声コマンドにより、外部装置を操作することができる。 With this voice processing system, an external device capable of wireless communication with the headset need only store the correspondence table, and hardly requires a structural change. A user wearing the headset can operate the external device using a voice command.

本発明の第４の側面では、音声処理システムは、無線通信機能付きヘッドセットと、音声認識機能を有しヘッドセットと無線通信可能な外部装置とを含む。無線通信機能付きヘッドセットは、ヘッドセットの装着者の音声を検出して音声信号を生成するマイクロホンと、音声信号を無線通信により外部装置器へ送信する音声伝送手段とを備える。一方、外部装置は、ヘッドセットから送信された音声信号を受信する音声受信手段と、受信した音声信号を認識する音声認識手段とを備える。 In a fourth aspect of the present invention, a speech processing system includes a headset with a wireless communication function and an external device having a speech recognition function and capable of wireless communication with the headset. The headset with a wireless communication function includes a microphone that detects a voice of a wearer of the headset and generates an audio signal, and an audio transmission unit that transmits the audio signal to an external device by wireless communication. On the other hand, the external device includes a voice receiving unit that receives a voice signal transmitted from the headset, and a voice recognition unit that recognizes the received voice signal.

外部装置の音声認識手段は、たとえば、受信した音声信号の内容に対応する識別信号を生成し、外部装置は、生成された識別信号に対応する動作を行う。 For example, the voice recognition unit of the external device generates an identification signal corresponding to the content of the received voice signal, and the external device performs an operation corresponding to the generated identification signal.

あるいは、音声認識手段は、生成した識別信号を文字列に変換して出力する。この場合、外部装置は、表示部をさらに有し、音声認識結果としての文字列を表示する。 Alternatively, the voice recognition means converts the generated identification signal into a character string and outputs it. In this case, the external device further includes a display unit, and displays a character string as a voice recognition result.

このシステムでは、外部装置に音声認識機能を持たせる。外部装置が十分な容量と演算能力を有する場合、より難易度の高い音声認識を行うことが可能になる。 In this system, an external device has a voice recognition function. When the external device has a sufficient capacity and computing capacity, it is possible to perform voice recognition with a higher degree of difficulty.

また、外部装置にテキスト変換機能と表示機能を持たせることにより、ヘッドセットからの受信信号を受信しながら、ほとんどリアルタイムで音声を文字認識し、認識結果を画面に表示することが可能になる。 Further, by providing the external device with a text conversion function and a display function, it is possible to recognize characters in speech almost in real time while receiving a reception signal from the headset and display the recognition result on the screen.

本発明の第５の側面では、音声処理システムは、無線通信機能付きヘッドセットと、音声認識機能を有してヘッドセットと無線通信可能な第１の外部装置と、第１の外部装置と無線通信可能な第２の外部装置とを含む。無線通信機能付きヘッドセットは、ヘッドセットの装着者の音声を検出して音声信号を生成するマイクロホンと、この音声信号を無線通信により第１の外部装置へ送信する音声伝送手段とを備える。第１の外部装置は、ヘッドセットから送信された音声信号を受信する音声受信手段と、受信した音声を認識し、認識した音声信号の内容に対応する識別信号を特定する音声認識手段と、特定した識別信号を無線通信により第２の外部装置へ送信する認識結果伝送手段とを備える。第２の外部装置は、第１の外部装置から受信した単語ＩＤに対応する動作を行う。 In a fifth aspect of the present invention, a voice processing system includes a headset with a wireless communication function, a first external device that has a voice recognition function and is capable of wireless communication with the headset, and a second external device capable of wireless communication with the first external device. The headset with a wireless communication function includes a microphone that detects the voice of the headset wearer and generates a voice signal, and voice transmission means that transmits this voice signal to the first external device by wireless communication. The first external device includes voice receiving means that receives the voice signal transmitted from the headset; voice recognition means that recognizes the received voice and identifies an identification signal corresponding to the content of the recognized voice signal; and recognition result transmission means that transmits the identified identification signal to the second external device by wireless communication. The second external device performs an operation corresponding to the word ID received from the first external device.

このシステムによれば、ヘッドセットで採取したユーザの音声を、容量と演算能力の高い第１の外部装置を用いて音声認識し、この第１の外部装置を介して、第２の外部装置の操作を制御する。これにより、より複雑な音声処理が可能になる。 According to this system, the user's voice picked up by the headset is recognized using the first external device, which has large capacity and high computing power, and the operation of the second external device is controlled via the first external device. This enables more complex voice processing.

本発明によれば、無線通信機能付きヘッドセットに、音声認識手段、音声伝送手段、それらを切り替えるための機能選択手段を備えることによって、ユーザの行動を妨げることなく、ユーザの意図に応じた音声認識をすることのできるヘッドセットが提供される。 According to the present invention, by providing a headset with a wireless communication function with voice recognition means, voice transmission means, and function selection means for switching between them, a headset is provided that can perform voice recognition according to the user's intention without hindering the user's actions.

ヘッドセット内部において、簡便で低消費電力の音声認識を行うとともに、ヘッドセット外部の機器に音声データを伝送した場合は、難易度の高いより正確な音声認識を行うことができる。 In the headset, simple and low power consumption voice recognition is performed, and when voice data is transmitted to a device outside the headset, more accurate voice recognition with high difficulty can be performed.

また、音声認識処理機能と、音声伝送処理機能をユーザの選択により任意で一時停止することができ、無線通信機能付きヘッドセットの消費電力を節減することが可能となる。 Further, the voice recognition processing function and the voice transmission processing function can be arbitrarily paused by the user's selection, and the power consumption of the headset with the wireless communication function can be reduced.

さらに、ヘッドセットから音声データを大容量の第２の装置に転送した場合は、第２の装置においてリアルタイムで受信音声を認識し、テキスト変換、編集、保存、再生などを可能にする。これにより、システムの利便性がいっそう向上する。 Furthermore, when the voice data is transferred from the headset to the second device having a large capacity, the second device recognizes the received voice in real time and enables text conversion, editing, storage, reproduction, and the like. This further improves the convenience of the system.

本発明では音声認識機能を搭載した無線機能付ヘッドセットをウェアラブルおよびユビキタス時代最も人間に身近な機器として位置付けており、音声認識の高性能化と応用を拡大するとともに、ヘッドセットの小型低価格化を可能とする。 In the present invention, a headset with a wireless function equipped with a voice recognition function is positioned as the device closest to people in the wearable and ubiquitous era. The invention improves the performance and broadens the application of voice recognition while making it possible to reduce the size and cost of the headset.

また、人間にとって最も身近なヘッドセットと音声入力を利用することにより、高齢者や障害者の情報機器システムやネットワーク利用が加速され、さらには、各種機器システムとのインタラクションや、各種サービス・コンテンツとの利用が可能となる。結果として、各種機器システム産業、情報通信メディア産業、サービス産業の活性化に貢献できる。 In addition, using the headset, which is the device most familiar to people, together with voice input will accelerate the use of information equipment systems and networks by the elderly and persons with disabilities, and will further enable interaction with various equipment systems and the use of various services and content. As a result, the invention can contribute to the revitalization of the equipment system industries, the information and communication media industries, and the service industries.

以下、本発明の実施形態について図面を参照して説明する。 Embodiments of the present invention will be described below with reference to the drawings.

(第1実施形態)
図１および２は、本発明の第１実施形態に係る無線通信機能付きヘッドセット１０の外観と、その概略システム構成を示す。無線通信機能付きヘッドセット１０は、ヘッドセット１０の装着者（ユーザ）の発する音声を検出して電気的な音声信号を生成するマイクロホン１３と、この音声信号をデジタル変換を経て音声認識する音声認識部２３と、音声認識部２３による認識結果を無線通信モジュール１７から外部の機器に送信する認識結果伝送手段２５と、マイクロホン１３で検出した音声信号を音声認識処理するか否かを選択する機能選択手段２０を備える。機能選択手段は機能選択スイッチ１４を含み、ユーザは、機能選択スイッチ１４を操作することによって、任意で音声認識処理を選択できる。 (First embodiment)
FIGS. 1 and 2 show the appearance of a headset 10 with a wireless communication function according to a first embodiment of the present invention and its schematic system configuration. The headset 10 with a wireless communication function includes a microphone 13 that detects the voice uttered by the wearer (user) of the headset 10 and generates an electrical voice signal; a voice recognition unit 23 that recognizes this voice signal after digital conversion; recognition result transmission means 25 that transmits the recognition result of the voice recognition unit 23 from the wireless communication module 17 to an external device; and function selection means 20 that selects whether or not the voice signal detected by the microphone 13 is subjected to voice recognition processing. The function selection means includes a function selection switch 14, and the user can select voice recognition processing at will by operating the function selection switch 14.

無線通信機能付きヘッドセット（以下、場合に応じて単に「ヘッドセット」と称する）１０は、左右の耳あて１１を柔軟なフレームで接続した形状をしており、ユーザの頭部に装着して使用する。一方の耳あてからはアーム１５が伸びており、その先端にマイクロホン１３がついている。マイクロホンは、ユーザがヘッドセット１０を装着したときに、ユーザのほぼ口元に位置し、周囲ノイズの重畳が少ない音声を検出する。 A headset with a wireless communication function (hereinafter simply referred to as the “headset” where appropriate) 10 has a shape in which left and right earpieces 11 are connected by a flexible frame, and is used by being worn on the user's head. An arm 15 extends from one of the earpieces, and a microphone 13 is attached to its tip. When the user wears the headset 10, the microphone is positioned close to the user's mouth and detects voice with little superimposed ambient noise.

耳あて１１の中には、スピーカ(左右)１７、ＣＰＵボード１６、無線通信モジュール１７、バッテリー１２が内蔵されている。いずれか一方の耳あての外側に機能選択スイッチ１４が配置され、上述したように、ユーザの意思で音声認識処理を行うか否かを選択できる構成となっている。なお、図示はしないが各要素は必要に応じてケーブルで接続されている。 In the ear pad 11, speakers (left and right) 17, a CPU board 16, a wireless communication module 17, and a battery 12 are incorporated. The function selection switch 14 is arranged outside either one of the ear contacts, and as described above, it is possible to select whether or not to perform the voice recognition processing with the intention of the user. Although not shown, each element is connected by a cable as necessary.

ＣＰＵボード１６には、ＣＰＵとその周辺回路、メモリ（不図示）、Ａ／Ｄ変換器２１、機能選択部１９などが搭載されている。Ａ／Ｄ変換器２１は、マイクロホン１３で検出したアナログ音声信号をデジタル音声信号に変換し、変換結果をＣＰＵに入力する。機能選択部１９は、機能選択スイッチ１４の状態を検出してＣＰＵに通知する。 The CPU board 16 includes a CPU and its peripheral circuits, a memory (not shown), an A / D converter 21, a function selection unit 19, and the like. The A / D converter 21 converts the analog audio signal detected by the microphone 13 into a digital audio signal, and inputs the conversion result to the CPU. The function selection unit 19 detects the state of the function selection switch 14 and notifies the CPU.

無線通信モジュール１７は、外部の機器とデジタル無線通信を行う。より具体的には、ＣＰＵボード１６から送られてきた信号を、外部の他の機器（不図示）に送信し、他の機器から発信された信号を受信してＣＰＵボード１６に転送する送受信機能を持つ。 The wireless communication module 17 performs digital wireless communication with external devices. More specifically, it has a transmission/reception function of transmitting signals sent from the CPU board 16 to another external device (not shown), and of receiving signals transmitted from other devices and transferring them to the CPU board 16.

音声認識手段はＣＰＵボード１６上のＡ／Ｄ変換器２１および音声認識部２３を含む。音声伝送手段２５は、ＣＰＵボード１６上のＣＰＵ及びその周辺回路と、無線通信モジュール１７とで実現される。機能選択手段２０は機能選択スイッチ１４と、ＣＰＵボード１６上のＣＰＵ及び周辺回路で実現され、その出力が音声認識部２３に接続される。上述したように、ユーザが機能選択スイッチ１４を操作することにより、音声認識部の処理動作を制御することができる。 The voice recognition means includes an A / D converter 21 and a voice recognition unit 23 on the CPU board 16. The audio transmission means 25 is realized by the CPU on the CPU board 16 and its peripheral circuits and the wireless communication module 17. The function selection means 20 is realized by the function selection switch 14, the CPU and peripheral circuits on the CPU board 16, and its output is connected to the voice recognition unit 23. As described above, when the user operates the function selection switch 14, the processing operation of the voice recognition unit can be controlled.

図１および２に示すヘッドセット１０の概観およびシステム構成は本発明の技術思想を実現するための一例に過ぎず、このような構成に限定されるわけではない。例えば、音声認識手段として、専用の音声認識処理を行う回路を備えていてもよい。また、例えば、信号処理を高速で行うためのＤＳＰを備えていてもよい。さらに、例えば、機能選択スイッチ１４は２個に分割して両耳あてに配置してもよい。 The overview and system configuration of the headset 10 shown in FIGS. 1 and 2 are merely examples for realizing the technical idea of the present invention, and are not limited to such a configuration. For example, a circuit for performing dedicated voice recognition processing may be provided as voice recognition means. Further, for example, a DSP for performing signal processing at high speed may be provided. Further, for example, the function selection switch 14 may be divided into two and arranged at both ears.

図３は、機能選択スイッチ１４の一例を示す。ユーザは必要に応じて、機能選択スイッチ１４を操作して、２つの状態を切り替えることができる。ここでは、ユーザが、マイクロホン１３で検出した音声信号を音声認識部２３で処理することを選択した場合には状態１、処理しないことを選択した場合には状態２とする。 FIG. 3 shows an example of the function selection switch 14. The user can switch between the two states by operating the function selection switch 14 as necessary. Here, when the user selects to process the voice signal detected by the microphone 13 with the voice recognition unit 23, the state 1 is set, and when the user selects not to process, the state 2 is set.

機能選択スイッチ１４は、たとえば２個の押しボタンスイッチを有し、常にいずれか一方のみがＯＮになるタイプのスイッチとする。ユーザが押しボタンスイッチ３１を押してＯＮにした場合には、機能選択スイッチ１４は状態１になる。これに連動して、押しボタンスイッチ３２は自動的にＯＦＦになる。逆に、ユーザが押しボタンスイッチ３２を押してＯＮにした場合には、機能選択スイッチ１４は状態２になり、他方の押しボタンスイッチ３１は自動的にＯＦＦになる。機能選択部２０は機能選択スイッチ１４の状態に応じて、状態１であれば音声認識動作信号を音声認識部２３に出力し、状態２であれば音声認識停止信号を音声認識部２３に出力する。 The function selection switch 14 is, for example, a switch that has two push button switches, exactly one of which is ON at any time. When the user presses the push button switch 31 to turn it ON, the function selection switch 14 enters state 1, and the push button switch 32 is automatically turned OFF in conjunction with this. Conversely, when the user presses the push button switch 32 to turn it ON, the function selection switch 14 enters state 2, and the other push button switch 31 is automatically turned OFF. Depending on the state of the function selection switch 14, the function selection unit 20 outputs a voice recognition operation signal to the voice recognition unit 23 in state 1, and outputs a voice recognition stop signal to the voice recognition unit 23 in state 2.
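
The interlocked behaviour of the two push button switches can be sketched as a small state holder. The class name, the signal strings, and the choice of state 1 as the initial state are assumptions made for illustration; the text does not specify an initial state.

```python
RECOGNITION_RUN = "voice recognition operation signal"
RECOGNITION_STOP = "voice recognition stop signal"

class FunctionSelectionSwitch:
    """Two interlocked push buttons (31 and 32); exactly one is ON at any time."""

    def __init__(self):
        # Assumption: button 31 starts ON (state 1).
        self.on_button = 31

    def press(self, button):
        """Turn the pressed button ON; the other one turns OFF automatically."""
        if button not in (31, 32):
            raise ValueError("unknown push button")
        self.on_button = button

    @property
    def state(self):
        """State 1 when button 31 is ON, state 2 when button 32 is ON."""
        return 1 if self.on_button == 31 else 2

def selection_output(switch):
    """Signal sent to the voice recognition unit for the current switch state."""
    return RECOGNITION_RUN if switch.state == 1 else RECOGNITION_STOP
```

Storing only which button is ON makes it impossible to represent both buttons ON or both OFF, which mirrors the mechanical interlock described above.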

When the output of the function selection unit 19 is the voice recognition operation signal, the voice recognition unit 23 recognizes the voice signal detected by the microphone and sends its output to the recognition result transmission means 25. When the output of the function selection unit 19 is the voice recognition stop signal, the voice recognition unit 23 stops operating.

FIG. 4 shows the internal configuration of the voice recognition unit 23. The output of the A/D converter 21 is first input to the recognition signal blocker 41, whose operation is controlled by the output signal of the function selection unit 19. When that output is the voice recognition operation signal, the recognition signal blocker 41 passes the signal from the A/D converter 21 to the acoustic analysis unit 43. When it is the voice recognition stop signal, the blocker cuts off the output from the A/D converter 21.

More specifically, when the output of the function selection unit 19 is the voice recognition operation signal, the recognition signal blocker 41 is closed (conducting), and the digital voice signal output from the A/D converter 21 is input to the acoustic analysis unit 43. The acoustic analysis unit 43 converts the input voice into feature parameters. Typical feature parameters used for voice recognition include the power spectrum, which can be obtained with band-pass filters or a Fourier transform, and cepstrum coefficients obtained by LPC (linear prediction) analysis, but the type of feature parameter is not restricted here. The acoustic analysis unit 43 converts the input voice into feature parameters at fixed time intervals, so its output is a time series of feature parameters (a feature parameter sequence). This feature parameter sequence is supplied to the model matching unit 45.
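As a rough illustration of how a digital voice signal becomes a feature parameter sequence, the sketch below cuts the signal into overlapping frames and computes a power spectrum per frame with a naive discrete Fourier transform. The frame length, the hop size, and the function names are assumptions chosen for readability. The text leaves the feature type open, and a practical implementation would normally use an FFT and a window function.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (bins 0 .. N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def acoustic_analysis(samples, frame_len=64, hop=32):
    """Return one feature vector per hop: a feature parameter sequence."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        features.append(power_spectrum(samples[start:start + frame_len]))
    return features
```

For a pure tone that completes four cycles per 64-sample frame, every feature vector produced this way has its largest value in bin 4.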

The recognition vocabulary storage unit 47, meanwhile, stores the reading information needed to create a speech model for each word of the recognition vocabulary, together with an identifier, for example a command ID, that corresponds to the recognition result when each word is recognized. In this embodiment, voice control by word recognition is described as an example of voice recognition in the headset, but the present invention is not limited to this. The voice recognition unit 23 in the headset may perform any voice recognition that requires little computation, memory, and power, such as continuous word recognition, sentence recognition, word spotting, or speech intention understanding, and transmit the result to an external device system by wireless communication.

The speech model creation/storage unit 49 stores in advance, in accordance with the recognition vocabulary held in the recognition vocabulary storage unit 47, the speech model of each word and the word ID that the model matching unit 45 outputs as the identification signal when that word is the recognition result. When recognition other than word recognition is performed, a corresponding identification signal is stored instead.

The model matching unit 45 computes the similarity or distance between the feature parameter sequence of the input voice and the speech model of each recognition target word stored in the speech model creation/storage unit 49, and outputs as the recognition result the word ID associated with the speech model having the maximum similarity (or the minimum distance).

Widely used matching methods for the model matching unit 45 include representing each speech model as a feature parameter sequence and computing the distance between the model sequence and the input sequence by DP (dynamic programming), and representing each speech model as an HMM (hidden Markov model) and computing the probability of each model given the input feature parameter sequence. The method used is not restricted here.
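The DP-based option mentioned above can be sketched with a basic dynamic time warping distance. This is one common form of DP matching, shown under the assumption that each speech model is stored as a template sequence of feature vectors keyed by word ID. The function names and the Euclidean local distance are illustrative choices, not details given in the text.

```python
def dtw_distance(template, features):
    """DP (dynamic time warping) distance between two vector sequences."""
    inf = float("inf")
    n, m = len(template), len(features)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two feature vectors.
            local = sum((x - y) ** 2
                        for x, y in zip(template[i - 1], features[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j],      # template advances
                                     cost[i][j - 1],      # input advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def match(models, features):
    """Return the word ID whose template is closest to the input."""
    return min(models, key=lambda word_id: dtw_distance(models[word_id], features))
```

Because the warping path may repeat frames, an input that is a time-stretched copy of a template still reaches distance zero against that template.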

The word ID output from the model matching unit 45 becomes the output of the voice recognition unit 23 as it is and is input to the recognition result transmission means 25 (see FIG. 2). The recognition result transmission means 25 uses the transmission function of the wireless communication module 17 to wirelessly transmit the word ID to another device.

When the output of the function selection unit 19 is the voice recognition stop signal, the recognition signal blocker 41 is open (non-conducting), and the digital signal from the A/D converter 21 is not input to the acoustic analysis unit 43. The acoustic analysis unit 43 therefore produces no output. Likewise, the model matching unit 45 receives no input and produces no output.

Thus, when the user of the headset 10 chooses not to have the voice processed by the voice recognition means (that is, when the function selection switch 14 is in state 2), the series of processes performed by the acoustic analysis unit 43, the model matching unit 45, and the recognition result transmission means 25 is not carried out, and the amount of computation drops sharply. If the CPU that implements the acoustic analysis unit 43, the model matching unit 45, and the recognition result transmission means 25 has a power saving mode that temporarily reduces its computing capability and power consumption, the CPU can be shifted to the power saving mode when the function selection switch 14 enters state 2 or when the voice recognition stop signal is detected. While the user has chosen not to have the voice signal processed by the voice recognition means, the CPU runs in the power saving mode, so the load on the battery decreases and the operating time of the headset with a wireless communication function can be extended. When the function selection switch 14 leaves state 2 (that is, when the voice recognition operation signal is output), the CPU is promptly returned to the normal mode so that its full computing capability is available.

FIG. 5 shows an example of the contents of the recognition vocabulary storage unit 47 provided in the headset. In this example, the user wearing the headset 10 controls an air conditioner with voice commands. The result of recognizing the user's utterance in the voice recognition unit 23 is therefore transmitted to the air conditioner by wireless communication.

In the example of FIG. 5, the recognition vocabulary consists of the readings "eakon tsukeru" (turn on the air conditioner), "eakon tomeru" (stop the air conditioner), "ondo ageru" (raise the temperature), and "ondo sageru" (lower the temperature), which are assigned the word IDs "01", "02", "03", and "04", respectively. When the voice recognition unit 23 of the headset 10 recognizes the user's utterance "eakon tsukeru", the ID "01" is wirelessly transmitted to the air conditioner.

The contents of the speech model creation/storage unit 49 are created in accordance with the contents of the recognition vocabulary storage unit 47. In the example of FIG. 5, a speech model is created for each of "eakon tsukeru", "eakon tomeru", "ondo ageru", and "ondo sageru", and each model is stored paired with the identification signal (word ID) of its word.

The air conditioner, for its part, stores each word ID paired with the corresponding operation, as shown in FIG. 6. When it receives a voice recognition result (that is, a word ID) from the headset, it performs the operation corresponding to that word ID.
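The device-side lookup can be sketched as a table from word ID to action. The four IDs follow the vocabulary of FIG. 5. The concrete actions, the one-degree temperature step, the initial set temperature, and the choice to ignore unknown IDs are assumptions made for illustration, since the text does not spell out the table of FIG. 6.

```python
class AirConditioner:
    """Receives word IDs and performs the paired operation."""

    def __init__(self):
        self.running = False
        self.set_temp = 25  # assumed initial target temperature
        # Word ID -> operation (assumed contents of the FIG. 6 table).
        self.actions = {
            "01": self.start,
            "02": self.stop,
            "03": self.raise_temp,
            "04": self.lower_temp,
        }

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def raise_temp(self):
        self.set_temp += 1  # assumed step of one degree

    def lower_temp(self):
        self.set_temp -= 1

    def receive(self, word_id):
        """Run the operation for word_id; return False if it is unknown."""
        action = self.actions.get(word_id)
        if action is None:
            return False
        action()
        return True
```

The device needs no voice recognition of its own in this arrangement; it only has to decode a short identifier.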

FIG. 7(a) shows the user of the headset uttering "eakon tsukeru" (turn on the air conditioner) while the voice recognition processing mode is selected with the function selection switch 14. The voice uttered by the user is detected by the microphone and converted into a digital signal by the A/D converter 21. Because the function selection switch 14 is in state 1, the function selection unit 19 is outputting the voice recognition operation signal. The recognition signal blocker 41 is therefore closed, and the digital signal is input to the acoustic analysis unit 43, converted into a feature parameter sequence, and input to the model matching unit 45. The model matching unit 45 matches the input feature parameter sequence against the speech model of each word stored in the speech model creation/storage unit 49. If the speech model corresponding to "eakon tsukeru" has the highest similarity, the model matching unit 45 outputs the word ID "01" as the recognition result.

The word ID "01" is input to the recognition result transmission means 25 and transmitted to the air conditioner.

On receiving the word ID "01", the air conditioner starts operating in accordance with the correspondence table of FIG. 6.

FIG. 7(b) shows the user of the headset uttering "eakon tsukeru" while the mode without voice recognition processing is selected with the function selection switch 14. The voice uttered by the user is detected by the microphone and converted into a digital signal by the A/D converter 21. Because the function selection switch 14 is in state 2, the function selection unit 19 is outputting the voice recognition stop signal. The recognition signal blocker 41 is therefore open, and the digital signal is not input to the acoustic analysis unit 43. In this case no recognition result is obtained, nothing is transmitted to the air conditioner, and the air conditioner does not start operating.

The headset 10 with a wireless communication function described above detects the user's voice with its attached microphone 13. Because the microphone 13 is positioned near the user's mouth, the detected voice signal contains little superimposed ambient noise, and high recognition performance can be obtained when that voice is recognized.

Because the recognized voice command is transmitted to other devices by wireless communication, no cable is required and the user's movements are not hindered.

Because voice recognition is performed on the headset 10 side, any device capable of wireless communication with the headset can be operated by the user's voice without itself incorporating voice recognition technology.

Further, because the headset includes function selection means for selecting whether or not the voice recognition means processes the voice, the user can deliberately choose not to have his or her utterances processed by voice recognition. While the voice recognition means is operating, the processor must be driven at a high clock rate in order to perform a large amount of computation in real time on the detected voice signal. When the voice recognition means does not process the voice, the computation for voice recognition is unnecessary and the processor clock can be lowered. Since a processor consumes more power the higher its clock rate, stopping the processing in the voice recognition means can greatly reduce the power consumption of the headset with a wireless communication function. Such a headset cannot receive power from outside and runs on a primary or rechargeable battery. Lower power consumption therefore extends the operating time of the headset and makes it more useful.

(Second Embodiment)
FIG. 8 shows an example of the system configuration of a headset according to a second embodiment of the present invention. In the first embodiment, the voice signal is analyzed and matched in a simple manner by the voice recognition unit, and an identification (ID) signal corresponding to the vocabulary uttered by the user is wirelessly transmitted to the external device to be controlled. The second embodiment describes a configuration that, in addition to voice recognition in the headset, wirelessly transmits the voice data before recognition to another device in real time.

First, the voice signal detected by the microphone 13 is input to the A/D converter 21 and converted from an analog signal into a digital voice signal. The digital voice signal is split in two: one branch is input to the voice recognition unit 23 and the other to the voice transmission means 53.

The function selection means 50 consists of a function selection switch 51 and the function selection unit 19. By operating the function selection switch 51, the user can switch between two states as needed. Here, state 1 is the state in which the user has chosen to have the voice signal detected by the microphone processed by the voice recognition unit 23, and state 2 is the state in which the user has chosen to have it processed by the voice transmission means 53.

FIG. 9 shows an example of the function selection switch 51. The function selection switch 51 has two push button switches, exactly one of which is ON at any time. When the user presses the push button switch 101 to turn it ON, the function selection switch enters state 1, and the push button switch 102 is automatically turned OFF. When the user presses the push button switch 102 to turn it ON, the function selection switch enters state 2, and the push button switch 101 is automatically turned OFF. When the function selection switch 51 is in state 1, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and, at the same time, a voice transmission stop signal to the voice transmission means 53. When the function selection switch 51 is in state 2, it outputs the voice recognition stop signal to the voice recognition unit 23 and, at the same time, a voice transmission operation signal to the voice transmission means 53. The voice recognition unit 23 operates as described in the first embodiment.

FIG. 10 shows the internal configuration of the voice transmission means 53.

The voice signal converted into a digital signal by the A/D converter 21 is first input to the transmission signal blocker 55. When the output signal from the function selection unit 19 is the voice transmission operation signal, the transmission signal blocker 55 is closed and passes the signal output from the A/D converter 21 to the voice encoding unit 57. When the output signal of the function selection unit 19 is the voice transmission stop signal, the transmission signal blocker 55 opens and cuts off the output from the A/D converter 21.

The voice encoding unit 57 encodes the digital voice signal received through the transmission signal blocker 55 by a predetermined method. Possible encoding steps include compression by ADPCM or the like and the addition of encoding parameters and of information for correcting transmission errors, but the specific processing is not restricted here.

The encoded data is input to the voice transmission unit 59. The voice transmission unit 59 uses the transmission function of the wireless communication module 17 (FIG. 1) to wirelessly transmit the encoded data to another device.

FIG. 11 shows a specific operation of the headset with a wireless communication function according to the second embodiment. In this example, the user uses the headset to wirelessly control both an air conditioner and a personal computer in the room. The user's voice picked up by the microphone is wirelessly transmitted, on the one hand, to the air conditioner as the output of the recognition result transmission means 25 of the headset and, on the other hand, to the personal computer as the output (encoded data) of the voice transmission means 53.

The contents of the recognition vocabulary storage unit 47 and the speech model creation/storage unit 49 of the voice recognition unit 23 in the headset, and the settings stored on the air conditioner side, are the same as in the first embodiment. A large-capacity hard disk is connected to the personal computer, and all voice data received from the headset with a wireless communication function is stored on this hard disk.

The example of FIG. 11(a) shows the user uttering the voice command "eakon tsukeru" (turn on the air conditioner) with the function selection switch 51 set to the voice recognition mode. The voice uttered by the user is detected by the microphone and converted into a digital signal by the A/D converter 21. As described above, the digital signal is split in two: one branch is input to the voice recognition unit 23 and the other to the voice transmission means 53.

At this time, because the function selection switch 51 is in state 1, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and the voice transmission stop signal to the voice transmission means 53.

The digital signal input to the voice recognition unit 23 first enters the recognition signal blocker 41. Because the recognition signal blocker 41 has been closed by the voice recognition operation signal from the function selection unit 19, the digital signal passes unchanged to the acoustic analysis unit 43. The processing from matching onward is the same as in the first embodiment: the model matching unit 45 outputs the identification signal "01" as the recognition result, and the recognition result transmission means 25 wirelessly transmits the signal "01" to the air conditioner.

The digital signal input to the voice transmission means 53, meanwhile, enters the transmission signal blocker 55. Because the function selection unit 19 is outputting the voice transmission stop signal, the transmission signal blocker 55 is open. The digital signal is therefore not input to the voice encoding unit 57, and no further processing is performed.

FIG. 11(b) shows the user uttering "Today I will talk about music" while the voice transmission processing mode is selected with the function selection switch 51. The voice uttered by the user is detected by the microphone and converted into a digital signal by the A/D converter 21. The digital signal is split in two: one branch is input to the voice recognition unit 23 and the other to the voice transmission means 53.

Because the function selection switch 51 is in state 2, the function selection unit 19 outputs the voice recognition stop signal to the voice recognition unit 23 and the voice transmission operation signal to the voice transmission means 53.

The digital signal input to the voice recognition unit 23 first enters the recognition signal blocker 41, but because the function selection unit 19 is outputting the voice recognition stop signal, the recognition signal blocker 41 is open. The digital signal is therefore not input to the acoustic analysis unit 43, and no further processing is performed.

The digital signal input to the voice transmission means 53, meanwhile, first enters the transmission signal blocker 55. Because the function selection unit 19 is outputting the voice transmission operation signal, the transmission signal blocker 55 is closed. The digital signal is therefore encoded by the voice encoding unit 57 and wirelessly transmitted from the voice transmission unit 59, through the wireless communication module 17, to the personal computer.
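The two cases of FIG. 11 can be summarized in a small routing sketch: the same digital frame is offered to both paths, and the two signal blockers decide which path actually processes it. The function name, the destination keys, and the `recognize` and `encode` callables are hypothetical stand-ins for the voice recognition unit 23 and the voice encoding unit 57.

```python
def process_frame(frame, state, recognize, encode):
    """Route one digital frame according to the switch state (1 or 2).

    Returns a dict of destination -> payload for whatever is transmitted.
    """
    if state not in (1, 2):
        raise ValueError("the second embodiment has only states 1 and 2")

    # A blocker that is closed conducts; an open blocker drops the frame.
    recognition_blocker_closed = (state == 1)
    transmission_blocker_closed = (state == 2)

    outputs = {}
    if recognition_blocker_closed:
        # Recognition path: only a short word ID goes to the controlled device.
        outputs["air_conditioner"] = recognize(frame)
    if transmission_blocker_closed:
        # Transmission path: the encoded voice itself goes to the recorder.
        outputs["personal_computer"] = encode(frame)
    return outputs
```

In state 1 only a word ID is sent, and in state 2 only encoded voice is sent; the path whose blocker is open performs no computation at all.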

The personal computer decodes the encoded voice sent from the headset, restores it to a digital voice signal, and records it on the hard disk. In other words, what the user says is recorded on the personal computer by wireless communication from the headset. Because the personal computer has ample storage capacity, the content of the user's speech can be stored either as voice or converted into text. The recorded voice can also be searched and played back as needed.

Further, as described later, when the personal computer is provided with a voice recognition function, more demanding and more accurate voice recognition processing can be applied to the voice signal transmitted from the headset.

With this configuration, a user wearing the headset with a wireless function can, hands-free and at his or her own choice, have the voice processed for a plurality of devices. For example, the user can not only control other devices by voice command but also record what he or she says in real time.

(Third Embodiment)
FIGS. 12 and 13 schematically show the system configuration of a headset with a wireless function according to a third embodiment of the present invention.

In the third embodiment, as in the second embodiment, the voice signal can be handled both by voice recognition processing for voice commands and by transmission processing for wireless transmission of voice data. In addition to these two processing modes, the third embodiment adds to the function selection switch an OFF mode in which neither process is performed.

As shown in FIGS. 12 and 13, the function selection means 60 consists of a function selection switch 61 and the function selection unit 19. The user can switch among three states with the function selection switch 61 as needed. State 1 is the state in which the user has chosen voice recognition processing of his or her utterances, state 2 is the state in which the user has chosen voice transmission processing, and state 3 is the state in which the user has chosen to have the voice processed by neither the voice recognition means nor the voice transmission means.

FIG. 13 shows an example of the function selection switch 61. The function selection switch 61 has three push button switches, configured so that exactly one of them is ON at any time. When the user presses the push button switch 101 to turn voice recognition ON, the function selection switch 61 enters state 1, and the push button switches 102 and 103 are automatically turned OFF. When the user presses the push button switch 102 to turn voice transmission ON, the function selection switch 61 enters state 2, and the push button switches 101 and 103 are automatically turned OFF. When the push button switch 103 is pressed, the function selection switch 61 enters state 3, and the push button switches 101 and 102 are automatically turned OFF.

When the function selection switch 61 is in state 1, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and, at the same time, the voice transmission stop signal to the voice transmission means 53. In state 2, it outputs the voice recognition stop signal to the voice recognition unit 23 and, at the same time, the voice transmission operation signal to the voice transmission means 53. In state 3, it outputs the voice recognition stop signal to the voice recognition unit 23 and also the voice transmission stop signal to the voice transmission means 53.
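The three-state behavior reduces to a small lookup table, sketched below. The button numbers and the state-to-signal mapping follow the text. The string constants, the class name, and the assumption that the switch powers up in the OFF mode (state 3) are illustrative only.

```python
RUN, STOP = "run", "stop"

# state -> (signal to voice recognition unit 23, signal to voice transmission means 53)
SIGNAL_TABLE = {
    1: (RUN, STOP),   # voice recognition mode
    2: (STOP, RUN),   # voice transmission mode
    3: (STOP, STOP),  # OFF mode
}

BUTTON_TO_STATE = {101: 1, 102: 2, 103: 3}


class FunctionSelectionSwitch61:
    """Three interlocked push buttons: exactly one is ON."""

    def __init__(self):
        self.state = 3  # assumed power-up default: OFF mode

    def press(self, button):
        self.state = BUTTON_TO_STATE[button]

    def buttons(self):
        """ON/OFF status of each button implied by the current state."""
        return {b: (s == self.state) for b, s in BUTTON_TO_STATE.items()}


def function_selection_unit(state):
    """Return the pair of control signals for the given switch state."""
    return SIGNAL_TABLE[state]
```

Storing only the state and deriving the button statuses from it guarantees that the exactly-one-ON rule cannot be violated.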

The voice recognition unit 23 operates as in the first and second embodiments, and the voice transmission means 53 operates as in the second embodiment.

When the user chooses processing by neither the voice recognition unit 23 nor the voice transmission means 53, that is, when the function selection switch 61 is in state 3, the voice recognition stop signal and the voice transmission stop signal hold both the recognition signal blocker 41 and the transmission signal blocker 55 open. The acoustic analysis unit 43, the model matching unit 45, the recognition result transmission means 25, the voice encoding unit 57, and the voice transmission unit 59 therefore perform no processing, and the amount of computation is greatly reduced.

If the CPU that implements the acoustic analysis unit 43, the model matching unit 45, the voice encoding unit 57, and the voice transmission unit 59 has a power-saving mode, the CPU can be shifted to that mode when the user selects the OFF mode (that is, when the function selection switch 61 enters state 3, or when both the voice recognition stop signal and the voice transmission stop signal are detected). In the power-saving mode, the CPU's computing capacity and power consumption are reduced to save power. The load on the battery therefore decreases, and the operating time of the headset can be extended. When the function selection switch 61 leaves state 3, or when at least one of the voice recognition operation signal and the voice transmission operation signal is output, the CPU should promptly be returned to the normal mode, in which it can deliver its full computing capacity.
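The transition rule described above can be summarized in a few lines. This is a minimal sketch under the assumption that the CPU exposes exactly two modes; the class name `PowerController` and the mode strings are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class PowerController:
    """Tracks a hypothetical CPU mode: 'normal' or 'power_saving'."""
    mode: str = "normal"

    def update(self, recognition_operating, transmission_operating):
        # At least one operation signal: return to normal mode at once.
        if recognition_operating or transmission_operating:
            self.mode = "normal"
        # Both stop signals (OFF mode): enter the power-saving mode.
        else:
            self.mode = "power_saving"
        return self.mode
```

A caller would invoke `update` whenever the function selection unit changes its output signals, so that the CPU mode always follows the current switch state.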

FIGS. 14 and 15 illustrate specific operations of the headset with a wireless communication function according to the third embodiment. As in the second embodiment, assume a scene in which a user wearing the headset performs voice-command control of, or voice data transmission to, an indoor air conditioner and a personal computer.

The contents stored in the recognition vocabulary storage unit 47 and the speech model creation/storage unit 49 of the voice recognition unit 23, and the table settings of the air conditioner, are the same as in the first and second embodiments. As in the second embodiment, a large-capacity hard disk is connected to the personal computer, and all voice data received from the headset with a wireless communication function is stored on this hard disk.

FIG. 14(a) shows the user selecting the voice recognition mode with the function selection switch 61 and speaking the voice command 「えあこんつける」 ("eakon tsukeru", meaning "turn on the air conditioner") into the microphone. The user's voice is detected by the microphone and converted into a digital signal by the A/D converter 21. The digital signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53. Because the function selection switch 61 is in state 1, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and the voice transmission stop signal to the voice transmission means 53. In this case, as in the second embodiment (FIG. 11(a)), the command "01" is wirelessly transmitted to the air conditioner, and the air conditioner starts operating. No voice data is transferred to the personal computer.

FIG. 14(b) shows the user saying 「今日は音楽について話します」 ("Today I will talk about music") with the voice transmission mode selected on the function selection switch 61. The user's voice is detected by the microphone and converted into a digital signal by the A/D converter 21. The digital signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53.

Because the function selection switch 61 is in state 2, the function selection unit 19 outputs the voice recognition stop signal to the voice recognition unit 23 and the voice transmission operation signal to the voice transmission means 53. As in the second embodiment (FIG. 11(b)), nothing is transmitted to the air conditioner, but the encoded voice signal is transmitted to the personal computer. The user can thus record what he or she said, for example in a memory in the PC. If a table of command vocabulary and word IDs is also set on the personal computer side, the user can, at the time of recording, wirelessly transmit a recognized voice command to the personal computer to turn the computer ON.

FIG. 15 shows the user saying 「今日は音楽について話します」 ("Today I will talk about music") while the function selection switch 61 is set to the OFF mode, that is, with neither voice recognition nor voice transmission selected. The user's voice is detected by the microphone and converted into a digital signal by the A/D converter 21. The digital signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53.

Because the function selection switch 61 is in state 3, the function selection unit 19 outputs the voice recognition stop signal to the voice recognition unit 23 and the voice transmission stop signal to the voice transmission means 53.

The digital signal input to the voice recognition unit 23 first reaches the recognition signal breaker 41. Because the function selection unit 19 is outputting the voice recognition stop signal, the recognition signal breaker 41 is open. The digital signal is therefore not input to the acoustic analysis unit 43, and no subsequent processing is performed.

Similarly, the digital signal input to the voice transmission means 53 first reaches the transmission signal breaker 55. Because the function selection unit 19 is outputting the voice transmission stop signal, the transmission signal breaker 55 is also open. The digital signal is therefore not input to the voice encoding unit 57, and no subsequent processing is performed.
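The gating behavior of the two signal breakers can be sketched as follows. Note the electrical convention used in the description: an open breaker blocks the signal and a closed breaker passes it. The class `SignalBreaker` and the function `route` are hypothetical names for illustration only, and the sample values are arbitrary.

```python
from typing import List, Optional, Sequence, Tuple


class SignalBreaker:
    """Passes samples only while closed; an open breaker blocks them."""

    def __init__(self) -> None:
        self.closed = False  # open by default, so nothing passes

    def control(self, operate: bool) -> None:
        # An operation signal closes the breaker; a stop signal opens it.
        self.closed = operate

    def forward(self, samples: Sequence[int]) -> Optional[List[int]]:
        return list(samples) if self.closed else None


def route(samples: Sequence[int],
          breaker_41: SignalBreaker,
          breaker_55: SignalBreaker) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Split the digital signal and offer one copy to each breaker."""
    return breaker_41.forward(samples), breaker_55.forward(samples)
```

With both breakers open (the OFF mode), `route` yields `(None, None)`, meaning that neither the acoustic analysis stage nor the encoding stage receives any input.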

Therefore, no voice control signal is sent to the air conditioner, and no voice data is sent to the personal computer. The user can still use functions that involve neither voice recognition processing nor the operations that follow from it, such as control of other devices or dictation. For example, the user can listen to music or a third party's voice through the speaker built into the headset.

(Fourth embodiment)
FIGS. 16 and 17 schematically show the system configuration of a headset with a wireless communication function according to the fourth embodiment of the present invention.

The voice detected by the microphone 13 is input to the A/D converter 21 and converted from an analog signal into a digital voice signal. The digital voice signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53.

The function selection means 70 comprises a function selection switch 71 and the function selection unit 19. The function selection switch 71 can be switched among three states by user operation. It is in state 1 when the user chooses to have the voice signal detected by the microphone 13 processed by the voice recognition unit 23, in state 2 when the user chooses to have it processed by the voice transmission means 53, and in state 3 when the user chooses to have it processed by both the voice recognition unit 23 and the voice transmission means 53.

FIG. 17 shows an example of the function selection switch 71. The function selection switch 71 has three push button switches: a voice recognition button 101, a voice transmission button 102, and a both-modes button 104. These push button switches are configured so that exactly one of them is ON at any time. When the user turns ON the push button switch 101, the function selection switch 71 enters state 1, and the push button switches 102 and 104 are automatically turned OFF. Similarly, when the user turns ON the push button switch 102, the function selection switch 71 enters state 2, and the push button switches 101 and 104 are automatically turned OFF. When the push button switch 104 is turned ON, the function selection switch 71 enters state 3, and the push button switches 101 and 102 are automatically turned OFF.
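The mutually exclusive push buttons behave like a radio-button group. The sketch below illustrates that behavior; the class name `FunctionSelectionSwitch71` and the choice of button 101 as the initially ON button are assumptions made for the example, since the description does not state an initial state.

```python
class FunctionSelectionSwitch71:
    """Radio-button style switch: exactly one button is ON at any time."""

    # Push button number -> state of the function selection switch 71
    BUTTON_TO_STATE = {101: 1, 102: 2, 104: 3}

    def __init__(self, initial_button: int = 101) -> None:
        # The initial button is an assumption for this sketch.
        self._on_button = None
        self.press(initial_button)

    def press(self, button: int) -> None:
        if button not in self.BUTTON_TO_STATE:
            raise ValueError(f"unknown button: {button}")
        # Turning one button ON implicitly turns the others OFF,
        # because only a single ON button is ever stored.
        self._on_button = button

    def is_on(self, button: int) -> bool:
        return button == self._on_button

    @property
    def state(self) -> int:
        return self.BUTTON_TO_STATE[self._on_button]
```

Storing only the single ON button, rather than one flag per button, makes the "exactly one ON" invariant impossible to violate in this model.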

When the function selection switch 71 is in state 1, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and the voice transmission stop signal to the voice transmission means 53. When the switch is in state 2, it outputs the voice recognition stop signal to the voice recognition unit 23 and the voice transmission operation signal to the voice transmission means 53. When the switch is in state 3, it outputs the voice recognition operation signal to the voice recognition unit 23 and, at the same time, the voice transmission operation signal to the voice transmission means 53.

The voice recognition unit 23 and the voice transmission means 53 operate as in the embodiments described above.

FIG. 18 illustrates a specific operation of the headset with a wireless communication function of FIG. 16. In the examples of FIGS. 18(a) and 18(b), as in the third embodiment, a user wearing the headset switches between the voice recognition mode and the voice transmission mode with the function selection switch 71 to control the air conditioner by voice and to transmit voice data to the personal computer for recording. The contents stored in the recognition vocabulary storage unit and the speech model creation/storage unit of the headset are the same as in the example of the first embodiment. The settings on the air conditioner side are also the same as in the example of the first embodiment. A large-capacity hard disk is connected to the personal computer, and all voice data received from the headset with a wireless communication function is stored on this hard disk.

FIG. 19 shows the state in which the user has selected, with the function selection switch 71, processing of the voice by both voice recognition and voice transmission. The user wearing the headset has just said 「エアコンいれて」 ("turn on the air conditioner"). The user's voice is detected by the microphone 13 and converted into a digital signal by the A/D converter 21. The digital signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53.

Because the function selection switch 71 is in state 3, the function selection unit 19 outputs the voice recognition operation signal to the voice recognition unit 23 and the voice transmission operation signal to the voice transmission means 53.

The digital signal input to the voice recognition unit 23 first reaches the recognition signal breaker 41. Because the function selection unit 19 is outputting the voice recognition operation signal, the recognition signal breaker 41 is closed. The digital voice signal is input to the acoustic analysis unit, the recognition result "01" is wirelessly transmitted to the air conditioner, and the air conditioner starts operating.

Meanwhile, the digital signal input to the voice transmission means 53 first reaches the transmission signal breaker 55. Because the function selection unit 19 is outputting the voice transmission operation signal, the transmission signal breaker 55 is also closed. The digital voice signal is input to the voice encoding unit, and the encoded voice signal is wirelessly transmitted to the personal computer.

In this case, the voice data stored on the personal computer also includes the speech that was uttered with the expectation of being recognized by the voice recognition unit 23 of the headset. Playing back the voice stored on the personal computer therefore makes it possible to review the operation history of the voice recognition unit 23.

In the fourth embodiment, the voice uttered by the user is recognized as a voice command for device control and, at the same time, processed as voice data to be recorded and stored on the personal computer. A headset configured this way makes it possible, for example in a laboratory or factory, to remotely control apparatus and equipment by voice command without key operations while simultaneously recording the operation and control history on a computer or the like. The voice recognition processing in the headset has been described using voice command processing based on word recognition as an example, but as noted above, the voice recognition of the headset of the present invention is not limited to this.

(Fifth embodiment)
FIG. 20 schematically shows the system configuration of a headset with a wireless communication function according to the fifth embodiment of the present invention. The fifth embodiment combines the third and fourth embodiments described above: the function selection switch has four modes, namely a voice recognition mode, a voice transmission mode, an OFF mode, and a voice recognition/transmission mode.

As in the third and fourth embodiments, the voice detected by the microphone 13 is input to the A/D converter 21 and converted from an analog signal into a digital voice signal. The digital voice signal is split into two paths, one input to the voice recognition unit 23 and the other to the voice transmission means 53.

The function selection means 80 comprises a function selection switch 81 and the function selection unit 19. The function selection switch 81 can be switched among four states by the user's selection. It is in state 1 when the user chooses to have the voice signal detected by the microphone 13 processed by the voice recognition unit 23, in state 2 when the user chooses processing by the voice transmission means 53, in state 3 when the user chooses processing by both the voice recognition unit 23 and the voice transmission means 53, and in state 4 when the user chooses processing by neither.

FIG. 21 shows an example of the function selection switch 81. The function selection switch 81 has four push button switches, configured so that exactly one of them is ON at any time. When the user turns ON the push button switch 101, the function selection switch 81 enters state 1, and the other three push button switches 102, 103, and 104 are automatically turned OFF. Likewise, whichever button is selected, the other three are automatically turned OFF.

The signal output states of the function selection unit 19 corresponding to each state (mode) of the function selection switch 81, the resulting operation of the signal breakers 41 and 55, and the word IDs transmitted wirelessly are the same as in the third and fourth embodiments, so their description is omitted here.
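Because the fifth embodiment combines the earlier ones, its four states can be viewed as the four combinations of two independent on/off choices. The following sketch expresses that view with a flag type; the names `Processing`, `STATE_TO_PROCESSING`, and `breaker_positions` are hypothetical and serve only to illustrate the combination.

```python
from enum import Flag


class Processing(Flag):
    """Hypothetical flags for the two processing paths."""
    NONE = 0
    RECOGNITION = 1    # voice recognition unit 23
    TRANSMISSION = 2   # voice transmission means 53


# State of function selection switch 81 -> active processing paths
STATE_TO_PROCESSING = {
    1: Processing.RECOGNITION,
    2: Processing.TRANSMISSION,
    3: Processing.RECOGNITION | Processing.TRANSMISSION,
    4: Processing.NONE,
}


def breaker_positions(state: int) -> dict:
    """Report whether signal breakers 41 and 55 are closed (passing)."""
    active = STATE_TO_PROCESSING[state]
    return {
        "breaker_41_closed": bool(active & Processing.RECOGNITION),
        "breaker_55_closed": bool(active & Processing.TRANSMISSION),
    }
```

Under this reading, states 1, 2, and 4 reproduce the three modes of the third embodiment, and state 3 adds the simultaneous mode of the fourth embodiment.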

FIGS. 22 and 23 show examples of specific operations of the headset with a wireless communication function shown in FIG. 20. A user wearing the headset can select among the four modes as needed by operating the function selection switch 81. FIGS. 22(a) and 22(b) show an example in which the user switches between the voice recognition mode and the voice transmission mode, and thereby between voice-command control of the air conditioner and transmission of voice data to the personal computer for storage. FIGS. 23(a) and 23(b) show examples of the mode in which voice recognition and voice transmission are performed simultaneously and the mode in which neither is performed. As described for the third and fourth embodiments, in the simultaneous mode the air conditioner is controlled by voice command while the same voice is wirelessly transmitted to the personal computer as encoded data and stored. The stored data can be played back and analyzed later. In the OFF mode, neither voice recognition nor voice transmission is performed, but the user can listen to music or a third party's voice through the speaker built into the headset.

The contents stored in the recognition vocabulary storage unit and the speech model creation/storage unit in the headset, and the stored contents and settings of the air conditioner, are the same as in the first embodiment. A large-capacity hard disk is connected to the personal computer, and all voice data received from the headset with a wireless communication function is stored on this hard disk.

(Sixth embodiment)
FIG. 24 schematically shows a voice processing system according to the sixth embodiment of the present invention. This voice processing system comprises the headset 110 with a wireless communication function described in the first to fifth embodiments and a device 130 with a voice recognition function. In this system, when the voice transmission mode is selected with the function selection switch 114 of the headset, the voice signal detected by the microphone is wirelessly transmitted via the voice transmission means 153 of the headset to the device 130 with a voice recognition function, and voice recognition is performed on the device side. When the voice recognition mode is selected on the headset, voice recognition is performed within the headset.

That is, the headset 110 with a wireless communication function has a microphone 113 that detects the user's voice, voice recognition means that performs recognition processing on the voice detected by the microphone 113, recognition result transmission means 125 that wirelessly transmits the recognition result, voice transmission means 153 that wirelessly transmits the voice signal detected by the microphone 113 as encoded voice data, and a function selection switch 114 that selects either voice recognition or voice transmission processing.

The device 130 with a voice recognition function, for its part, has voice receiving means 140 that receives the voice data wirelessly transmitted from the headset and a speech recognition engine 150 that performs recognition processing on the received voice.

FIG. 25 shows the voice receiving means 140 of the device 130 with a voice recognition function shown in FIG. 24. The encoded voice signal sent from the headset by wireless communication is received by the encoded voice receiving unit 141 and input to the encoded voice decoding unit 143.

The encoded voice decoding unit 143 decodes the encoded voice and outputs a digital voice signal to the speech recognition engine 150.

The speech recognition engine 150 may use either word speech recognition technology or large-vocabulary sentence speech recognition technology. The configuration for large-vocabulary sentence speech recognition is described here.

FIG. 26 is a schematic diagram of the speech recognition engine 150 using sentence speech recognition technology. For the speech recognition engine 150, the vocabulary that may be used in the input speech is collected in advance. For example, when the vocabulary is organized in units of words, the notation, reading, and word ID of each word are stored in the recognition vocabulary storage unit 157. Normally, several tens of thousands to about 100,000 words are stored, but when the topic or sentence patterns can be restricted, the number of words can be narrowed down to reduce the required storage capacity.

In addition, a language model representing how readily the words stored in the recognition vocabulary storage unit 157 connect to one another is created in advance and stored in the language model storage unit 161. As the language model, it is possible to use, for example, probability values derived from the frequencies of single words, two-word sequences, and three-word sequences in a large collected database of sentences.
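As one way to picture how such probability values could be derived from frequencies, the sketch below estimates two-word (bigram) probabilities by simple relative frequency. This is an assumption made for illustration: the specification does not state the estimation formula, and the function name `train_bigram` and the toy corpus in the usage note are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(second | first) from lists of word IDs by relative frequency."""
    predecessor_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        for first, second in zip(words, words[1:]):
            predecessor_counts[first] += 1      # times 'first' has a successor
            pair_counts[(first, second)] += 1   # times the pair occurs
    return {
        pair: count / predecessor_counts[pair[0]]
        for pair, count in pair_counts.items()
    }
```

For a toy corpus such as `[["A", "B"], ["A", "B"], ["A", "C"]]`, the pair `("A", "B")` receives a higher value than `("A", "C")` because it occurs twice as often. A practical system would normally also smooth the estimates for unseen pairs.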

The speech model creation/storage unit 159 generates a word speech model from the reading of each word stored in the recognition vocabulary storage unit 157 and stores it paired with the word ID of that word. The well-known HMM (Hidden Markov Model) is often used as the word speech model, but the model is not limited to this.

The acoustic analysis unit 151 converts the input speech into feature parameters. Typical feature parameters for speech recognition include the power spectrum, which can be obtained with band-pass filters or a Fourier transform, and cepstrum coefficients obtained by LPC (linear prediction) analysis; the type of feature parameter is not restricted here. The acoustic analysis unit converts the input speech into feature parameters at fixed time intervals, so its output is a time series of feature parameters (a feature parameter series).
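The conversion of speech into a feature parameter series can be sketched with the power spectrum, one of the parameter types mentioned above. The code uses a direct discrete Fourier transform for clarity rather than speed; the function names, the frame length of 16 samples, and the hop of 8 samples are illustrative assumptions, not values from the specification.

```python
import cmath
import math


def split_frames(samples, frame_len, hop):
    """Cut the signal into fixed-length frames taken every 'hop' samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Power at each non-negative frequency bin, computed by a direct DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def analyze(samples, frame_len=16, hop=8):
    """Return the feature parameter series: one power spectrum per frame."""
    return [power_spectrum(frame)
            for frame in split_frames(samples, frame_len, hop)]
```

Each element of the returned list corresponds to one analysis interval, so the list as a whole plays the role of the feature parameter series passed on to model matching.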

The model matching unit 155 computes the acoustic similarity (or distance) between the input feature parameter series and continuous-word speech models formed by concatenating the word speech models stored in the speech model creation/storage unit 159. It also checks the sequence of words making up each continuous-word speech model against the language model stored in the language model storage unit 161 and computes its linguistic likelihood. Taking both the acoustic similarity and the linguistic likelihood into account, the model matching unit 155 finds the word sequence that best matches the input feature parameter series and outputs the word ID series of the words in that sequence to the word ID notation conversion unit 163 as the recognition result.
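One common way to take both scores into account is to add log-domain scores with a weight on the language model. The specification does not state how the two are combined, so the sketch below is an assumption for illustration only; `combined_score`, `best_hypothesis`, the weight `lm_weight`, and the floor value for unseen word pairs are all hypothetical.

```python
import math


def combined_score(acoustic_log_score, word_ids, bigram,
                   lm_weight=1.0, floor=1e-6):
    """Acoustic log score plus weighted log probability of the word sequence."""
    lm_log_prob = sum(
        math.log(bigram.get(pair, floor))  # unseen pairs get a small floor
        for pair in zip(word_ids, word_ids[1:])
    )
    return acoustic_log_score + lm_weight * lm_log_prob


def best_hypothesis(hypotheses, bigram, lm_weight=1.0):
    """Pick the word ID series with the highest combined score.

    'hypotheses' is a list of (word_ids, acoustic_log_score) pairs.
    """
    best = max(hypotheses,
               key=lambda h: combined_score(h[1], h[0], bigram, lm_weight))
    return best[0]
```

With this kind of combination, a candidate that is slightly worse acoustically can still win if its word sequence is much more plausible linguistically.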

The word ID notation conversion unit 163 looks up each word ID in the series among the word IDs and notations stored in the recognition vocabulary storage unit 157 and concatenates the notations, thereby converting the word ID series into the corresponding character string.

FIG. 27 illustrates a specific operation of the voice processing system shown in FIGS. 24 and 25. In the example of FIG. 27, a user wearing the headset with a wireless communication function selects the voice transmission mode with the function selection switch 114 and transfers his or her speech to the device with a voice recognition function (a personal computer).

The user's utterance 「今日は音楽について話します」 ("Today I will talk about music") is detected by the microphone 113, encoded, and transferred from the voice transmission means 153 to the personal computer. The personal computer decodes the received signal and performs voice recognition processing. On the computer side, the notation, reading, and word ID of each word are stored in advance, in association with one another, in the recognition vocabulary storage unit 157 of the speech recognition engine 150.

FIG. 28 shows an example of the contents of the recognition vocabulary storage unit 157. For example, the reading 「おんがく」 ("ongaku") and the word ID "00811" are registered for the notation 「音楽」 ("music"). The speech model creation/storage unit 159 creates and stores word speech models corresponding to 「音楽」 and the other entries according to the contents of the recognition vocabulary storage unit 157.

FIG. 29 shows an example of the contents of the language model storage unit 161. In this example, each entry associates a first word ID, a second word ID that immediately follows it, and the degree (ease of appearance) to which the word indicated by the second word ID appears directly after the word indicated by the first word ID. For example, the degree to which the word with word ID 00712 is immediately followed by the word with word ID 00811 is 0.012, and the degree to which the word with word ID 00712 is immediately followed by the word with word ID 02155 is 0.584.

Checking these word IDs against the contents of the recognition vocabulary storage unit 157 shows that the two pairs above correspond to 「を」 followed by 「音楽」 ("o" then "ongaku", music) and 「を」 followed by 「します」 ("o" then "shimasu", do). The ease-of-appearance values show that the latter pair is more likely than the former to occur consecutively. The character string 「をします」 is therefore selected preferentially.
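The comparison above amounts to a lookup in the language model table followed by picking the larger value. The sketch below uses only the two entries and three word notations quoted in the description; the names `LANGUAGE_MODEL`, `NOTATION`, and `most_likely_successor` are hypothetical.

```python
# (first word ID, second word ID) -> ease of appearance, as quoted from FIG. 29
LANGUAGE_MODEL = {
    ("00712", "00811"): 0.012,
    ("00712", "02155"): 0.584,
}

# Word ID -> notation, limited to the words named in the description
NOTATION = {"00712": "を", "00811": "音楽", "02155": "します"}


def most_likely_successor(first_id):
    """Return the word ID that most readily follows 'first_id'."""
    candidates = {second: value
                  for (first, second), value in LANGUAGE_MODEL.items()
                  if first == first_id}
    if not candidates:
        raise KeyError(f"no successors stored for {first_id}")
    return max(candidates, key=candidates.get)
```

For the first word 「を」 (ID 00712), the lookup selects ID 02155, so concatenating the notations gives 「をします」, in line with the preference described above.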

Returning to FIGS. 25 and 26, the voice transferred from the headset is first received by the encoded voice receiving unit 141 of the personal computer, decoded into a voice signal by the encoded voice decoding unit 143, and then input to the speech recognition engine 150.

The decoded voice signal is converted into a feature parameter sequence by the acoustic analysis unit 151 and input to the model matching unit 155. Based on the speech model of each word stored in the speech model creation/storage unit 159 and the language model stored in the language model storage unit 161, the model matching unit 155 obtains the sequence of word IDs corresponding to the parameter sequence. In this case, the resulting word ID sequence is "01211, 12322, 00811, 08211, 12596, 00712, 02155".

The word ID notation conversion unit 163 looks up the notation corresponding to each word ID in this sequence and concatenates the results to obtain the character string "今日は音楽の話をします" ("Today I will talk about music").
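As a rough sketch, the conversion from a word ID sequence to a character string is a table lookup followed by concatenation. Only three of the ID-to-notation pairs below are stated in the text (00811, 00712, 02155); the other four are inferred by aligning the seven IDs with the seven words of the example sentence and should be read as assumptions, as should the names `WORD_NOTATION` and `ids_to_text`.

```python
# Word ID -> notation, in the spirit of FIG. 28.
WORD_NOTATION = {
    "00811": "音楽",    # stated in the description of FIG. 28
    "00712": "を",      # stated in the discussion of FIG. 29
    "02155": "します",  # stated in the discussion of FIG. 29
    # Assumed entries, inferred from the order of the example sentence:
    "01211": "今日",
    "12322": "は",
    "08211": "の",
    "12596": "話",
}


def ids_to_text(word_ids, table=WORD_NOTATION):
    """Look up each word ID and concatenate the notations."""
    return "".join(table[word_id] for word_id in word_ids)


ids = ["01211", "12322", "00811", "08211", "12596", "00712", "02155"]
print(ids_to_text(ids))  # prints 今日は音楽の話をします
```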

When the device 130 with the voice recognition function is able to display characters, displaying the character string converted by the model matching unit 155 on the device 130 lets the user confirm on the spot, as text, what he or she has just said. FIG. 30 shows an example in which the personal computer displays the character string as text in this way.

Further, when the device 130 with the voice recognition function has an editing function, the text can be edited on the spot in real time. This greatly improves work efficiency compared with accumulating the voice signal, converting it into a character string later, and only then editing it.

Furthermore, editing can also be performed by voice if the function selection switch 114 of the headset 110 with the wireless communication function is switched so that the voice is recognized by the voice recognition unit 123 of the headset itself, the editing command voice is recognized there, and the recognition result is wirelessly transmitted to the device 130 with the voice recognition function. Because the function selection switch 114 is provided on the headset, switching the processing mode takes little effort. The switching operation can also be omitted by adding a command voice recognition function to the device 130 with the voice recognition function, but in that case the device 130 additionally needs a function for determining whether a given utterance is speech to be displayed as a character string or an editing command.

Further, when the device 130 with the voice recognition function is able to store character strings, the conversion result can be accumulated on the spot. With this configuration, the uttered content can be recorded with less storage capacity than is needed to store the voice itself. Because the content has been converted into character strings, searching and similar operations also become easy. Storing the decoded voice together with the corresponding character string is even more useful: a character string can be found with a search string, and the voice corresponding to the retrieved character string can then be played back.
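Storing each recognized character string together with its decoded voice, and searching by text to find the voice to play back, might look like the following minimal sketch. The class and field names (`Utterance`, `TranscriptStore`) and the placeholder audio bytes are hypothetical, and substring matching is an assumed search method; the specification does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str     # character string produced by recognition
    audio: bytes  # decoded voice stored alongside it


class TranscriptStore:
    """Keeps text/voice pairs and finds voice by searching the text."""

    def __init__(self):
        self._items = []

    def add(self, text, audio):
        self._items.append(Utterance(text, audio))

    def search(self, query):
        """Return every stored utterance whose text contains the query."""
        return [u for u in self._items if query in u.text]


store = TranscriptStore()
store.add("今日は音楽の話をします", b"<decoded voice 1>")
store.add("エアコンつける", b"<decoded voice 2>")

hits = store.search("音楽")
# hits[0].audio is the voice that would be handed to playback.
```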

Further, when the speech recognition engine 150 of the device 130 with the voice recognition function uses word speech recognition technology, the recognition result can be used to operate the device 130 itself. For example, when the device with the voice recognition function is a personal computer on which application software is running, the application can be operated by voice.

(Seventh embodiment)
FIG. 31 shows a voice processing system according to the seventh embodiment of the present invention. This system includes a headset 170 with a wireless communication function, a device 200 with a voice recognition function serving as a first device, and a second device (not shown) capable of wireless communication with the device 200. In addition to the voice receiving means 210 and the speech recognition engine 220, the device 200 with the voice recognition function has recognition result transmission means 230, which wirelessly transmits the recognition result to the second device.

The voice receiving means 210 is the same as the voice receiving means 140 in FIG. 24. The speech recognition engine 220 may use either word speech recognition technology or large-vocabulary sentence speech recognition technology. Here, word speech recognition technology is assumed.

FIG. 32 shows the configuration of the speech recognition engine 220 when word speech recognition technology is used. The acoustic analysis unit 223, the model matching unit 225, the recognition vocabulary storage unit 227, and the speech model creation/storage unit 229 have the same configuration as the corresponding units of the voice recognition unit provided in the headset 10 with the wireless communication function of the first embodiment.

The word ID output from the speech recognition engine 220 as the recognition result is input to the recognition result transmission means 230, which transmits it to another device. Wireless communication, wired communication, and other methods are conceivable for this transmission; the method is not limited here.

FIG. 33 illustrates a specific operation of the voice processing system of FIG. 31. A user wearing the headset 170 with the wireless communication function controls an air conditioner, the second device, by voice via a personal computer with a voice recognition function, the first device.

The user has selected the voice transmission mode with the function selection switch 174 of the headset. The utterance "エアコンつける" ("turn on the air conditioner") detected by the microphone 173 is therefore encoded by the voice transmission means 183 and transferred to the personal computer by wireless communication.

FIG. 34 shows an example of the contents of the recognition vocabulary storage unit 227 in the personal computer. The recognition vocabulary storage unit 227 stores the vocabulary entries "えあこんつける" (eakon tsukeru, "turn on the air conditioner"), "えあこんとめる" (eakon tomeru, "stop the air conditioner"), "おんどあげる" (ondo ageru, "raise the temperature"), and "おんどさげる" (ondo sageru, "lower the temperature") with the word IDs "01", "02", "03", and "04", respectively. When the personal computer recognizes the entry "えあこんつける", the word ID "01" is wirelessly transmitted to the air conditioner.

The speech model creation/storage unit 229 creates and stores its contents according to the contents of the recognition vocabulary storage unit 227. In this example, an acoustic model is created for each of the words "えあこんつける", "えあこんとめる", "おんどあげる", and "おんどさげる" and stored together with the word ID of that word.

The air conditioner, for its part, stores each word ID paired with the corresponding operation, as shown in FIG. 35, and performs the operation corresponding to a word ID when it receives that word ID.

The encoded voice received by the voice receiving means 210 of the personal computer is converted into a voice signal by the encoded voice decoding unit and input to the speech recognition engine 220. The voice signal is converted into a feature parameter sequence by the acoustic analysis unit 223 and input to the model matching unit 225. The model matching unit 225 matches the input feature parameter sequence against the speech model of each word stored in the speech model creation/storage unit 229. When the speech model corresponding to "えあこんつける" yields the highest similarity, the model matching unit 225 outputs the word ID "01" as the recognition result.

The word ID "01" is input to the recognition result transmission means 230 and transmitted to the air conditioner by wireless communication. On receiving the word ID "01", the air conditioner starts the air conditioner function corresponding to that word ID according to the table of FIG. 35.
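The flow from recognition on the personal computer to the action on the air conditioner can be sketched with two lookup tables. The vocabulary table follows FIG. 34 as described above. The contents of FIG. 35 are not reproduced in the text, so the action strings are assumptions inferred from the vocabulary, and the function names, the similarity scores, and the "ignore" fallback for unknown IDs are hypothetical.

```python
# Recognition vocabulary on the personal computer (per FIG. 34).
PC_VOCABULARY = {
    "えあこんつける": "01",
    "えあこんとめる": "02",
    "おんどあげる": "03",
    "おんどさげる": "04",
}

# Word ID -> action on the air conditioner (assumed stand-in for FIG. 35).
AIRCON_ACTIONS = {
    "01": "start operation",
    "02": "stop operation",
    "03": "raise temperature",
    "04": "lower temperature",
}


def recognize(similarities):
    """Return the word ID of the entry whose speech model scored highest."""
    best_word = max(similarities, key=similarities.get)
    return PC_VOCABULARY[best_word]


def aircon_handle(word_id):
    """Map a received word ID to an action; unknown IDs are ignored."""
    return AIRCON_ACTIONS.get(word_id, "ignore")


# Hypothetical similarity scores for the utterance "エアコンつける".
scores = {
    "えあこんつける": 0.91,
    "えあこんとめる": 0.40,
    "おんどあげる": 0.12,
    "おんどさげる": 0.10,
}
word_id = recognize(scores)
print(word_id, aircon_handle(word_id))  # prints 01 start operation
```

Only the short word ID crosses the link to the air conditioner, so the second device needs no recognition capability of its own, just the table.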

With this configuration, the user's voice detected by the microphone 173 of the headset 170 with the wireless communication function is recognized almost in real time by the device 200 with the voice recognition function, and the recognition result can be transmitted to another device.

When the device 200 with the voice recognition function has high computing capacity, as a personal computer does, its speech recognition engine 220 is subject to fewer functional restrictions than the voice recognition unit 177 of the headset; for example, the recognition vocabulary can be greatly enlarged. Moreover, even if the voice recognition function of the device 200 becomes unavailable for some reason, voice operation of equipment can continue if the function selection switch 174 is switched so that the voice is processed by the voice recognition unit 177 of the headset.

When the speech recognition engine 220 uses large-vocabulary sentence speech recognition technology, like the speech recognition engine 150 of FIG. 24, the result of conversion into a character string can be transferred to another device immediately. Because transferring a character string requires less communication than transferring voice, the amount of communication can be reduced. This system can recognize the voice almost simultaneously with the utterance. A conventional technique that recognizes accumulated voice and then transfers the result applies voice recognition only after all utterances have finished and transfers the result afterwards, so a time delay is unavoidable. In the system of the sixth embodiment, by contrast, the voice is recognized in parallel with the user's utterance, so the time delay can be reduced.
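The difference in communication volume can be illustrated with a rough calculation. Every figure here is an assumption chosen for illustration and none comes from the specification: the character string is encoded as UTF-8, and the voice is taken to be 2 seconds of uncompressed 16-bit mono audio sampled at 8 kHz. Encoded (compressed) voice would be smaller than this, though typically still larger than the text.

```python
text = "今日は音楽の話をします"
text_bytes = len(text.encode("utf-8"))  # 11 characters x 3 bytes each

# Assumed voice parameters (illustrative only).
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2
DURATION_S = 2
audio_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * DURATION_S

print(text_bytes, audio_bytes)  # prints 33 32000
```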

The embodiments above have been described using word recognition as the example of voice recognition in the headset or on the external device side, but the present invention is not limited to this. In particular, any voice recognition may be performed inside the headset, such as continuous word recognition, sentence recognition, word spotting, or speech intent understanding, as long as it is simple voice recognition requiring little computation, memory, and power.

FIG. 1 is a schematic diagram of a headset with a wireless communication function according to a first embodiment of the present invention.
FIG. 2 is a schematic block diagram of the headset of FIG. 1.
FIG. 3 is a diagram showing an example of the function selection switch of FIG. 2.
FIG. 4 is a diagram showing an example of the internal configuration of the voice recognition unit of FIG. 2.
FIG. 5 is a diagram showing an example of the contents of the recognition vocabulary storage unit of FIG. 4.
FIG. 6 is a diagram showing the correspondence between the word IDs received by the air conditioner and the operations of the air conditioner.
FIG. 7 is a diagram showing ON/OFF control of the voice recognition mode by the function selection switch.
FIG. 8 is a schematic diagram showing the system configuration of a headset with a wireless function according to a second embodiment of the present invention.
FIG. 9 is a diagram showing an example of the function selection switch used in the headset of FIG. 8.
FIG. 10 is a diagram showing the internal configuration of the voice transmission means shown in FIG. 8.
FIG. 11 is a diagram showing switching between voice recognition and voice transmission processing by the function selection switch.
FIG. 12 is a schematic diagram showing the system configuration of a headset with a wireless communication function according to a third embodiment of the present invention.
FIG. 13 is a diagram showing an example of the function selection switch shown in FIG. 12.
FIG. 14 is a diagram showing the case where the voice recognition mode or the voice transmission mode is selected with the function selection switch of FIG. 13.
FIG. 15 is a diagram showing an example in which neither voice recognition nor voice transmission is performed in the OFF mode selected with the function selection switch of FIG. 13.
FIG. 16 is a schematic diagram showing the system configuration of a headset with a wireless communication function according to a fourth embodiment of the present invention.
FIG. 17 is a diagram showing an example of the function selection switch of FIG. 16.
FIG. 18 is a diagram showing the case where the voice recognition mode or the voice transmission mode is selected with the function selection switch of FIG. 17.
FIG. 19 is a diagram showing an example in which the voice is processed by both voice recognition and voice transmission with the function selection switch of FIG. 17.
FIG. 20 is a schematic diagram showing the system configuration of a headset with a wireless communication function according to a fifth embodiment of the present invention.
FIG. 21 is a diagram showing an example of the function selection switch of FIG. 20.
FIG. 22 is a diagram showing the case where the voice recognition mode or the voice transmission mode is selected with the function selection switch of FIG. 20.
FIG. 23 is a diagram showing the case where the function selection switch of FIG. 17 selects the mode in which both voice recognition and voice transmission are performed, or the OFF mode in which neither is performed.
FIG. 24 is a schematic configuration diagram of a voice processing system according to a sixth embodiment of the present invention.
FIG. 25 is a diagram showing a configuration example of the voice receiving means of the device with the voice recognition function in the system of FIG. 24.
FIG. 26 is a diagram showing a configuration example of the speech recognition engine of the device with the voice recognition function in the system of FIG. 24.
FIG. 27 is a diagram showing a usage example of the system of FIG. 24.
FIG. 28 is a diagram showing an example of the contents of the recognition vocabulary storage unit of FIG. 26.
FIG. 29 is a diagram showing an example of the contents of the language model storage unit of FIG. 26.
FIG. 30 is a diagram showing a screen display example of the device with the voice recognition function of FIG. 24.
FIG. 31 is a diagram showing a modification of the voice processing system according to the sixth embodiment of the present invention.
FIG. 32 is a diagram showing a configuration example of the speech recognition engine of the device with the voice recognition function in the system of FIG. 31.
FIG. 33 is a diagram showing a usage example of the voice processing system shown in FIG. 31.
FIG. 34 is a diagram showing an example of the contents of the recognition vocabulary storage unit in the system of FIG. 31.
FIG. 35 is a diagram showing, for the usage example of FIG. 33, the correspondence between the word IDs that the air conditioner receives via the PC and the operations of the air conditioner.

Explanation of symbols

10, 110, 170 Headset
13, 113, 173 Microphone
14, 51, 61, 71, 81, 114, 174 Function selection switch
17 Speaker
16 CPU board
17 Wireless communication module
19, 119, 181 Function selection unit
20, 50, 60, 70, 80 Function selection means
21, 121, 75 A/D converter
23, 123, 177 Voice recognition unit
25, 125, 178, 230 Recognition result transmission means
41 Recognition signal blocker
43, 151, 223 Acoustic analysis unit
45, 155, 225 Model matching unit
47, 157, 227 Recognition vocabulary storage unit
49, 159, 229 Speech model creation/storage unit
53, 153, 183 Voice transmission means
55 Transmission signal blocker
57 Voice encoding unit
59 Voice transmission unit
130, 200 Device with voice recognition function
140, 210 Voice receiving means
141 Encoded voice receiving unit
143 Encoded voice decoding unit
150, 220 Speech recognition engine
161 Language model storage unit
163 Word ID notation conversion unit

Claims

1. A voice processing system comprising:
a headset with a wireless communication function; and
an external device capable of wireless communication with the headset,
wherein the headset with the wireless communication function includes:
a microphone that detects a voice of a wearer of the headset and generates a voice signal;
voice recognition means for recognizing the voice signal and generating an identification signal corresponding to the content of the recognized voice signal; and
recognition result transmission means for sending the identification signal generated by the voice recognition means to the external device by wireless communication,
and wherein the external device performs an operation corresponding to the received identification signal.

2. The voice processing system according to claim 1, wherein the external device includes a table that stores a plurality of identification signals and the operations corresponding to the identification signals in association with each other.

3. The voice processing system according to claim 1, wherein the headset further includes a function selection unit that switches whether or not the voice signal generated by the microphone is processed by the voice recognition means.

4. A voice processing system comprising:
a headset with a wireless communication function; and
an external device with a voice recognition function capable of wireless communication with the headset,
wherein the headset with the wireless communication function includes:
a microphone that detects a voice of a wearer of the headset and generates a voice signal; and
voice transmission means for transmitting the voice signal to the external device by wireless communication,
and wherein the external device includes:
voice receiving means for receiving the voice signal transmitted from the headset; and
voice recognition means for recognizing the received voice signal.

5. The voice processing system according to claim 4, wherein the external device performs an operation according to a recognition result of the voice recognition means.

6. The voice processing system according to claim 4, wherein
the external device further includes a display unit,
the voice recognition means recognizes the received voice signal, generates an identification signal corresponding to the content of the recognized voice signal, converts the identification signal into characters, and outputs the characters, and
the display unit displays the characters that are the recognition result.

7. The voice processing system according to claim 4, wherein the headset further includes voice recognition means for recognizing the voice signal.

8. A voice processing system comprising:
a headset with a wireless communication function;
a first external device having a voice recognition function and capable of wireless communication with the headset; and
a second external device capable of wireless communication with the first external device,
wherein the headset with the wireless communication function includes:
a microphone that detects a voice of a wearer of the headset and generates a voice signal; and
voice transmission means for transmitting the voice signal to the first external device by wireless communication,
wherein the first external device includes:
voice receiving means for receiving the voice signal transmitted from the headset;
voice recognition means for recognizing the received voice signal and generating an identification signal corresponding to the content of the recognized voice signal; and
recognition result transmission means for transmitting the identification signal to the second external device by wireless communication,
and wherein the second external device performs an operation corresponding to the word ID received from the first external device.