JP2000207170A

JP2000207170A - Device and method for processing information

Info

Publication number: JP2000207170A
Application number: JP11008195A
Authority: JP
Inventors: Takashi Sasai; 崇司笹井; Masakazu Hattori; 雅一服部; Hiroshi Tsunoda; 弘史角田; Yasuhiko Kato; 靖彦加藤
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1999-01-14
Filing date: 1999-01-14
Publication date: 2000-07-28

Abstract

PROBLEM TO BE SOLVED: To present information to users in an easily understood form, and to let devices exchange additional information and accurate information with few errors. SOLUTION: A CPU 21 generates an output audio signal by adding additional information to an audio signal in a manner that does not impair listening to the voice carried by that audio signal. The additional information is, for example, information related to the audio signal. The voice based on this output audio signal is output from a loudspeaker 12 toward a counterpart device. The counterpart device captures the voice with a microphone 13 to obtain an input audio signal and supplies it to its CPU 21. The CPU 21 extracts the additional information from the input audio signal and performs processing based on it. For example, a phrase contained in the natural language expressed by the voice is displayed on a display part 15. As another example, speech recognition is performed with the target domain of the natural language expressed by the voice narrowed by the additional information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an information processing apparatus and an information processing method that enable communication of information by voice. More specifically, it relates to an information processing apparatus and the like that generates an output audio signal by adding additional information to an audio signal in a manner that does not impair listening to the voice carried by that audio signal, and converts the output audio signal into voice for output. Information can thereby be presented to users by voice in an easily understood form, while devices can exchange additional information and accurate information with few errors.

[0002]

[Description of the Related Art] Methods of communication between a plurality of information processing apparatuses include connecting the devices with a cable or the like, and communicating wirelessly using infrared rays, radio waves, or the like. Generally, two devices are connected one-to-one, but communication among three or more devices is also possible by placing a special device such as a hub between them.

[0003] For input from a user to an information processing apparatus, speech recognition technology allows information to be conveyed through natural language. Furthermore, by performing syntactic and semantic analysis of that natural language on the receiving side, various services can be provided to the user.

[0004] For output to users over a wide area, audio output through speakers is widely used. Because sound propagates over a wide area and the necessary equipment, such as speakers, can be realized at relatively low cost, it is used, for example, in public-address systems in stations, stores, and similar premises.

[0005]

[Problems to Be Solved by the Invention] However, connecting devices by wire, such as with a cable, requires each device to have a terminal for the cable and requires the cable itself. Communication among three or more devices additionally requires a special device such as a hub. Communication therefore needs advance preparation and cannot be started immediately.

[0006] Methods using infrared rays, radio waves, or the like require a dedicated mechanism solely for communication, which hinders miniaturization and cost reduction of the device.

[0007] In addition, users generally cannot tell what data is being exchanged between devices. Making this visible to users requires a means separate from the communication channel, such as a screen display or audio output.

[0008] When voice input is used as the input interface to an information processing apparatus, current technology cannot recognize a user's utterance with complete accuracy. In particular, because the size of the recognition vocabulary strongly affects recognition performance, accurate recognition requires limiting the vocabulary as far as possible, for example by narrowing the recognition target. Current speech recognition can also obtain almost none of the information, such as emotion, that is implicit in a user's utterance. Furthermore, it is very difficult for syntactic and semantic analysis to derive a result that fully reflects the speaker's intent.

[0009] Information processing apparatuses equipped with voice input and output can exchange information interactively by sending inquiries to each other, but when an inquiry yields no answer, no information is exchanged. To repeat the inquiry, the user must enter it by voice again, which may introduce recognition errors and places a burden on the user.

[0010] When audio output is used to present information to users, as in current announcements, no information reaches people who cannot hear. Because the information is expressed in natural language, it also fails to reach users who speak a different language. Moreover, with in-store voice announcements, users are often unfamiliar with the layout of the premises, so locations described in words are hard to understand.

[0011] Accordingly, an object of the present invention is to provide an information processing apparatus and an information processing method that can solve the problems described above.

[0012]

[Means for Solving the Problems] The information processing apparatus according to the invention of claim 1 comprises audio signal generating means for generating an audio signal, additional information generating means for generating additional information, information adding means for generating an output audio signal by adding the additional information to the audio signal in a manner that does not impair listening to the voice carried by that audio signal, and audio output means for outputting voice based on the output audio signal.

[0013] The information processing apparatus according to the invention of claim 17 comprises audio input means for inputting voice and obtaining an input audio signal corresponding to that voice, additional information extracting means for extracting additional information added to the input audio signal, and information processing means for performing processing that uses the extracted additional information.

[0014] The information processing apparatus according to the invention of claim 38 comprises audio signal generating means for generating an audio signal, additional information generating means for generating additional information, information adding means for obtaining an output audio signal by adding the additional information to the audio signal in a manner that does not impair listening to the voice carried by that audio signal, audio output means for outputting voice based on the output audio signal, audio input means for inputting voice and obtaining an input audio signal corresponding to that voice, additional information extracting means for extracting additional information added to the input audio signal, and information processing means for performing processing that uses the extracted additional information.

[0015] In the present invention, communication by voice becomes possible. The voice in this case is based on an output audio signal obtained by adding additional information to the original audio signal in a manner that does not impair listening to the voice carried by that audio signal. The additional information is, for example, information related to the original audio signal. For example, the additional information is added by forming momentary interruption sections in portions of the audio signal that are wide-band and are not attack portions, and using those sections. As another example, the additional information is added to the audio signal as a spread spectrum signal. As a result, information can be presented to users by voice in an easily understood form, while devices can exchange additional information and accurate information with few errors.

[0016]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows the external appearance of an information processing apparatus 10 as a first embodiment. A main body 11 of the information processing apparatus 10 is provided with a speaker 12 for outputting sound and a microphone 13 for inputting sound. The main body 11 is also provided with a talk switch 14 that is operated when sound is input through the microphone 13. Sound input from the microphone 13 becomes possible when the user operates the talk switch 14.

[0017] A display unit 15 for displaying the GUI (graphical user interface) of a program is provided at the center of the main body 11. A so-called touch panel (touch tablet) 16 is arranged on the surface of the display unit 15; when the user touches it with a touch pen 17, a finger, or the like, it outputs a signal corresponding to the indicated position.

[0018] The touch panel 16 is made of a transparent material such as glass or resin, so the user can see the image displayed on the display unit 15 through the touch panel 16. Using the touch pen 17, the user can also enter characters on the touch panel 16 and select or execute an object (icon) displayed on the display unit 15.

[0019] FIG. 2 shows the circuit configuration of the information processing apparatus 10. An internal bus 20 interconnects a CPU (central processing unit) 21, a ROM (read only memory) 22, a RAM (random access memory) 23, a display control unit 24, an input interface 25, and a speech synthesis unit 26, so that these units can exchange data via the internal bus 20. The CPU 21 executes various kinds of processing in accordance with programs and data stored in the ROM 22 or the RAM 23.

[0020] The display control unit 24 generates image data to be displayed on the display unit 15 in accordance with information supplied from the CPU 21, and causes the display unit 15 to display the image. An input detection unit 27 detects input from the touch panel 16 and the talk switch 14 and supplies the corresponding operation signal to the input interface 25. An A/D conversion unit 28 converts the audio signal output from the microphone 13 from analog to digital and supplies it to the input interface 25.

[0021] The input interface 25 receives the audio signal supplied from the A/D conversion unit 28 or the operation signal supplied from the input detection unit 27 and supplies it to the CPU 21. When an audio signal is input via the microphone 13, the A/D conversion unit 28, and the input interface 25, the CPU 21 refers to data stored in the ROM 22 or the RAM 23 and extracts the additional information added to the audio signal.

[0022] The speech synthesis unit 26 generates synthesized speech based on the parameters and text data required for speech synthesis that are supplied from the CPU 21, and outputs it through the speaker 12. The speech synthesis unit 26 is also used to play back sound recorded in the RAM 23 through the microphone 13. Furthermore, when the CPU 21 supplies additional information to be transmitted to another device, the speech synthesis unit 26 adds that information to the synthesized or recorded speech and outputs the result through the speaker 12.

[0023] FIG. 3 is a functional block diagram of the internal operation of the CPU 21 described above. An information processing unit 30 performs various kinds of information processing; in this embodiment it operates as an application program that manages a personal schedule.

[0024] An information adding unit 31 adds specific additional information IFa provided by the information processing unit 30 to an audio signal SAa, also provided by the information processing unit 30, to generate an output audio signal SAout. The additional information IFa is added to the audio signal SAa in a manner that does not impair listening to the voice carried by the audio signal SAa.

[0025] For example, the addition is performed using the method described in Japanese Patent Application Laid-Open No. 10-162501. In this case, the information adding unit 31 performs the following processing. First, it detects portions of the audio signal SAa with a sharp rise and large amplitude as attacks, and performs spectral analysis on sections of the audio signal SAa of a predetermined length. It then forms momentary interruption sections in portions of the audio signal SAa that are wide-band and are not attack portions, and uses those sections to add the additional information IFa, generating the output audio signal SAout. Attack portions have a large effect on sound quality, and the wider the bandwidth, the harder a click is to hear. Adding the information in this way therefore makes the clicks caused by the interruptions almost inaudible, so the additional information IFa can be added to the audio signal SAa without degrading sound quality.
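
As a rough illustration of the attack-detection idea only, the following Python sketch flags frames whose peak amplitude is both large and sharply higher than that of the preceding frame. The function name, frame length, and thresholds are assumptions made for this illustration; they are not taken from this publication or from the cited laid-open application, and the wide-band test and the interruption insertion itself are not shown.

```python
def find_attack_frames(samples, frame_len=4, min_peak=0.5, rise_ratio=4.0):
    """Return indices of frames that look like attacks.

    A frame counts as an attack when its peak amplitude is at least
    `min_peak` (large amplitude) and at least `rise_ratio` times the peak
    of the previous frame (sharp rise). All thresholds are illustrative.
    """
    peaks = [
        max(abs(s) for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    attacks = []
    for k in range(1, len(peaks)):
        prev = max(peaks[k - 1], 1e-9)  # avoid division by zero on silence
        if peaks[k] >= min_peak and peaks[k] / prev >= rise_ratio:
            attacks.append(k)
    return attacks


# Quiet frame, sudden loud frame, sustained loud frame, quiet frame.
signal = [0.01, -0.02, 0.01, 0.0,
          0.9, -0.8, 0.7, -0.6,
          0.8, -0.7, 0.6, -0.5,
          0.02, 0.01, -0.01, 0.0]
attack_frames = find_attack_frames(signal)
```

Frames flagged this way would be excluded when choosing where to place the momentary interruptions.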

[0026] As another example, the addition may exploit the auditory masking characteristic of human hearing, whereby a low-level signal near the frequency of a loud audio signal cannot be heard, or is very hard to hear, while the loud signal is present. As a further example, the additional information may be superimposed on the audio signal using spread spectrum techniques, which have recently attracted attention. In this case, the additional information IFa is added to the audio signal SAa as a spread spectrum signal.
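
The spread spectrum variant can be sketched as follows. This is a minimal direct-sequence example under stated assumptions: the chip count, the embedding level `alpha`, the shared seed, and all function names are hypothetical, and the level is chosen so the demonstration decodes reliably rather than to guarantee inaudibility. The publication does not specify any of these details.

```python
import math
import random


def pn_sequence(length, seed):
    """Pseudo-noise chips (+1/-1) that sender and receiver both regenerate."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, chips_per_bit, alpha, seed):
    """Add each bit, spread over `chips_per_bit` samples, at low level `alpha`."""
    pn = pn_sequence(len(bits) * chips_per_bit, seed)
    out = list(host)
    for n, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for i in range(n * chips_per_bit, (n + 1) * chips_per_bit):
            out[i] += alpha * sign * pn[i]
    return out


def extract(received, n_bits, chips_per_bit, seed):
    """Despread by correlating each bit interval with the same PN chips."""
    pn = pn_sequence(n_bits * chips_per_bit, seed)
    bits = []
    for n in range(n_bits):
        span = range(n * chips_per_bit, (n + 1) * chips_per_bit)
        corr = sum(received[i] * pn[i] for i in span)
        bits.append(1 if corr > 0 else 0)
    return bits


CHIPS = 4096
payload = [1, 0, 1, 1, 0, 0, 1, 0]
# Stand-in for the audio signal SAa: a 440 Hz tone sampled at 16 kHz.
host = [math.sin(2 * math.pi * 440 * t / 16000)
        for t in range(len(payload) * CHIPS)]
marked = embed(host, payload, CHIPS, alpha=0.1, seed=1234)
recovered = extract(marked, len(payload), CHIPS, seed=1234)
```

Spreading each bit over many chips is what lets the receiver recover it by correlation even though the embedded component is far weaker than the host audio.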

[0027] An information extraction unit 32 extracts additional information IFb from an input audio signal SAin (which corresponds to the output audio signal SAout described above). The processing performed by the information extraction unit 32 depends on the method used by the information adding unit 31. The input may also contain no additional information.

[0028] The information processing unit 30 is the part that performs the actual application processing. It supplies the audio signal SAa and the additional information IFa to the information adding unit 31, and is supplied with the additional information IFb or an audio signal SAb from the information extraction unit 32.

[0029] Next, the flow of processing performed by the functional blocks shown in FIG. 3 is described, taking information processing for schedule management as an example. FIGS. 4 and 5 show this flow for the case where voice (information) is sent out and the case where voice (information) is received from outside, respectively. The following description uses personal schedule management, but other applications can follow the same procedure.

[0030] The audio output operation is described with reference to FIG. 4. First, in step S40, the user selects the information to be transmitted using input interfaces such as the display unit 15, the touch panel 16, and the touch pen 17. For example, the user selects the object "5/5 15:00 meeting", which represents one item of the schedule shown on the display unit 15. Then, in step S41, the information processing unit 30 generates a natural sentence (utterance) that expresses the selected item, for example "A meeting is scheduled from 15:00 on May 5", and also generates additional information in a data representation that the system can identify internally.

[0031] In the following step S42, the speech synthesis unit 26 generates a synthesized audio signal SAa for reading out the utterance generated in step S41. In step S43, the information adding unit 31 adds the additional information IFa generated in step S41 to the synthesized audio signal SAa generated by the speech synthesis unit 26. As described above, this addition is performed in a manner that does not impair listening to the voice carried by the synthesized audio signal SAa. That is, the method is one in which the change to the audio signal SAa caused by adding the additional information IFa is impossible or difficult for human hearing to detect.

[0032] Finally, in step S44, the output audio signal SAout generated in step S43, which contains the additional information IFa, is supplied to the speaker 12, and the speaker 12 outputs the voice based on the output audio signal SAout. A person listening to this voice can understand the utterance in natural language. Another information processing apparatus 10 can understand the same utterance by extracting the additional information IFb (identical to IFa) from the input audio signal SAin obtained from that voice.
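
The preparation done in step S41 can be sketched as follows: one schedule item yields both a sentence for human listeners and a machine-readable payload for other devices. This is only an illustration; the `ScheduleItem` type, the JSON encoding, and the field names are assumptions, since the publication says only that the additional information is a data representation identifiable inside the system.

```python
import json
from dataclasses import dataclass


@dataclass
class ScheduleItem:
    month: int
    day: int
    hour: int
    minute: int
    title: str


def make_utterance(item):
    """S41 (first half): a natural-language sentence for human listeners."""
    return (f"A {item.title} is scheduled from {item.hour:02d}:{item.minute:02d} "
            f"on {item.month}/{item.day}.")


def make_additional_info(item):
    """S41 (second half): a machine-readable form of the same item."""
    record = {"type": "schedule", "month": item.month, "day": item.day,
              "time": f"{item.hour:02d}:{item.minute:02d}", "title": item.title}
    return json.dumps(record, sort_keys=True).encode("utf-8")


def to_bits(payload):
    """Flatten payload bytes into bits (MSB first) for an embedding stage."""
    return [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]


item = ScheduleItem(month=5, day=5, hour=15, minute=0, title="meeting")
utterance = make_utterance(item)                  # would go to speech synthesis (S42)
info_bits = to_bits(make_additional_info(item))   # would be embedded (S43)
```

Keeping the two representations derived from the same item is what lets a listener and a receiving device arrive at the same content.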

[0033] Next, the audio input operation is described with reference to FIG. 5. First, in step S50, the microphone 13 captures external sound to obtain an input audio signal SAin. Capture may be triggered by the user pressing the talk switch 14, may last for as long as the switch is held down, or may start when sound above a certain level is detected, among other possibilities.

[0034] In the following step S51, the information extraction unit 32 extracts the additional information IFb contained in the input audio signal SAin. If no additional information IFb is extracted, processing ends. If additional information IFb is extracted, the process proceeds to step S52, where the information processing unit 30 performs processing based on it. For example, when the voice described with reference to FIG. 4 is captured by the microphone 13, the input audio signal SAin contains data representing "5/5 15:00 meeting" as additional information IFb. Extracting this additional information IFb allows the schedule information to be added or updated, or the user to be asked for permission to update the data.
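
The branch taken in steps S51 and S52 can be sketched as below. The payload format (UTF-8 JSON), the dictionary used as the schedule store, and the `ask_user` callback are assumptions for illustration; the extraction stage itself is represented only by its result.

```python
import json


def handle_input(additional_info, schedule, ask_user=None):
    """S51/S52: act on extracted additional information, if any.

    `additional_info` is the payload recovered by the extraction stage, or
    None when the captured sound carried none. Returns a short status string.
    """
    if additional_info is None:
        return "no additional information"      # S51: nothing extracted, stop
    record = json.loads(additional_info.decode("utf-8"))
    if record.get("type") != "schedule":
        return "ignored"
    if ask_user is not None and not ask_user(record):
        return "rejected by user"               # user declined the update
    key = (record["month"], record["day"], record["time"])
    schedule[key] = record["title"]             # S52: add or update the entry
    return "updated"


schedule = {}
payload = json.dumps({"type": "schedule", "month": 5, "day": 5,
                      "time": "15:00", "title": "meeting"}).encode("utf-8")
status = handle_input(payload, schedule)
```

Passing an `ask_user` callback corresponds to the variant in which the user is asked for permission before the data is updated.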

[0035] Next, mutual communication between a first information processing apparatus 10 (device A) and a second information processing apparatus 10 (device B) is described with reference to FIG. 6. Device A is the device that makes an inquiry about the schedule, and device B is the device that responds to it.

[0036] First, at device A, user Ua enters a command that asks "what time is tomorrow's meeting". The command is entered using the display unit 15, the touch panel 16, and the touch pen 17. If device A has a speech recognition function, the command may instead be entered by voice through the microphone 13. In step S60, in response to the input by user Ua, device A follows the same procedure as in the example of FIG. 4 to generate an output audio signal SAout containing the additional information IFa for "What time is tomorrow's meeting?", and outputs the voice based on that output audio signal SAout toward device B. By hearing that voice, user Ua of device A and user Ub of device B can clearly understand the processing that device A is about to perform.

[0037] In step S61, device B captures the voice output from device A through its microphone 13 and processes it in the same way as in the example of FIG. 5. The input audio signal SAin contains, as additional information IFb, the command asking "what time is tomorrow's meeting", and device B extracts this additional information IFb in step S62. In the following step S63, device B performs information processing based on the extracted command, here a search for the time of tomorrow's meeting, and obtains "from 15:00" as the result of the inquiry.

[0038] The result of the inquiry obtained in step S63 is automatically selected, as is, as the information to be transmitted in step S40 of FIG. 4. Alternatively, device B may ask user Ub for permission and select the result only when permission is given. Once the information to be transmitted is determined, device B generates an output audio signal SAout in the following step S64 by the same operation as in FIG. 4. FIG. 6 shows, as examples, the cases where the response "I don't know" or "It starts at 15:00" is generated in S64. Here, device B generates an output audio signal SAout that contains not only the audio signal but also additional information IFa that explicitly represents the schedule, and in step S65 outputs the voice based on that output audio signal SAout toward device A.

[0039] In step S66, device A captures the voice output from device B through its microphone 13, and in step S67 it extracts the additional information IFb representing the schedule from the input audio signal SAin. If the additional information IFb is "from 15:00", the process proceeds to step S68, where device A adds or updates its schedule data with the schedule information from device B, either automatically or with the permission of user Ua. Through this interactive exchange between device A and device B, information originally stored in device B is taken into device A in response to an inquiry from device A.

[0040] If, on the other hand, the extracted additional information is "I don't know", the process proceeds to step S69, where the output audio signal SAout of the inquiry generated by device A, or the information needed to generate it, is stored in the RAM 23 of device A together with information indicating that the inquiry is unresolved. User Ua can view the stored information on the display unit 15 and, by selecting it through the touch panel 16, have the voice based on that output audio signal SAout output again. Storing inquiries that were not resolved in the dialogue between devices, so that their voice can be re-output at any later time, reduces the user's workload.
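
The handling on the inquiring side (steps S67 to S69) can be sketched as follows. The class, the dictionary fields, and the `"unknown"` status value are hypothetical names chosen for this illustration; the publication describes only the behavior, not a data format.

```python
class Inquirer:
    """Device A's side of the exchange (steps S67 to S69), simplified."""

    def __init__(self):
        self.schedule = {}      # local schedule data
        self.unresolved = []    # inquiries kept for later re-output

    def on_response(self, inquiry, response):
        """`response` is the decoded additional information from device B."""
        if response.get("status") == "unknown":
            # S69: keep what is needed to regenerate the inquiry sound later.
            self.unresolved.append(inquiry)
            return "stored"
        # S68: take the answer into the local schedule.
        self.schedule[inquiry["subject"]] = response["time"]
        return "updated"

    def reissue(self, index):
        """Return a stored inquiry so its sound can be output again."""
        return self.unresolved[index]


a = Inquirer()
question = {"ask": "time", "subject": "tomorrow's meeting"}
first = a.on_response(question, {"status": "unknown"})
second = a.on_response(question, {"status": "ok", "time": "15:00"})
```

Because the stored inquiry is already in machine-readable form, re-outputting it does not require the user to speak again, which avoids a second round of recognition errors.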

[0041] Although FIG. 6 illustrates one-to-one communication, voice is used as the transmission medium, so one-to-many communication can also be performed simultaneously. Thus, as in the example of FIGS. 4 and 5, the schedule information "5/5 15:00 meeting" emitted by one information processing apparatus 10 can be transmitted to a plurality of information processing apparatuses 10 at the same time.

[0042] The additional information added to the audio signal may also include an identifier of the information processing apparatus 10 that outputs the voice based on that audio signal. An information processing apparatus 10 that takes in the voice through the microphone 13 can then determine which information processing apparatus 10 output the voice, and the information processing unit 30 can perform processing accordingly. For example, in the example above, schedule update information from a trusted information processing apparatus 10 is applied automatically, and otherwise the user is asked for permission. Likewise, in the communication among a plurality of devices described above, each device can tell which device a voice came from.
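The identifier-based decision described above might look like the following minimal sketch. The set of trusted identifiers, the function name, and the `ask_user` callback are assumptions made for illustration; the specification does not define how trust is configured.

```python
# Hypothetical identifiers of devices whose updates are applied automatically.
TRUSTED_SENDERS = {"device-B"}


def apply_schedule_update(schedule, sender_id, entry, ask_user):
    """Apply `entry` automatically for trusted senders, otherwise ask the user.

    `ask_user(sender_id, entry)` is an assumed callback that returns True
    when the user grants permission.
    """
    if sender_id in TRUSTED_SENDERS or ask_user(sender_id, entry):
        schedule.append(entry)
        return True
    return False
```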

[0043] Interactive data transmission as exemplified in FIG. 6 is also possible. When such a dialogue is conducted among a plurality of devices, several devices might output voice at the same time, so the additional information includes the identifier of the device that is granted the right to make the next utterance. A device that has taken in the voice does not output voice if the identifier contained in the input audio signal SAin differs from its own identifier, so this additional information prevents a plurality of devices from outputting voice simultaneously.
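The turn-taking rule above can be sketched as a single check on the receiving side. The dictionary key `next_speaker` and the `make_reply` callback are hypothetical names chosen for this illustration.

```python
def on_voice_received(own_id, additional_info, make_reply):
    """Return a reply only if the additional information grants this device the turn."""
    if additional_info.get("next_speaker") != own_id:
        return None  # another device holds the right to speak, so stay silent
    return make_reply()
```

With three listening devices and additional information naming one of them, only the named device produces a reply, which is the collision-avoidance behavior the paragraph describes.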

[0044] It is also possible to exchange memos by voice among a plurality of information processing apparatuses 10 to which the present invention is applied. In this case, by adding several keywords to which the memo content belongs as additional information to the audio signal representing the memo, the receiving device can determine the domain to which the memo expressed by the voice belongs and can automatically classify and organize the memo.
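As a rough sketch of the automatic classification on the receiving side, a memo can be filed under each keyword carried in the additional information. The folder structure and function name are assumptions for illustration only.

```python
from collections import defaultdict


def file_memo(folders, memo_text, keywords):
    """File the memo under every keyword received as additional information."""
    for keyword in keywords:
        folders[keyword].append(memo_text)
    return folders


# Example: one folder per keyword, created on first use.
folders = defaultdict(list)
file_memo(folders, "Book hotel in Kyoto", ["travel", "todo"])
```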

[0045] Further, according to the present invention, long-distance or wide-area communication using audio media as the medium is possible. Because voice is used for information transmission, information can be provided over a wide area via media that carry voice, such as television and radio. For example, information on a product can be obtained from a commercial inserted into a television or radio broadcast.

[0046] Next, a second embodiment of the present invention will be described.

[0047] The information processing apparatus 10 of the first embodiment described above does not have a speech recognition function, whereas the information processing apparatus 10 of this second embodiment does. The circuit configuration of the information processing apparatus 10 in the second embodiment is the same as in the first embodiment (see FIG. 2). When an audio signal is input via the microphone 13, the A/D converter 28, and the input interface 25, the additional information IFb is extracted from the input audio signal SAin in the same manner as in the first embodiment, after which the CPU 21 performs speech recognition processing on the audio signal SAb with reference to speech learning data and dictionary information stored in the ROM 22 or the RAM 23.

[0048] With current speech recognition technology, the recognition rate is not 100%, and in many cases the result contains errors. Moreover, it is very difficult for a system to extract the subtle intentions and emotions contained in voice. For example, the two sentences "There is a meeting from five o'clock tomorrow" and "Is there a meeting from five o'clock tomorrow?" differ greatly in meaning depending on whether the Japanese question particle "ka" is attached at the end, yet for speech recognition they differ by only a single sound, so recognition errors occur easily.

[0049] Therefore, in the information processing apparatus 10 having a speech recognition function, additional information IFa for determining an intention such as "inquiry" or "question" is added to the audio signal SAa. Referring again to FIG. 6, which was used above to explain the mutual communication operation between devices: by adding information indicating the intention "inquiry" as the additional information IFa to the audio signal SAa from the device A to the device B, the receiving information processing apparatus 10 can interpret and process the sentence as an inquiry, rather than as an "assertion", even if the final "ka" is missed during speech recognition.
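The effect of the embedded intention can be illustrated with a small sketch in which the intent from the additional information takes priority over whatever the recognized text suggests. The fallback heuristic on romanized text and the key name `intent` are assumptions for this example, not part of the specification.

```python
def interpret_intent(recognized_text, additional_info):
    """Prefer the intent carried as additional information over the recognized text."""
    embedded = additional_info.get("intent")
    if embedded is not None:
        return embedded
    # Fallback when no additional information is present: a naive guess
    # based on a trailing question particle or question mark.
    text = recognized_text.rstrip()
    return "inquiry" if text.endswith((" ka", "?")) else "assertion"


# The recognizer has dropped the final "ka" from the spoken question.
heard = "ashita wa goji kara kaigi desu"
```

Calling `interpret_intent(heard, {})` falls back to the text and yields an assertion, while `interpret_intent(heard, {"intent": "inquiry"})` still yields an inquiry.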

[0050] Furthermore, the user's own voice is used when the user inputs information to the information processing apparatus 10, whereas a recording of the user's voice or synthesized speech generated within the device is used between devices. Because these audio signals have different voice characteristics, accurate speech recognition would require separate speech learning data for each in the ROM 22 or the RAM 23, and thus a large storage capacity. Therefore, speech recognition is used for communication between a human and the information processing apparatus 10, while for communication between devices either all information necessary for processing is given as additional information, or additional information that avoids or reduces recognition errors is given, such as the intention, a keyword indicating the target domain to which the sentence belongs, or information specifying the dictionary needed for recognition.

[0051] For example, when the voice output from an information processing apparatus concerns "travel" and is intended as an "inquiry", the keyword "travel" and the intention information "inquiry" are inserted into the audio signal as additional information. The information processing apparatus 10 that takes in the voice based on that audio signal first extracts "travel" and "inquiry" as additional information, then selects dictionary information for speech recognition suited to "travel" and "inquiry", and performs speech recognition. This restricts the target domain, so that speech recognition can be performed more accurately.
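Selecting a restricted dictionary from the extracted keyword and intention might be sketched as a simple lookup. The dictionary contents and key names below are invented for illustration; a real recognizer would load the vocabulary and models stored in the ROM 22 or the RAM 23.

```python
# Hypothetical vocabularies; the actual dictionary data is not given in the text.
GENERAL_DICTIONARY = ["meeting", "hotel", "ticket", "weather", "hello"]
DOMAIN_DICTIONARIES = {
    ("travel", "inquiry"): ["hotel", "ticket", "how much", "when", "where"],
}


def select_dictionary(additional_info):
    """Pick the dictionary matching the extracted keyword and intent, if any."""
    key = (additional_info.get("keyword"), additional_info.get("intent"))
    return DOMAIN_DICTIONARIES.get(key, GENERAL_DICTIONARY)
```

Falling back to a general dictionary when no match exists is a design choice of this sketch, so that voice without additional information can still be processed.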

[0052] Next, a third embodiment of the present invention will be described.

[0053] In the first and second embodiments described above, the present invention was applied to a personal information management device, but the present invention can also be applied to robots.

[0054] The robots are provided with a speech recognition mechanism and a speech synthesis mechanism and can converse with one another in natural language. For utterances from a human to a robot, speech recognition technology is used to achieve mutual understanding. Dialogue between robots, however, only appears to be conducted in natural language. In fact, the robots exchange information based on the additional information added to the audio signal. As a result, highly reliable information transmission that avoids speech recognition errors is possible between robots, while a human observing them can also hear the content as synthesized speech, so that friendlier and more reliable communication can be realized.

[0055] In industry as well, an event called RoboCup is held every year, in which teams are formed from multiple robots and two teams play a soccer match. Soccer matches are played there by robots of various forms, but the spectators watching cannot tell what the robots are thinking or what information they are exchanging as they move. Various kinds of information are exchanged between the robots, but the spectators have no way of knowing it.

[0056] FIG. 7 shows an overview of a soccer match played by robots to which the present invention is applied. For playing soccer, a field 70 is provided with a goal 71, and robots 73, 74, and 75 are playing for a ball 72. Note that FIG. 7 shows only part of the field 70, and there are robots and a goal that do not appear in FIG. 7.

[0057] The robots 73, 74, and 75 are each provided with a voice input/output unit, and by having them exchange information by uttering voice to one another, the spectators 76 can learn what the robots are thinking and what information they are exchanging. In FIG. 7, the robot 73 utters "Run to the front of the goal" to the robot 74, and the spectators 76 learn this by hearing it as voice. In addition, additional information IFa is added to the output audio signal SAout used to output the voice uttered by the robot 73, so that information can be conveyed from the robot 73 to the robot 74 with high accuracy.

[0058] For example, to the output audio signal SAout for outputting the voice "Run to the front of the goal" from the robot 73 to the robot 74, an identifier representing the uttering robot 73, an identifier representing the robot 74 to which the command is directed, the specific command, and the like are added as the additional information IFa, so that the necessary information is conveyed between the robots. Furthermore, by giving the position coordinates as additional information IFa corresponding to the spoken phrase "front of the goal", the robots can also exchange position information with one another. Although voice has some directivity, it propagates over a wide range. The robot 75 in FIG. 7 can therefore also take in the voice and extract the additional information. However, robots farther away that do not appear in FIG. 7 cannot take in the voice adequately and cannot exchange information by voice from the robot 73. This is the same property as in a real match between humans, and it makes it possible to realize a more realistic game using robots.
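The additional information IFa described for the robot command can be pictured as a small record with a sender, a target, a command, and optional coordinates. The field names, the coordinate values, and the two outcome labels below are hypothetical; the specification does not fix an encoding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RobotMessage:
    """Assumed layout of the additional information attached to an utterance."""
    speaker_id: int                                 # robot that spoke, e.g. 73
    target_id: int                                  # robot being addressed, e.g. 74
    command: str                                    # the specific instruction
    position: Optional[Tuple[float, float]] = None  # e.g. coordinates of "front of the goal"


def handle_message(own_id, msg):
    """Act on the command if addressed, otherwise only note what was overheard."""
    if msg.target_id == own_id:
        return ("execute", msg.command, msg.position)
    return ("overheard", msg.command, msg.position)


# Illustrative values only: the coordinates are made up for this example.
msg = RobotMessage(speaker_id=73, target_id=74,
                   command="run to front of goal", position=(50.0, 0.0))
```

Here the robot 74 would execute the command, while the robot 75, which is within earshot, can still extract the same information without acting on it.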

[0059] When it is considered preferable not to utter natural language, as with animal-type robots, animal calls or the like may be used as the voice. In this case, with current speech recognition technology it is very difficult to recognize a natural call and determine its content. By adding information indicating the intention or emotion of the call as additional information IFa to the output audio signal SAout for outputting the call, humans can only judge from the call itself and cannot understand it as explicit language, whereas the robots can communicate with one another more clearly. This makes it possible to realize robots whose relationship to humans resembles the actual relationship between humans and animals, in which some language is exchanged among the animals but humans cannot understand it in detail.

[0060] Applying the present invention also makes it possible to realize a virtual zoo. This virtual zoo consists not of actual animals but of animal calls and images, or of robots. Various kinds of animals are stored there as information, and the user can see animals that cannot ordinarily be seen.

[0061] Humans can distinguish the kind of animal to some extent from its call, but current speech recognition mainly targets human language and gives almost no consideration to sounds such as animal calls. It is therefore difficult to identify the kind of animal from its call. Users, too, do not know the kind or name of animals with which they are unfamiliar.

[0062] Therefore, information such as the kind and name of the animal is added as additional information IFa to the output audio signal SAout that outputs the call. The user carries the information processing apparatus 10 of the present invention and can thereby obtain, from each animal's call, the kind and name added as additional information. Based on that kind and name, the information processing apparatus can also present more detailed information to the user. In this way, a virtual zoo can be realized that uses an animal's call as a key to present the user with the animal's kind and name and further detailed information.
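Using the call's additional information as a key into locally stored details could be sketched as below. The table contents are placeholders and the key names `kind` and `name` are assumptions for illustration.

```python
# Placeholder detail table; real content would be stored on the device.
ANIMAL_DETAILS = {
    "okapi": "placeholder detail text for the okapi",
}


def present_animal(additional_info, details=ANIMAL_DETAILS):
    """Build what the display would show from the call's additional information."""
    kind = additional_info["kind"]
    return {
        "kind": kind,
        "name": additional_info.get("name", kind),
        "details": details.get(kind, "no further details stored"),
    }
```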

[0063] Although the above describes the case where only additional information is used for information transmission between robots, speech recognition may of course be performed with the additional information used as an aid.

[0064] Next, a fourth embodiment of the present invention will be described.

[0065] As an example of a system using the information processing apparatus described above, an announcement system used for public-address announcements in premises such as stations and stores is conceivable. FIG. 8 shows an overview of such an announcement system. A server 80 makes the premises announcements and is connected to speakers 81 and the like installed at various places in the premises. A user 82 carries an information processing apparatus 83 serving as the client. The user hears the voice output from the speakers 81 with his or her own ears, and the information processing apparatus 83 also takes in that voice.

[0066] The server 80 has the functions of the voice output side of the information processing apparatus 10 shown in FIG. 2, and its output interface portion is connected to a plurality of speakers 81. The server 80 conveys to the users 82, as voice via the speakers 81, various information that needs to be provided widely throughout the premises, such as lost-child notices, the arrival time of the next train, and product information such as special sales. The information processing apparatus 83, on the other hand, has the functions of the voice input side of the information processing apparatus 10 shown in FIG. 2 and has, for example, the appearance shown in FIG. 1.

[0067] The server 80 is operated by an operator, who performs the appropriate processing operation whenever an event to be announced occurs. The natural-language voice that the server 80 transmits and that is provided to the users 82 in the premises through the speakers 81 is the operator's utterance, a pre-recorded voice, or synthesized speech created as needed. However, announcements that may be output automatically according to a predetermined schedule, such as timetable announcements in a station or store opening and closing announcements, may be output automatically without an operator.

[0068] As the additional information added to the audio signals for these voices, information expressing the meaning of the natural language can be used, for example. It is very difficult for the information processing apparatus 83 to perform speech recognition on the announcement voice itself, but by adding its meaning as additional information, the content of the announcement can be checked on the display unit 15 of the information processing apparatus 83. A further effect is that announcements can simultaneously reach people who are deaf or hard of hearing.

[0069] With an announcement such as "... please come to the front of the elevators on the third floor", as made in a lost-child notice, there is the problem that it takes time to find the place. In this case, position information for "in front of the third-floor elevators" can be provided as additional information to the announcement. The information processing apparatus 83 that receives it displays map information based on that position information on the display unit 15, making it possible to present more specific information to the user 82.

[0070] In addition, the additional information may carry information expressing the announcement in a language different from the natural language of the announcement, for example English or French for a Japanese announcement. The same information can thereby be transmitted by voice to foreigners who do not understand the language of the country. A user 82 who cannot understand the meaning of the announced voice itself because of the language difference can, by using the information processing apparatus 83, understand it in his or her native language via the display unit 15 or the speaker 12, provided that information expressing a translation into that language has been added to the audio signal.
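The kinds of additional information discussed for announcements (the meaning as text, position information, and translations) can be combined in one client-side sketch. The dictionary layout, key names, and sample values are assumptions for illustration and are not defined by the specification.

```python
def render_announcement(additional_info, preferred_language):
    """Return the lines the client would show for one announcement.

    Uses the translation for `preferred_language` when one was embedded,
    otherwise the announcement text in its original language.
    """
    translations = additional_info.get("translations", {})
    lines = [translations.get(preferred_language, additional_info["text"])]
    location = additional_info.get("location")
    if location is not None:
        # A real client would draw a map; here the position is shown as text.
        lines.append(f"Map: floor {location['floor']}, {location['spot']}")
    return lines


# Illustrative payload only.
info = {
    "text": "3-kai erebeta mae made okoshi kudasai",
    "translations": {"en": "Please come to the front of the third-floor elevators"},
    "location": {"floor": 3, "spot": "elevator hall"},
}
```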

[0071] In the embodiments described above, the present invention was applied to the portable information processing apparatus 10 shown in FIG. 1, the robots 73 to 75, and the server 80, but the present invention can of course be applied to other devices in the same way.

【００７２】[0072]

[Effects of the Invention] According to the present invention, additional information is added to an audio signal in a manner that does not affect how the voice based on that audio signal is heard, an output audio signal is thereby generated, and the output audio signal is converted into voice and output. Information can thus be presented to users in an easily understood form by voice, while additional information and highly accurate information with few errors can be exchanged between devices. Moreover, because voice is used as the communication medium, information can be exchanged simultaneously among a plurality of nearby devices rather than through directional one-to-one communication. Furthermore, by using audio media such as television, radio, and telephone as the communication channel, long-distance communication and communication aimed at large numbers of people can be performed.

[Brief description of the drawings]

FIG. 1 is a diagram showing an overview of an information processing apparatus according to an embodiment.

FIG. 2 is a block diagram showing the circuit configuration of the information processing apparatus.

FIG. 3 is a functional block diagram showing the internal operation of the CPU.

FIG. 4 is a flowchart for explaining the voice output operation (transmission operation) of the information processing apparatus.

FIG. 5 is a flowchart for explaining the voice input operation (reception operation) of the information processing apparatus.

FIG. 6 is a diagram for explaining an example of a mutual communication operation between information processing apparatuses.

FIG. 7 is a diagram showing an overview of a soccer match played by robots.

FIG. 8 is a diagram showing an overview of an announcement system.

[Explanation of symbols]

10 ... information processing apparatus, 11 ... main body, 12 ... speaker, 13 ... microphone, 14 ... talk switch, 15 ... display unit, 16 ... touch panel, 20 ... internal bus, 21 ... CPU, 22 ... ROM, 23 ... RAM, 24 ... display control unit, 25 ... input interface, 26 ... speech synthesis unit, 27 ... input detection unit, 28 ... A/D converter, 30 ... information processing unit, 31 ... information adding unit, 32 ... information extraction unit

Continued from the front page: (72) Inventor: Hiroshi Tsunoda, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo. (72) Inventor: Yasuhiko Kato, c/o Sony Corporation, 6-7-35 Kita-Shinagawa, Shinagawa-ku, Tokyo. F-terms (reference): 9A001 BB04 DD13 EE07 HH17 HH18 HH19 JJ38 JZ76

Claims

[Claims]

1. An information processing apparatus comprising: audio signal generating means for generating an audio signal; additional information generating means for generating additional information; information adding means for generating an output audio signal by adding the additional information to the audio signal in a manner that does not affect the hearing of the voice based on the audio signal; and audio output means for outputting voice based on the output audio signal.

2. The information processing apparatus according to claim 1, wherein the additional information is information related to the audio signal.

3. The information processing apparatus according to claim 2, wherein the audio signal represents a predetermined natural language, and the additional information is information indicating a phrase included in the natural language.

4. The information processing apparatus according to claim 2, wherein the audio signal represents a predetermined natural language, and the additional information is information used for recognition or interpretation of the natural language.

5. The information processing apparatus according to claim 4, wherein the additional information is information indicating a keyword related to the natural language.

6. The information processing apparatus according to claim 4, wherein the additional information is information indicating an intention or an emotion expressed in the natural language.

7. The information processing apparatus according to claim 4, wherein the additional information is information for identifying a target area including the natural language.

8. The information processing apparatus according to claim 2, wherein the audio signal represents a place in a natural language, and the additional information is position information indicating the place.

9. The information processing apparatus according to claim 2, wherein the audio signal represents a predetermined natural language, and the additional information is information in another language corresponding to the natural language.

10. The information processing apparatus according to claim 2, wherein the audio signal corresponds to a human or animal voice, and the additional information is information indicating an intention or an emotion expressed by the voice.

11. The information processing apparatus according to claim 2, wherein the audio signal corresponds to a sound of an animal, and the additional information is information indicating a type and a name of the animal.

12. The information processing apparatus according to claim 1, wherein the additional information is identification information indicating its own apparatus.

13. The information processing apparatus according to claim 1, wherein the additional information is information for identifying a device having a right to output the sound next.

14. The information processing apparatus according to claim 1, wherein the information adding means comprises: an attack detection unit that detects, as an attack, a portion of the audio signal with a sudden rise and a large amplitude; a spectrum analysis unit that performs spectrum analysis on a section of the audio signal having a predetermined length; an instantaneous interruption section forming unit that, based on the output of the attack detection unit and the output of the spectrum analysis unit, forms instantaneous interruption sections in wide-band portions of the audio signal excluding the attack portions; and an information adding unit that adds the additional information by using the instantaneous interruption sections thus formed.

15. The information processing apparatus according to claim 1, wherein the information adding means adds the additional information to the audio signal as a spread spectrum signal.

16. An information processing method comprising: a step of generating an audio signal; a step of generating additional information; a step of generating an output audio signal by adding the additional information to the audio signal in a manner that does not affect the hearing of the voice based on the audio signal; and a step of converting the output audio signal into voice and outputting the voice.

17. An information processing apparatus comprising: audio input means for inputting voice and obtaining an input audio signal corresponding to the voice; additional information extracting means for extracting additional information added to the input audio signal; and information processing means for performing processing using the extracted additional information.

18. The information processing apparatus according to claim 17, wherein the additional information is information related to the voice.

19. The information processing apparatus according to claim 18, wherein the voice represents a predetermined natural language, and the additional information is information indicating a phrase included in the natural language.

20. The information processing apparatus according to claim 19, wherein the information processing means displays the phrase on a display unit using the additional information.

21. The information processing apparatus according to claim 18, wherein the voice represents a predetermined natural language, and the additional information is information necessary for recognition of the natural language.

22. The information processing apparatus according to claim 21, wherein the additional information is a keyword related to the natural language.

23. The information processing apparatus according to claim 21, wherein the additional information is information indicating an intention or an emotion expressed in the natural language.

24. The information processing apparatus according to claim 21, wherein the information processing means performs a voice recognition process for recognizing the natural language from the input voice signal.

25. The information processing apparatus according to claim 24, wherein the additional information is information specifying a dictionary necessary for the voice recognition.

26. The information processing apparatus according to claim 24, wherein the additional information is information for identifying a target area including the natural language.

27. The information processing apparatus according to claim 18, wherein the voice is a representation of a place in a natural language, and the additional information is position information indicating the place.

28. The information processing apparatus according to claim 27, wherein the information processing means displays the location on a display unit using the additional information.

29. The information processing apparatus according to claim 18, wherein the voice represents a predetermined natural language, and the additional information is information of another language corresponding to the natural language.

30. The information processing apparatus according to claim 29, wherein the information processing means displays the other language on a display unit using the additional information.

31. The information processing apparatus according to claim 18, wherein the voice is a human or animal voice, and the additional information is information indicating an intention or an emotion expressed by the voice.

32. The information processing apparatus according to claim 18, wherein the voice is a sound of an animal, and the additional information is information indicating a type and a name of the animal.

33. The information processing apparatus according to claim 32, wherein the information processing means displays the type and name of the animal on a display unit using the additional information.

34. The information processing apparatus according to claim 17, wherein the additional information is information for identifying a device that has output the sound.

35. The information processing apparatus according to claim 17, wherein the additional information is information for identifying a device having a right to output the sound next.

36. The information processing apparatus according to claim 17, wherein said information processing means is a control means for controlling an operation of a robot imitating a living thing.

37. An information processing method comprising: a step of converting input audio to obtain an input audio signal; a step of extracting additional information added to the input audio signal; and a step of performing a process using the extracted additional information.

38. An information processing apparatus comprising: audio signal generating means for generating an audio signal; additional information generating means for generating additional information; information adding means for obtaining an output audio signal by adding the additional information to the audio signal in a manner that does not affect the listening of the audio by the audio signal; audio output means for outputting audio based on the output audio signal; audio input means for inputting audio and obtaining an input audio signal corresponding to the audio; additional information extracting means for extracting additional information added to the input audio signal; and information processing means for performing a process using the extracted additional information.

39. The information processing apparatus according to claim 38, wherein said information processing means is a control means for controlling an operation of a robot imitating a living thing.

40. The information processing apparatus according to claim 38, further comprising: information storage means for storing, as unresolved inquiry information, information on an inquiry when, after audio based on the output audio signal relating to the inquiry has been output from the audio output means, no input audio signal including information that resolves the inquiry is obtained from the audio input means; and re-output control means for causing the audio output means to output, at an arbitrary timing, audio based on the output audio signal relating to the inquiry, on the basis of the unresolved inquiry information stored in the information storage means.
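
The sending side of claims 16 and 38 and the receiving side of claims 17 and 37 can be illustrated with a minimal sketch. The claims do not specify how the additional information is added, so the scheme below is purely an assumption for illustration: each bit is carried by a low-amplitude tone near the top of the audible band, added on top of the audio signal, and recovered by comparing the energy at the two tone frequencies. The sampling rate, frame length, tone frequencies, amplitude, and all function names are hypothetical, and the receiver is assumed to know in advance how many bytes were embedded.

```python
import math

# All constants below are assumptions chosen for illustration only.
SAMPLE_RATE = 48_000      # sampling rate in Hz
FRAME = 480               # samples per embedded bit (10 ms)
F0, F1 = 18_000, 19_000   # tone frequencies (Hz) standing for bit 0 and bit 1
CARRIER_AMP = 0.02        # kept small relative to the audio signal


def text_to_bits(text):
    """Encode text as UTF-8 and return its bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits for a bit list whose length is a multiple of 8."""
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits) - 7, 8)
    )
    return data.decode("utf-8")


def add_additional_info(audio, text):
    """Return an output audio signal: the input samples plus one faint tone per bit."""
    bits = text_to_bits(text)
    if len(bits) * FRAME > len(audio):
        raise ValueError("audio signal too short for the additional information")
    out = list(audio)
    for k, bit in enumerate(bits):
        freq = F1 if bit else F0
        for n in range(FRAME):
            out[k * FRAME + n] += CARRIER_AMP * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
    return out


def tone_energy(frame, freq):
    """Squared magnitude of the frame's correlation with a tone at freq."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE) for n, s in enumerate(frame))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n, s in enumerate(frame))
    return re * re + im * im


def extract_additional_info(audio, n_bytes):
    """Recover n_bytes of embedded text by picking the stronger tone in each frame."""
    bits = []
    for k in range(n_bytes * 8):
        frame = audio[k * FRAME:(k + 1) * FRAME]
        bits.append(1 if tone_energy(frame, F1) > tone_energy(frame, F0) else 0)
    return bits_to_text(bits)


# Usage: a 400 Hz sine stands in for the voice; "Tokyo" is the additional information.
voice = [0.5 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE) for i in range(24_000)]
output_signal = add_additional_info(voice, "Tokyo")
recovered = extract_additional_info(output_signal, n_bytes=5)
```

In this sketch each output sample differs from the original by at most `CARRIER_AMP`, which is the sense in which the addition is meant not to disturb listening. A real loudspeaker-to-microphone path would also need frame synchronization, a length header, and error protection, none of which are modeled here.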