JPH08116385A - Individual information terminal equipment and voice response system

Info

Publication number: JPH08116385A
Application number: JP6275674A
Authority: JP
Inventors: Hiroaki Kokubo (小窪 浩明); Toshiyuki Aritsuka (在塚 俊之)
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-10-14
Filing date: 1994-10-14
Publication date: 1996-05-07

Abstract

PURPOSE: To store a large quantity of voice data by keeping it in encoded form and decoding it with a codebook only when it is to be reproduced and output. CONSTITUTION: An input/output part 402 consists of input/output interfaces such as a microphone and an input/output control part; it accepts data and commands from the user and outputs processing results to the user. A voice data recording part 403 is a dedicated memory that stores voice data transferred from a server that outputs voice data, such as a voice response server. A CODEC part 404 encodes voice and reproduces voice waveforms using a codebook managed by a codebook management part 405; it consists of a coding part that compresses information and a decoding part that reproduces voice data. Encoded voice data received from the server is stored in the voice data recording part 403 and is decoded, reproduced, and output by the CODEC part 404 using the codebook. Because the data is stored in encoded form, the recording part 403 can hold more voice data.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a client-server voice response system, and more particularly to a personal information terminal device used in such a system.

[0002]

[Prior Art] Personal information terminals should be small and consume little power in order to save space and improve portability, which limits the amount of memory and processing power they can carry. Voice processing such as voice recognition and voice synthesis, on the other hand, requires a large amount of memory and enormous processing power, and is difficult to perform on current personal information terminals. To solve this problem, client-server systems have been proposed in which the personal information terminal is connected to a network and a high-performance machine on the network takes on the voice processing as a voice recognition server or voice synthesis server.

[0003] Voice data is large: communicating with uncompressed voice data requires a transfer rate of several tens of kbps. Wireless communication, meanwhile, must fit as many channels as possible into a limited frequency band. Wireless systems therefore commonly transmit voice data that has been compressed by encoding.

[0004] Many voice coding methods have been proposed over the years. For example, Kazuo Nakata, "Digital Information Compression" (ディジタル情報圧縮, 廣済堂産報出版, Electronic Science Series 100), surveys various voice coding methods, including many waveform coding methods and source coding methods (parameter coding methods).

[0005] In general, high-efficiency coding exploits the fact that voice information is unevenly distributed, and allocates more codes to the regions where information is concentrated. Vector quantization pushes this idea further: it considers the uneven distribution of combinations of several parameters and, for sets of parameter combinations (called vectors), allocates more codes to the regions where voice information is concentrated. Such a method is disclosed in, for example, S. Roucos et al., "Segment quantization for very-low-rate speech coding," Proc. ICASSP 82, pp. 1563 (1982).

[0006] Vector quantization uses a codebook that stores a finite number of vectors, each associated with an index. It is a quantization process in the sense that it maps an input vector onto one of a finite number of code vectors; because the index can carry less information than the original vector, it is also used for information compression.
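The codebook mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the coding method of the cited papers: the two-dimensional vectors, the four-entry codebook, and the function names `encode` and `decode` are all assumptions made for the example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def encode(vectors, codebook):
    """Map each input vector to the index of its nearest code vector."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(v, codebook[i]))
        for v in vectors
    ]


def decode(indices, codebook):
    """Replace each index by the code vector it stands for."""
    return [codebook[i] for i in indices]


# Hypothetical 4-entry codebook over 2-dimensional feature vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

indices = encode(frames, codebook)     # what would be transmitted or stored
restored = decode(indices, codebook)   # approximation of the original frames
```

Each frame of two floating-point values is replaced by a single index that needs only two bits for this codebook, which is where the compression comes from; the restored vectors are approximations, not the original values.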

[0007] Proposed vector quantization methods include CELP, which vector-quantizes the prediction residual of the voice signal (for example, B. S. Atal et al., "Stochastic coding of speech signals at very low bit rates," Proc. ICC 84, pp. 1610-1613 (1984)), and its improved variant VSELP (for example, I. A. Gerson et al., "Vector sum excited linear prediction (VSELP)," Proc. IEEE Workshop on Speech Coding for Telecommunications, pp. 66-68 (1989)). In Japan, VSELP and PSI-CELP have been adopted as standard methods for digital mobile phones.

[0008]

[Problems to Be Solved by the Invention] A concern with the client-server voice response system described above is the risk of eavesdropping. With the vector quantization methods described above, the client and server exchange voice data encoded with a shared codebook. Reproducing the encoded voice data is therefore very difficult without the codebook, but easy for anyone who obtains it.

[0009] In other words, if the server uses only one codebook, the unspecified many users of the server's service all hold that same codebook, and eavesdropping by a malicious user among them cannot be ruled out.

[0010] A portable terminal can also be lost. If a third party who obtains a lost terminal listens to the stored voice data or communicates with the server without the owner's permission, the owner may suffer serious harm, such as leakage of information.

[0011] In addition to these problems, there is the following problem in storing large amounts of voice data. When processing takes time, as in a search of a large database, or when data such as stock prices or news is delivered periodically, it is more convenient to store the voice data sent from the server in memory and let the user reproduce it at a convenient time than to reproduce it the moment it arrives. Voice data, however, is much larger than character data, so storing it requires a large memory. For storing large amounts of data, DRAM (dynamic RAM), which is relatively inexpensive and offers a large bit capacity per cell area, is more economical than expensive SRAM (static RAM). DRAM, however, requires battery backup, so when a portable personal information terminal stores a large amount of voice data in DRAM, battery drain makes long-term storage impossible.

[0012] An object of the present invention is to enable a personal information terminal device used in a voice response system to store a large amount of voice data and to reproduce it whenever the user wishes.

[0013] Another object of the present invention is to reduce the possibility of eavesdropping in a voice response system and to prevent information from leaking to a third party even if the personal information terminal device is stolen or lost.

[0014]

[Means for Solving the Problems] A personal information terminal device according to the present invention comprises: voice data recording means for storing encoded voice data received from a server as a response message; codebook storage means for storing a codebook consisting of code vectors, which are vectors representative of voice feature quantities expressed as multidimensional vectors, and code words, which are one-dimensional values each corresponding to a code vector; and codec means for encoding and decoding voice using the codebook stored in the codebook storage means. The received encoded voice data is first stored in the voice data recording means and, when needed, is decoded by the codec means using the codebook stored in the codebook storage means, then reproduced and output.

[0015] Further, each of the one or more codebooks stored in the codebook storage means is given an identifier that specifies it, and the codebooks are managed by codebook management means. When the personal information terminal device sends a command to the server, it also sends the identifier of the codebook to be used. The server encodes voice data using the codebook with the designated identifier and sends it as a response message. The personal information terminal device that receives the encoded voice data first stores it in the voice data recording means and, when needed, decodes it using the codebook with the designated identifier, then reproduces and outputs it.
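The exchange described in this paragraph (send a command with a codebook identifier, receive encoded voice data, store it, decode it later) might look roughly like the following sketch. The `Server` and `Terminal` classes, the identifiers 7 and 9, and the fixed feature frames standing in for synthesized voice are all hypothetical; the patent does not specify a message format.

```python
def nearest_index(vector, codebook):
    """Index of the code vector closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )


class Server:
    def __init__(self, codebooks):
        self.codebooks = codebooks  # identifier -> list of code vectors

    def respond(self, command, codebook_id):
        # Stand-in for recognition, application, and synthesis:
        # a fixed sequence of feature vectors regardless of the command.
        frames = [(0.1, 0.0), (0.9, 1.0)]
        book = self.codebooks[codebook_id]
        return [nearest_index(f, book) for f in frames]


class Terminal:
    def __init__(self, codebooks):
        self.codebooks = codebooks  # codebook storage means
        self.recorded = []          # voice data recording means

    def request(self, server, command, codebook_id):
        # Send the identifier with the command, then store the encoded reply.
        indices = server.respond(command, codebook_id)
        self.recorded.append((codebook_id, indices))

    def play(self, slot):
        # Decode later, with the codebook named when the request was made.
        codebook_id, indices = self.recorded[slot]
        book = self.codebooks[codebook_id]
        return [book[i] for i in indices]


books = {7: [(0.0, 0.0), (1.0, 1.0)], 9: [(1.0, 1.0), (0.0, 0.0)]}
server = Server(books)
terminal = Terminal({7: books[7]})  # the terminal holds only codebook 7

terminal.request(server, "search", codebook_id=7)
decoded = terminal.play(0)
```

In this toy setup the same voice encoded with codebook 9 yields different indices, so a listener holding only codebook 7 would decode it incorrectly; that is the property the identifier scheme relies on.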

[0016] The command sent from the personal information terminal device to the server may be a voice command. Non-volatile memory can be used for the voice data recording means and the codebook storage means; flash memory is one option. When flash memory is used, the memory area in which each codebook is recorded in the codebook storage means should be aligned with the erase unit of the flash memory.
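One way to read the alignment advice is that each codebook occupies a whole number of erase units, so erasing one codebook never touches another. The sketch below computes such a layout; the 4096-byte erase unit and the codebook sizes are illustrative values, not figures from the patent.

```python
def slot_size(codebook_bytes, erase_unit_bytes):
    """Round a codebook size up to a whole number of flash erase units."""
    blocks = -(-codebook_bytes // erase_unit_bytes)  # ceiling division
    return blocks * erase_unit_bytes


def slot_offsets(codebook_sizes, erase_unit_bytes):
    """Start offset of each codebook when every slot is erase-unit aligned."""
    offsets, offset = [], 0
    for size in codebook_sizes:
        offsets.append(offset)
        offset += slot_size(size, erase_unit_bytes)
    return offsets


# Assumed values: a 4096-byte erase unit and three codebooks of various sizes.
offsets = slot_offsets([3000, 5000, 4096], erase_unit_bytes=4096)
```

With this layout every offset is a multiple of the erase unit, at the cost of some unused space at the end of each slot.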

[0017] The codebook management means of the personal information terminal device should preferably provide a codebook registration function (receiving a codebook from the server and registering it in the codebook storage means), a codebook erase function (erasing the codebooks recorded in the codebook storage means when a codebook erase signal is received), and a personal identification number check function (permitting changes to the contents of the codebooks stored in the codebook storage means only when the correct personal identification number is entered).
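The three functions listed above can be sketched as a small class. The class name, method names, and the plain-text PIN comparison are assumptions for illustration only. In particular, the sketch assumes the erase signal is honored without a PIN, since it is meant to arrive remotely when the terminal is lost; the patent text does not say how the two functions interact.

```python
class CodebookManager:
    """Hypothetical sketch of the codebook management means."""

    def __init__(self, pin):
        self._pin = pin
        self._store = {}  # codebook storage means: identifier -> codebook

    def register(self, codebook_id, codebook, pin):
        """Register a codebook received from the server if the PIN is correct."""
        if pin != self._pin:
            return False
        self._store[codebook_id] = codebook
        return True

    def on_erase_signal(self):
        """Erase every stored codebook (assumed here to need no PIN)."""
        self._store.clear()

    def get(self, codebook_id):
        """Return the codebook with this identifier, or None if absent."""
        return self._store.get(codebook_id)


manager = CodebookManager(pin="1234")
rejected = manager.register(7, [(0.0, 0.0), (1.0, 1.0)], pin="0000")
accepted = manager.register(7, [(0.0, 0.0), (1.0, 1.0)], pin="1234")
stored_before_erase = manager.get(7)
manager.on_erase_signal()
stored_after_erase = manager.get(7)
```

After the erase signal, encoded voice data left in the terminal can no longer be decoded locally, which is the protection the paragraph is after.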

[0018]

[Operation] Many variations of the present invention are conceivable; the operation of representative means among them is described below.

[0019] In a client-server voice response system consisting of a personal information terminal device and a voice response server, the server encodes voice data with a codebook stored on the server side before transferring it to the personal information terminal device. Encoding the voice data greatly reduces the amount of communication compared with sending the voice data directly.

[0020] The personal information terminal device first stores the received encoded voice data in the voice data recording means. To reproduce the voice, the stored encoded voice data is read out sequentially and decoded by the codec means using the codebook stored in the codebook storage means. If the user does not want to reproduce the voice immediately, the data can simply be left in the voice data recording means.

[0021] The voice cannot be restored correctly unless the codebook used on the server side and the codebook used on the personal information terminal device are identical. The codebook management means therefore has a function for registering, in advance, one of the many codebooks held on the server side in the codebook storage means. Each codebook is managed under an identifier shared with the server (for example, an ID number). Multiple codebooks can of course be registered in the codebook storage means. By designating the identifier of the codebook to be used when communication starts, the server and the personal information terminal device (client) can be made to use the same codebook.

[0022] Thus, according to the present invention, the voice data encoded and sent by the server is stored in the voice data recording means in its encoded form, so a large amount of voice data can be stored in less memory than storing the voice directly would require, and the user can reproduce the voice data at any time.
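The size of the saving can be illustrated with simple arithmetic. The figures below are assumptions chosen for the example and do not come from the patent: 64 kbps corresponds to telephone-quality PCM (8 kHz sampling, 8 bits per sample), 4.8 kbps is a hypothetical rate for the encoded data, and 1 MiB is a hypothetical memory size.

```python
def seconds_storable(memory_bytes, bit_rate_bps):
    """Seconds of voice that fit in memory at a given bit rate."""
    return memory_bytes * 8 / bit_rate_bps


MEMORY_BYTES = 1_048_576   # assumed 1 MiB of voice data memory
PCM_RATE_BPS = 64_000      # 8 kHz x 8 bits, uncompressed
CODED_RATE_BPS = 4_800     # assumed rate after encoding

pcm_seconds = seconds_storable(MEMORY_BYTES, PCM_RATE_BPS)
coded_seconds = seconds_storable(MEMORY_BYTES, CODED_RATE_BPS)
gain = coded_seconds / pcm_seconds
```

Under these assumed rates the same memory holds roughly thirteen times as much voice in encoded form; the actual ratio depends on the coding method and its bit rate.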

[0023] Designating the identifier of the codebook to be used when communication starts also avoids the eavesdropping problem that arises when a single standardized codebook is used. Moreover, if a function is provided that erases the codebooks in the codebook storage means when a codebook erase signal is received, unauthorized use by a third party who obtains a lost personal information terminal can be prevented.

[0024]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0025] FIG. 1 is a block diagram of an embodiment of a client-server voice response system in which a personal information terminal device is the client. This embodiment assumes a voice response system in which the various services reachable from the personal information terminal device 101 are operated by voice, for example a data search service that searches large databases, a reservation service that handles bookings for flights and hotels, and a translation service that translates input voice into another language.

[0026] In FIG. 1, 101 is a personal information terminal device, 102 is a voice response server, 103 is a communication system, 104 is a voice recognition system, 105 is an application system, and 106 is a voice synthesis system.

[0027] The personal information terminal device 101 and the voice response server 102 each have a communication function and can communicate wirelessly. The communication system 103, voice recognition system 104, application system 105, and voice synthesis system 106 are individual functions of the voice response server 102. A single machine may provide all of them, or they may be distributed across multiple computers on a network, each acting as a server machine.

[0028] The application system 105 is the part that provides the various services offered through the voice response system. For a database search service, for example, the application system 105 has a database search function. The personal information terminal device 101, communication system 103, voice recognition system 104, and voice synthesis system 106 are described in detail later.

[0029] FIG. 2 is an external view of an embodiment of the personal information terminal 101. In FIG. 2, 201 is a display, 202 is a pen, 203 is a push button, 204 is a microphone, 205 is a speaker, and 206 is an antenna.

[0030] The display 201 is used to display text, graphics, moving images, and the like, and also serves as the input interface for pen input with the pen 202. The push buttons 203 consist of switches that perform various functions such as turning the power on and off, and switches used as character input means in place of a keyboard. The microphone 204 is used for recording voice memos, entering voice commands, and so on. The speaker 205 outputs reproduced voice and alarm sounds. The antenna 206 transmits and receives radio waves. In this embodiment the antenna 206 is external, but it can also be built into the housing.

[0031] FIG. 3 shows the hardware configuration of an embodiment of the personal information terminal 101. In FIG. 3, 301 is a microphone, 302 is a speaker, 303 is a display, 304 is a button, 305 is an antenna, 306 is an input/output control unit, 307 is an A/D converter, 308 is a D/A converter, 309 is a VRAM (video RAM), 310 is a video controller, 311 is a touch sensor controller, 312 is a button controller, 313 is an RF modulator/demodulator, 314 is a CPU, 315 is a bus, 316 is a memory unit, 317 is a ROM, 318 is a RAM, 319 is a memory controller, 320 is an address line, 321 is a data line, and 322 is a control line.

[0032] The microphone 301, speaker 302, display 303, button 304, and antenna 305 each serve as an interface for exchanging data with the outside. The functions of these input/output interfaces have already been described with reference to FIG. 2.

[0033] The input/output control unit 306 controls the input/output interfaces and consists of the A/D converter 307, D/A converter 308, VRAM 309, video controller 310, touch sensor controller 311, button controller 312, and RF modulator/demodulator 313.

[0034] The A/D converter 307 converts the analog signal input from the microphone 301 into a digital signal. Conversely, the D/A converter 308 converts digital data into an analog signal and outputs it to the speaker 302. The VRAM 309 is a dedicated memory for storing the screen image data displayed on the display 303. The video controller 310 controls updates of the image data loaded into the VRAM 309 and displays the screen image data stored in the VRAM 309 on the display 303.

[0035] The touch sensor controller 311 is the sensor controller for the touch panel of the display 303. The button controller 312 senses whether the button 304 is on or off. The RF modulator/demodulator 313 modulates data to be transmitted into an RF (radio frequency) signal so that it can be emitted from the antenna 305 as radio waves, and demodulates RF signals received by the antenna 305.

[0036] The CPU 314 is the main processing unit, which controls and operates the system according to a program. The bus 315 is the communication path over which control commands, data, and the like pass between the CPU 314 and the other processing units.

[0037] The memory unit 316 consists of the ROM 317, the RAM 318, and the memory controller 319. The ROM 317 is a read-only memory that stores various control programs and the like. The RAM 318 is a readable and writable memory that stores various data entered by the user and the like. DRAM (dynamic RAM) is normally used for the RAM 318, and the RAM 318 is DRAM in this embodiment as well.

[0038] DRAM offers a large storage capacity per unit area and a low cost per bit. However, DRAM cannot retain recorded data unless it is refreshed at regular intervals. Using it as the storage device of a portable terminal therefore requires a backup battery, and data cannot be kept for long periods. For this reason SRAM (static RAM), which is expensive but consumes little power, is sometimes used for the RAM 318 instead.

[0039] In this embodiment the ROM 317 and the RAM 318 each consist of a single chip, but each may consist of multiple chips.

[0040] The memory controller 319 controls reading of data from the ROM 317 and the RAM 318 and writing of data to the RAM 318. The memory controller 319 is connected to the memories 317 and 318 by the address line 320, the data line 321, and the control line 322.

[0041] The address line 320 carries the address that designates the memory area to be accessed, the data line 321 carries the data being accessed, and the control line 322 carries commands such as write, erase, and read. To write data to the RAM 318, for example, the memory controller 319 sets the destination address on the address line 320 and the data value on the data line 321, then sends a write signal over the control line 322.

[0042] Next, the functions of the personal information terminal 101 are described. FIG. 4 is a block diagram for explaining the functions of an embodiment of the personal information terminal 101. In FIG. 4, 401 is a control unit, 402 is an input/output unit, 403 is a voice data recording unit, 404 is a codec unit, 405 is a codebook management unit, 406 is a codebook storage unit, 407 is a program storage unit, 408 is a data storage unit, and 409 is a communication unit.

[0043] The control unit 401 corresponds to the CPU 314 in FIG. 3 and controls the system as a whole. The input/output unit 402 consists of the input/output interfaces of FIG. 3, such as the microphone 301, and the input/output control unit 306; it accepts data and commands from the user and outputs processing results to the user. The voice data recording unit 403 is a dedicated memory for storing voice data transferred from a server that outputs voice data, such as the voice response server 102. In this embodiment, part of the RAM 318 is used as the voice data recording unit 403.

[0044] The codec unit 404 uses a codebook to encode voice and to decode encoded voice data back into the original voice waveform. The codec unit 404 is described in detail later. The codebook management unit 405 manages the codebooks recorded in the codebook storage unit 406, decides which codebook the codec unit 404 uses, and notifies the voice response server 102 of that codebook so that both sides use the same one.

[0045] The program storage unit 407 stores the program that runs on the CPU 314. The data storage unit 408 stores various data, such as data entered by the user and the results of system processing. The program storage unit 407 and the data storage unit 408 both correspond to the memory unit 316 in FIG. 3. The communication unit 409 consists of the antenna 305 and the RF modulator/demodulator 313 and communicates with the server machine.

[0046] FIG. 5 is a diagram for explaining an embodiment of the codec unit 404 of FIG. 4. The codec unit 404 consists of a coding unit, which compresses information by encoding voice, and a decoding unit, which reproduces encoded voice data. As noted in the prior art section, many voice coding methods have been proposed over the years; this embodiment describes one of them as an example.

[0047] In FIG. 5, 501 is an analysis unit, 502 is a vector quantization unit, 503 is a vector inverse quantization unit, and 504 is a synthesis unit.

[0048] The analysis unit 501 cuts the input voice into segments at a fixed period (about 20 to 60 ms) and extracts voice feature quantities from each short-time voice waveform segment. This embodiment is described using the power spectrum envelope (PSE) analysis method as an example. The PSE analysis method is described in detail in Nakajima et al., "Power spectrum envelope (PSE) speech analysis-synthesis system," Journal of the Acoustical Society of Japan, Vol. 44, No. 11 (November 1988).

【００４９】PSE 分析法を用いた分析部５０１では、切
り出した短時間音声波形データから、ピッチ情報（ピッ
チ周波数またはピッチ周期）５０５、パワ情報（短時間
パワ）５０６、およびスペクトル情報（線スペクトル成
分）５０７の３つの情報を抽出する。ピッチ情報５０５
の抽出は、相関法やAMDF法など公知の方法を用いればよ
い。スペクトル情報５０７の抽出には、フーリエ変換を
用いる。フーリエ変換で得られたフーリエ係数を自乗す
ると、切り出し波形のパワスペクトルが得られる。The analysis unit 501, using the PSE analysis method, extracts three pieces of information from the cut-out short-time speech waveform data: pitch information (pitch frequency or pitch period) 505, power information (short-time power) 506, and spectrum information (line spectrum components) 507. The pitch information 505 may be extracted by a known method such as the correlation method or the AMDF method. The spectrum information 507 is extracted using the Fourier transform. Squaring the Fourier coefficients obtained by the Fourier transform gives the power spectrum of the cut-out waveform.
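
As a non-limiting illustration, the extraction of the pitch and the power spectrum described above may be sketched as follows. The sketch is in Python and uses only the standard library. The function names, the 8 kHz sampling rate, the 40 ms frame, and the 60 to 400 Hz pitch search range are assumptions made for illustration and do not appear in the original. The correlation method is used for the pitch; the AMDF method could be substituted.

```python
import cmath
import math


def power_spectrum(frame):
    """Squared magnitude of the DFT coefficients of one cut-out frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def pitch_by_autocorrelation(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch frequency (Hz) taken from the lag that maximizes the autocorrelation."""
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


# Hypothetical test signal: a 200 Hz sine sampled at 8 kHz, one 40 ms frame.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(320)]
pitch = pitch_by_autocorrelation(frame, fs)
spec = power_spectrum(frame)
```

The direct DFT loop is chosen only to keep the sketch self-contained; a practical implementation would normally use a fast Fourier transform.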

【００５０】切り出し波形が周期構造を有する場合は、
このスペクトルはピッチの高調波による線スペクトル構
造を有する。そこで、フーリエ変換で得られたスペクト
ルの中から、抽出したピッチ周波数の高調波成分のみを
取り出す。このようにして取り出したデータを用いて、
パワスペクトル包絡を下記の数１に示す有限個の余弦級
数で近似する。When the cut-out waveform has a periodic structure, this spectrum has a line spectral structure formed by the pitch harmonics. Therefore, only the harmonic components of the extracted pitch frequency are taken from the spectrum obtained by the Fourier transform. Using the data taken out in this way, the power spectrum envelope is approximated by the finite cosine series shown in Equation 1 below.

【００５１】[0051]

【数１】 [Equation 1]

【００５２】ここで、f0をサンプリング周波数、fpをピ
ッチ周波数とすると、λは下記の数２で表される。Here, when f0 is the sampling frequency and fp is the pitch frequency, λ is expressed by the following equation 2.

【００５３】[0053]

【数２】 [Equation 2]

【００５４】数１における係数A0およびA(1)からA(m)
は、パワスペクトルから抽出されたピッチ周波数の高調
波成分と数１によるＹとの二乗誤差が最小になるよう求
められる。ここで、A0は入力音声の短時間パワを表して
いるのでパワ情報５０６として出力し、A(1)からA(m)を
スペクトル情報５０７として出力する。The coefficients A0 and A(1) to A(m) in Equation 1 are determined so that the squared error between the harmonic components of the pitch frequency extracted from the power spectrum and Y given by Equation 1 is minimized. Because A0 represents the short-time power of the input voice, it is output as the power information 506, and A(1) to A(m) are output as the spectrum information 507.
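
Equations 1 and 2 are not reproduced in this text. Purely as an illustration, and not as a reconstruction of the original equations, the fitting criterion described above may be written as follows if Equation 1 is assumed to be a plain cosine series in λ. The symbols P_k (the k-th harmonic component taken from the power spectrum, possibly on a logarithmic scale), λ_k (the value of λ at that harmonic), and K (the number of harmonics used) are hypothetical and introduced only for this sketch.

```latex
Y(\lambda) = A_0 + \sum_{i=1}^{m} A(i)\cos(i\lambda),
\qquad
E = \sum_{k=1}^{K} \bigl(P_k - Y(\lambda_k)\bigr)^2
\;\longrightarrow\; \min_{A_0,\,A(1),\,\dots,\,A(m)}
```

Under this assumption E is quadratic in the coefficients, so the minimum can be found by solving a linear system of m + 1 equations.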

【００５５】ベクトル量子化部５０２では、音声のスペ
クトル情報を代表するベクトル（コードベクトル）とそ
のコードベクトルに対応する指標（コードワード）とを
格納するコードブックを用いて、スペクトル情報５０７
（入力ベクトル）をベクトル量子化する。コードブック
の詳細については、後に図６を用いて説明する。The vector quantization unit 502 vector-quantizes the spectrum information 507 (the input vector) using a codebook that stores vectors representing speech spectrum information (code vectors) and the index corresponding to each code vector (codewords). Details of the codebook are described later with reference to FIG. 6.

【００５６】入力ベクトル５０７がベクトル量子化部５
０２に入力すると、コードブックから各コードベクトル
が読みだされ、入力ベクトル５０７と各コードベクトル
との距離djを以下の数３でそれぞれ計算する。そして、
距離が最小となるコードベクトル（例えば、i番目のコ
ードベクトルBi(1),Bi(2),...,Bi(m))を求め、そのコー
ドワードiを出力する。When the input vector 507 is input to the vector quantization unit 502, each code vector is read from the codebook, and the distance dj between the input vector 507 and each code vector is calculated by Equation 3 below. The code vector with the minimum distance (for example, the i-th code vector Bi(1), Bi(2), ..., Bi(m)) is then found, and its codeword i is output.

【００５７】[0057]

【数３】 [Equation 3]

【００５８】本実施例では距離尺度として、ベクトルの
各要素に重み付けしたユークリッド距離を用いた。もち
ろん、その他の適当な距離尺度を用いてもよい。In this embodiment, a Euclidean distance in which each element of the vector is weighted is used as the distance measure. Of course, any other suitable distance measure may be used.
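
As a non-limiting illustration, the nearest-code-vector search described above may be sketched as follows. Equation 3 is not reproduced in this text, so the sketch assumes a weighted squared Euclidean distance. The weights, the three-entry codebook, and the function names are hypothetical values chosen for illustration.

```python
def weighted_distance(x, code_vector, weights):
    """Assumed form of the distance: sum of w(k) * (x(k) - B(k))^2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, code_vector))


def quantize(x, codebook, weights):
    """Return the codeword whose code vector is nearest to the input vector."""
    return min(codebook, key=lambda cw: weighted_distance(x, codebook[cw], weights))


def dequantize(codeword, codebook):
    """Reverse lookup: codeword back to its code vector."""
    return codebook[codeword]


# Hypothetical codebook: codeword -> m-dimensional code vector (m = 3 here).
codebook = {1: [0.0, 0.0, 0.0], 2: [1.0, 0.5, -0.5], 3: [-1.0, 2.0, 0.0]}
weights = [1.0, 0.5, 0.25]
codeword = quantize([0.9, 0.4, -0.2], codebook, weights)
restored = dequantize(codeword, codebook)
```

Only the codeword is transmitted, so the m spectral values of each frame are replaced by a single index. The restored vector is the code vector itself, not the original input, which is why both sides must hold the same codebook.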

【００５９】以上、分析部５０１とベクトル量子化部５
０２で説明した処理によって、音声データのコーディン
グ（符号化）が行なわれ、入力音声は、ピッチ情報５０
５、パワ情報５０６、およびコードワード５０８の３つ
のパラメータに変換され送信される。Through the processing described above for the analysis unit 501 and the vector quantization unit 502, the voice data is coded (encoded), and the input voice is converted into three parameters, namely the pitch information 505, the power information 506, and the codeword 508, and transmitted.

【００６０】次に、音声データのデコーディング処理に
関して、ベクトル逆量子化部５０３と合成部５０４の動
作を説明する。Next, the operations of the vector dequantization unit 503 and the synthesis unit 504 regarding the decoding process of the voice data will be described.

【００６１】ベクトル逆量子化部５０３は、ベクトル量
子化部５０２とは逆に、コードブックを用いて、ベクト
ル量子化されたデータ５０９（コードワード）からべク
トルデータ（スペクトル情報）５１０に変換する。ベク
トル逆量子化部５０３で用いるコードブックは、ベクト
ル量子化に用いたコードブックと同一でなければならな
いことは言うまでもない。Conversely to the vector quantization unit 502, the vector dequantization unit 503 uses a codebook to convert the vector-quantized data 509 (a codeword) into vector data (spectrum information) 510. Needless to say, the codebook used by the vector dequantization unit 503 must be identical to the codebook used for vector quantization.

【００６２】合成部５０４は、入力したピッチ情報５１
１、パワ情報５１２、およびべクトル逆量子化部５０３
で得られたスペクトル情報５１０とから音声波形を合成
する。入力されたピッチ周波数５１１の値をfp'、パワ
５１２の値をA0'、スペクトル情報５１０の各要素をBk
(1)、Bk(2)、...、Bk(m)とすると、パワスペクトルY'は
以下の数４で得られる。The synthesis unit 504 synthesizes a voice waveform from the input pitch information 511, the power information 512, and the spectrum information 510 obtained by the vector dequantization unit 503. Letting fp' be the value of the input pitch frequency 511, A0' the value of the power 512, and Bk(1), Bk(2), ..., Bk(m) the elements of the spectrum information 510, the power spectrum Y' is obtained by Equation 4 below.

【００６３】[0063]

【数４】 [Equation 4]

【００６４】ただし、λ'は、下記の数５の通りであ
る。Here, λ' is given by Equation 5 below.

【００６５】[0065]

【数５】 [Equation 5]

【００６６】再生されたパワスペクトルは、振幅スペク
トルに変換した後、フーリエ逆変換を施すことで音素片
が得られる。この音素片をピッチ間隔だけずらしながら
加え合わせることによって、再生音声が得られる。The reproduced power spectrum is converted into an amplitude spectrum and then inverse Fourier transformed to obtain a phoneme piece. Reproduced voice is obtained by adding these phoneme pieces together while shifting each one by the pitch interval.
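
As a non-limiting illustration, the final step described above, adding the phoneme pieces together while shifting them by the pitch interval, may be sketched as follows. The sketch covers only this overlap-add step and takes the phoneme pieces as given. The function name and the sample values are hypothetical, and the pitch interval is expressed in samples.

```python
def overlap_add(pieces, pitch_periods):
    """Sum waveform pieces, each started one pitch interval after the previous one.

    pieces        -- list of sample lists (one phoneme piece per pitch period)
    pitch_periods -- pitch interval in samples following each piece
    """
    starts = [0]
    for period in pitch_periods[:-1]:
        starts.append(starts[-1] + period)
    length = max(start + len(piece) for start, piece in zip(starts, pieces))
    out = [0.0] * length
    for start, piece in zip(starts, pieces):
        for i, value in enumerate(piece):
            out[start + i] += value
    return out


# Three identical hypothetical pieces, pitch interval of 2 samples.
piece = [1.0, 2.0, 1.0]
out = overlap_add([piece, piece, piece], [2, 2, 2])
```

Because each piece is longer than the pitch interval in this example, neighbouring pieces overlap by one sample and their values are summed there.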

【００６７】図６を用いて、コードブックについて説明
する。図６は、図５のコーデック部で使用するコードブ
ックの一実施例である。図６に示すように、コードブッ
クはコードワード６０１とコードベクトル６０２とから
構成されている。コードベクトル６０２は、音声の特徴
量を示すｍ個の値で表わされるｍ次元ベクトル量であ
り、本実施例では、B(1)、B(2)、...、B(m)のｍ個のス
ペクトル情報である。The codebook is described with reference to FIG. 6. FIG. 6 shows an embodiment of the codebook used in the codec section of FIG. 5. As shown in FIG. 6, the codebook consists of codewords 601 and code vectors 602. A code vector 602 is an m-dimensional vector represented by m values indicating speech features; in this embodiment, these are the m pieces of spectrum information B(1), B(2), ..., B(m).

【００６８】コードベクトルは、あらかじめ学習用の音
声データを用い、それらの音声から得られた特徴ベクト
ルを代表するベクトルとして作成しておく。コードベク
トルの作成アルゴリズムとしては、LPGアルゴリズムが
有名である。LPGアルゴリズムは、歪尺度を用いてデー
タ（学習音声の特徴ベクトル）をクラスタリングする手
法であり、パタン認識の分野で用いられるクラスタリン
グ手法のk-meansアルゴリズムと本質的には同じであ
る。The code vectors are created in advance from training speech data, as vectors that represent the feature vectors obtained from that speech. The LBG (Linde-Buzo-Gray) algorithm is well known as an algorithm for creating code vectors. The LBG algorithm clusters data (the feature vectors of the training speech) using a distortion measure, and is essentially the same as the k-means algorithm, a clustering method used in the field of pattern recognition.
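
As a non-limiting illustration, codebook creation by clustering may be sketched as follows. The sketch implements plain k-means, which the description above treats as essentially equivalent, and not the full algorithm with its codebook-splitting initialization. The squared Euclidean distortion, the naive initialization from the first k vectors, and the two-dimensional training vectors are assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20):
    """Cluster training feature vectors and return {codeword: code vector}."""
    code_vectors = [list(v) for v in vectors[:k]]  # naive initialization (assumption)
    for _ in range(iterations):
        # Assignment step: attach every training vector to its nearest code vector.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: squared_distance(v, code_vectors[j]))
            clusters[nearest].append(v)
        # Update step: move each code vector to the centroid of its cluster.
        for j, members in enumerate(clusters):
            if members:
                code_vectors[j] = [sum(c) / len(members) for c in zip(*members)]
    return {j + 1: cv for j, cv in enumerate(code_vectors)}


# Hypothetical training data forming two obvious groups.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
trained = train_codebook(training, k=2)
```

Codewords are numbered from 1 so that the result has the same shape as the codebook of FIG. 6, a codeword paired with each code vector.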

【００６９】コードワード６０１は、これらのコードベ
クトルに１対１に対応づけられた一次元の値である。つ
まり、コードワードが指定されると、指定されたコード
ワード（例えば、２）に対応するコードベクトル（この
場合はB2(1),B2(2),...,B2(m)）が一意に特定される。A codeword 601 is a one-dimensional value associated one-to-one with a code vector. That is, when a codeword is specified (for example, 2), the corresponding code vector (in this case B2(1), B2(2), ..., B2(m)) is uniquely identified.

【００７０】個人情報端末１０１の説明の最後として、
コードブック管理部４０５とコードブック格納部４０６
について説明する。本実施例では、クライアントである
個人情報端末１０１と音声応答サーバ１０２とが共通の
コードブックを用いることで、音声データを符号化して
送受信を行ない、通信データ量の削減が図られている。To conclude the description of the personal information terminal 101, the codebook management unit 405 and the codebook storage unit 406 are described. In this embodiment, the personal information terminal 101, which is the client, and the voice response server 102 use a common codebook, so that voice data is encoded before being transmitted and received and the amount of communication data is reduced.

【００７１】ここで問題となるのが通信のセキュリティ
である。つまり、第三者が同じコードブックを持ってい
れば通信内容を盗聴することは容易である。したがっ
て、個人情報端末ごとにオリジナルのコードブックを用
意するか、あるいは多くのバラエティをもったコードブ
ックを用意しておき、通信毎に切換えて使用することが
望ましい。もちろん、サーバとクライアントの両者が共
通のコードブックを使用しないと通信は成り立たない。The issue here is communication security: if a third party has the same codebook, eavesdropping on the communication is easy. It is therefore desirable either to prepare a unique codebook for each personal information terminal, or to prepare a wide variety of codebooks and switch among them for each communication. Of course, communication cannot be established unless the server and the client both use the same codebook.

【００７２】図７は、コードブック格納部４０６の一実
施例のメモリマップである。図７において、７０１には
コードブック管理情報が、７０２，７０３にはそれぞれ
コードブックが格納されている。各コードブックは、図
６に示した構成を有する。FIG. 7 is a memory map of an embodiment of the codebook storage unit 406. In FIG. 7, codebook management information is stored in 701, and codebooks are stored in 702 and 703, respectively. Each codebook has the configuration shown in FIG. 6.

【００７３】図７において、コードブック管理情報７０
１は、コードブック管理部４０５によって管理されてい
る。コードブック格納部４０６に格納されているコード
ブックは、それぞれサーバ側と共通のＩＤ番号が付けら
れて管理されている。このように、サーバとクライアン
トとでＩＤ番号を統一しておくことで、サーバ側にコー
ドブックのＩＤ番号のみを指定するだけで共通のコード
ブックをアクセスすることが可能となる。In FIG. 7, the codebook management information 701 is managed by the codebook management unit 405. Each codebook stored in the codebook storage unit 406 is managed under an ID number shared with the server side. Because the server and the client use the same ID numbers, the common codebook can be accessed simply by specifying the codebook's ID number to the server.

【００７４】コードブック管理情報７０１には、このＩ
Ｄ番号（７０４）とそのＩＤ番号を持ったコードブック
が格納されているメモリ領域の先頭アドレス（７０５）
が交互に記録されている。コードブック管理部４０５
は、このコードブック管理情報７０１をアクセスするこ
とによって、任意のＩＤ番号を持ったコードブックを使
用することができる。また、新たにコードブックを追
加、または削除する場合もコードブック管理情報７０１
を更新すればよい。The codebook management information 701 records, in alternation, each ID number (704) and the start address (705) of the memory area storing the codebook with that ID number. By accessing the codebook management information 701, the codebook management unit 405 can use the codebook with any given ID number. Likewise, when a codebook is added or deleted, only the codebook management information 701 needs to be updated.
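
As a non-limiting illustration, the codebook management information may be sketched as a flat table in which ID numbers and start addresses alternate. The class name, the method names, and the example addresses are hypothetical; only the alternating layout of ID numbers and start addresses is taken from the description above.

```python
class CodebookManager:
    """Toy model of the management information: [id, address, id, address, ...]."""

    def __init__(self):
        self.entries = []

    def lookup(self, codebook_id):
        """Return the start address registered for an ID, or None if absent."""
        for i in range(0, len(self.entries), 2):
            if self.entries[i] == codebook_id:
                return self.entries[i + 1]
        return None

    def register(self, codebook_id, start_address):
        """Add a new ID/address pair; IDs must be unique."""
        if self.lookup(codebook_id) is not None:
            raise ValueError("ID number already registered")
        self.entries.extend([codebook_id, start_address])

    def delete(self, codebook_id):
        """Remove an ID/address pair; return True if something was removed."""
        for i in range(0, len(self.entries), 2):
            if self.entries[i] == codebook_id:
                del self.entries[i:i + 2]
                return True
        return False


manager = CodebookManager()
manager.register(1, 0x1000)  # hypothetical addresses
manager.register(2, 0x2000)
```

Adding or deleting a codebook touches only this table, which matches the statement that updating the management information is sufficient.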

【００７５】以上で個人情報端末１０１の説明を終わ
る。This is the end of the description of the personal information terminal 101.

【００７６】次に、音声応答サーバ１０２の詳細につい
て説明する。Next, the details of the voice response server 102 will be described.

【００７７】図８は、通信システム１０３の一実施例を
説明するための図である。図８において、８０１は通信
部、８０２はコーデック部、８０３はコードブック管理
部、８０４はコードブック格納部である。通信部８０１
は、図３のＲＦモジュレータ・デモジュレータ３１３と
同じく、送信するデータをアンテナから放出するため
に、ＲＦ信号に変調する働きと、アンテナで受信したＲ
Ｆ信号を復調（検波）する働きをもつ。FIG. 8 is a diagram for explaining an embodiment of the communication system 103. In FIG. 8, 801 is a communication unit, 802 is a codec section, 803 is a codebook management unit, and 804 is a codebook storage unit. Like the RF modulator/demodulator 313 of FIG. 3, the communication unit 801 modulates data to be transmitted into an RF signal so that it can be emitted from the antenna, and demodulates (detects) the RF signal received by the antenna.

【００７８】コーデック部８０２、コードブック管理部
８０３、コードブック格納部８０４も、図４で説明した
個人情報端末１０１のコーデック部４０４、コードブッ
ク管理部４０５、コードブック格納部４０６とほぼ同じ
働きをする。ただし、コードブック管理部８０３は、ク
ライアント毎にコードブックを管理する。また、コード
ブック格納部８０４には、多数のクライアントが使用す
るすべてのコードブックが格納されている。The codec section 802, the codebook management unit 803, and the codebook storage unit 804 function in substantially the same way as the codec section 404, the codebook management unit 405, and the codebook storage unit 406 of the personal information terminal 101 described with reference to FIG. 4. However, the codebook management unit 803 manages codebooks for each client, and the codebook storage unit 804 stores all the codebooks used by the many clients.

【００７９】図９は、音声認識システム１０４の一実施
例を説明するための図である。図９において、９０１は
音響分析部、９０２は言語処理部、９０３は意味処理部
である。FIG. 9 is a diagram for explaining one embodiment of the voice recognition system 104. In FIG. 9, 901 is an acoustic analysis unit, 902 is a language processing unit, and 903 is a semantic processing unit.

【００８０】音響分析部９０１は、入力した音声データ
を一定間隔（通常は数十ms）毎に分割した後、各分割単
位（分析フレーム）毎に認識の判定に用いる音声の特徴
量を計算する。音声認識に用いられる音声の特徴量とし
ては、LPCケプストラムが広く用いられている。LPCケプ
ストラムとは、線形予測分析（LPC分析）によって求め
られた線形予測係数を使って計算されるケプストラムで
ある。LPCケプストラムをはじめとして、音声の特徴分
析に関しては、古井の「ディジタル音声処理」（東海大
学出版会）などに詳しく解説されている。The acoustic analysis unit 901 divides the input voice data at regular intervals (usually several tens of ms) and then, for each division unit (analysis frame), calculates the speech features used for recognition. The LPC cepstrum is widely used as a speech feature for voice recognition. The LPC cepstrum is a cepstrum calculated from the linear prediction coefficients obtained by linear prediction analysis (LPC analysis). Speech feature analysis, including the LPC cepstrum, is explained in detail in works such as Furui's "Digital Speech Processing" (Tokai University Press).

【００８１】言語処理部９０２は、音響分析部９０１で
求められた入力音声の特徴量とあらかじめ用意してある
音響モデル９０４、単語辞書９０５、および文法９０６
とを用いて、音声の認識を行なう。音声の認識は、HMM
（Hidden Markov Model ）を用いる方式が主流である。
HMM では、音声を確率モデルにあてはめて認識を行な
う。The language processing unit 902 recognizes voice using the features of the input voice obtained by the acoustic analysis unit 901 together with an acoustic model 904, a word dictionary 905, and a grammar 906 prepared in advance. Methods based on the HMM (Hidden Markov Model) are the mainstream for voice recognition. An HMM performs recognition by fitting voice to a probabilistic model.

【００８２】具体的には、音響モデルを数個の状態数を
もつ遷移モデルで表し、各音韻のコードブックの生起確
率と状態間の遷移確率をあらかじめ学習しておく。音声
の認識では、単語辞書９０５および文法９０６を用いて
音響モデルを接続したモデルを作成し、入力した音声の
特徴量の時系列をこれらのモデルにあてはめたときの確
率が最大となるモデルを認識結果とする。Specifically, the acoustic model is represented by a transition model with several states, and the codebook occurrence probabilities for each phoneme and the transition probabilities between states are learned in advance. For recognition, models are built by connecting acoustic models according to the word dictionary 905 and the grammar 906, and the model that gives the maximum probability when the time series of input speech features is applied to it is taken as the recognition result.
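
As a non-limiting illustration, choosing the model with the maximum probability may be sketched as follows for discrete HMMs whose observations are codebook indices. The sketch scores each model with the forward algorithm; the Viterbi algorithm is a common alternative. The two models, their probabilities, and the labels "yes" and "no" are hypothetical, and the connection of phoneme models through the word dictionary and grammar is omitted.

```python
def forward_probability(observations, initial, transition, emission):
    """Probability that a discrete HMM generates the observed symbol sequence."""
    n = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n)) * emission[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def recognize(observations, models):
    """Return the name of the model with the highest forward probability."""
    return max(models, key=lambda name: forward_probability(observations, *models[name]))


# Two hypothetical left-to-right models: (initial, transition, emission).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
result = recognize([0, 0, 1, 1], models)
```

For long observation sequences a practical implementation would work with log probabilities or scaling to avoid numerical underflow.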

【００８３】意味処理部９０３は、言語処理部９０２で
認識された結果を意味解析し、アプリケーションシステ
ム１０５が理解できるコマンドに変換する。例えば、
「XXXを検索して」といった認識結果に対して、"search
XXX"に変換し、アプリケーションシステム１０５に送
信する。The semantic processing unit 903 performs semantic analysis on the result recognized by the language processing unit 902 and converts it into a command that the application system 105 can understand. For example, a recognition result such as "Search for XXX" is converted to "search XXX" and sent to the application system 105.
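
As a non-limiting illustration, the conversion from a recognized sentence to a command may be sketched as a table of patterns and command templates. Only the target command format "search XXX" comes from the description above. The English sentence pattern, the table, and the function name are assumptions made for illustration, and real semantic analysis would be considerably richer.

```python
import re

# Hypothetical pattern table: recognized sentence pattern -> command template.
PATTERNS = [
    (re.compile(r"^search for (?P<target>.+)$", re.IGNORECASE), "search {target}"),
]


def to_command(recognized_text):
    """Convert a recognized sentence into an application command, or None."""
    for pattern, template in PATTERNS:
        match = pattern.match(recognized_text.strip())
        if match:
            return template.format(**match.groupdict())
    return None


command = to_command("Search for XXX")
```

Returning None for an unrecognized sentence leaves the decision on how to respond to the caller.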

【００８４】図１０は、音声合成システム１０６の一実
施例を説明するための図である。図１０において、１０
０１は応答文生成部、１００２はテキスト解析部、１０
０３は韻律制御部、１００４は音声生成部である。FIG. 10 is a diagram for explaining an embodiment of the voice synthesis system 106. In FIG. 10, 1001 is a response sentence generation unit, 1002 is a text analysis unit, 1003 is a prosody control unit, and 1004 is a voice generation unit.

【００８５】応答文生成部１００１は、アプリケーショ
ンシステム１０５が処理した結果を受取り、応答文を生
成する。例えば、アプリケーションシステム１０５にデ
ータ検索を依頼した場合には、検索結果に合わせて、
「該当する項目は見つかりませんでした。」や「該当す
る項目は３件あります。」といった文章を生成する。も
ちろん、アプリケーションシステム１０５で応答文を生
成し出力する場合には、応答文生成部１００１の処理は
不要である。The response sentence generation unit 1001 receives the result processed by the application system 105 and generates a response sentence. For example, when the application system 105 has been asked to search for data, it generates a sentence matching the search result, such as "No matching items were found." or "There are three matching items." Of course, when the application system 105 itself generates and outputs the response sentence, the processing of the response sentence generation unit 1001 is unnecessary.

【００８６】テキスト解析部１００２は、応答文生成部
１００１で生成された応答文に対して、単語辞書１００
５を用いて形態素解析を行ない、入力テキストに正しい
読みを与えた後、アクセント結合を考慮して発音記号列
を生成する。形態素解析にはいくつかの方法が提案され
ているが、例えば文章の先頭から一番長く構文をあては
められた結果を正しい解析結果とする、左最長一致法な
どが良く知られている。The text analysis unit 1002 performs morphological analysis on the response sentence generated by the response sentence generation unit 1001 using the word dictionary 1005, assigns the correct reading to the input text, and then generates a phonetic symbol string taking accent combination into account. Several methods have been proposed for morphological analysis; a well-known example is the left longest matching method, which takes as the correct analysis the result that fits the longest span from the beginning of the sentence.
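
As a non-limiting illustration, the left longest matching method may be sketched as a greedy segmentation that, at each position, takes the longest word found in the dictionary. The small dictionary and the sample text are hypothetical, and the sketch covers only word segmentation, not the assignment of readings or accents.

```python
def longest_match_segment(text, dictionary):
    """Segment text left to right, taking the longest dictionary word each time."""
    max_len = max(len(word) for word in dictionary)
    tokens, pos = [], 0
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in dictionary:
                break
        else:
            candidate = text[pos]  # no dictionary word here: emit one character
        tokens.append(candidate)
        pos += len(candidate)
    return tokens


# Hypothetical dictionary; "該当する" must win over its prefix "該当".
dictionary = {"該当", "該当する", "項目", "は", "３", "件"}
tokens = longest_match_segment("該当する項目は３件", dictionary)
```

The greedy choice is simple and fast, although it can mis-segment text where a shorter first word would have allowed a better overall analysis.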

【００８７】韻律制御部１００３は、テキスト解析部１
００２で生成した発音記号列に対して韻律規則にのっと
って韻律情報を付加する。韻律規則は、実際に発話され
た人間の音声を開発者が分析した結果に基づいて作成さ
れた制御モデル１００６から構成される。継続長の制御
やピッチパタンの制御といった韻律情報を付加すること
によって、自然なリズムやイントネーションを伴った合
成音声の生成が可能となる。The prosody control unit 1003 adds prosody information to the phonetic symbol string generated by the text analysis unit 1002 in accordance with prosodic rules. The prosodic rules consist of a control model 1006 created from the developers' analysis of actual human speech. Adding prosody information, such as duration control and pitch pattern control, makes it possible to generate synthetic voice with natural rhythm and intonation.

【００８８】音声生成部１００４は、韻律制御部１００
３で処理された韻律情報を付加した発音記号列に対し、
音声単位モデル１００７を使って音声を再生する。音声
生成部１００４の処理は、単位音声の選択と音声信号の
生成とに分かれる。単位音声の選択では、音声単位モデ
ルとして用意した単位音声（例えば、CV音節、VC音節な
ど）の中から、合成する音素系列と抽出元の音素環境と
を考慮して適切な単位音声が選択される。音声信号の生
成では、継続長の制御やピッチパタンの制御を行ないな
がら、選択された単位音声を接続して音声信号を生成す
る。このとき、接続した部分が不連続にならないように
補間処理を行なう。The voice generation unit 1004 reproduces voice from the phonetic symbol string to which the prosody control unit 1003 has added prosody information, using the voice unit model 1007. The processing of the voice generation unit 1004 is divided into unit voice selection and voice signal generation. In unit voice selection, an appropriate unit voice is selected from the unit voices prepared as the voice unit model (for example, CV syllables and VC syllables), taking into account the phoneme sequence to be synthesized and the phoneme environment from which each unit was extracted. In voice signal generation, the selected unit voices are concatenated to generate a voice signal while the duration and pitch pattern are controlled. Interpolation is applied at this point so that the joined portions do not become discontinuous.

【００８９】以上で、音声応答サーバ１０２の説明を終
える。The description of the voice response server 102 is completed.

【００９０】次に、個人情報端末装置１０１と音声応答
サーバ１０２とを用いたクライアント・サーバ型音声応
答システムの処理について説明する。Next, the processing of the client / server type voice response system using the personal information terminal device 101 and the voice response server 102 will be described.

【００９１】図１１は、クライアント・サーバ型音声応
答システムの一実施例の処理シーケンスである。このよ
うな処理シーケンスは、音声応答サーバで行なうサービ
ス、および使用するコマンドにより多くのバラエティー
が考えられる。よって、ここで説明する処理シーケンス
は、ほんの一例に過ぎない。FIG. 11 is a processing sequence of an embodiment of the client / server type voice response system. Many kinds of such processing sequences are conceivable depending on the service provided by the voice response server and the command used. Therefore, the processing sequence described here is only an example.

【００９２】以下、図１１の処理フローにのっとって処
理シーケンスを説明する。最初に個人情報端末装置１０
１の使用者が、音声コマンドの入力を行なう（１１０
１）。音声コマンドとは、音声を使った音声応答システ
ムに対する要求であり、具体的には「XXXを検索して」
や「○月X日に○○に△を予約して」といった命令であ
る。The processing sequence is described below following the processing flow of FIG. 11. First, the user of the personal information terminal device 101 inputs a voice command (1101). A voice command is a spoken request to the voice response system; specifically, it is an instruction such as "Search for XXX" or "Reserve △ at ○○ on month ○, day X".

【００９３】個人情報端末装置１０１は、入力した音声
を符号化により情報圧縮した（１１０２）後、音声無線
機能を使って音声応答サーバ１０２と通信し、音声コマ
ンドを送る（１１０３）。このとき、符号化に用いたコ
ードブックのＩＤ番号を音声応答サーバ１０２に送信
し、使用するコードブックを一致させる。The personal information terminal device 101 compresses the input voice by encoding it (1102), then communicates with the voice response server 102 using the voice wireless function and sends the voice command (1103). At this time, the ID number of the codebook used for encoding is also transmitted to the voice response server 102 so that both sides use the same codebook.
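
As a non-limiting illustration, a transmitted voice command carrying the codebook ID number together with the per-frame parameters may be sketched as follows. The original does not specify any wire format; the byte layout, the 16-bit integer fields, and the function names are entirely hypothetical and serve only to show that the codebook ID travels with the pitch, power, and codeword of each frame.

```python
import struct


def pack_voice_command(codebook_id, frames):
    """Serialize (pitch, power, codeword) frames behind a codebook ID header.

    Hypothetical layout: codebook ID and frame count as big-endian 16-bit
    integers, followed by three 16-bit integers per frame.
    """
    payload = struct.pack(">HH", codebook_id, len(frames))
    for pitch, power, codeword in frames:
        payload += struct.pack(">HHH", pitch, power, codeword)
    return payload


def unpack_voice_command(payload):
    """Recover the codebook ID and the list of frames from a packed message."""
    codebook_id, count = struct.unpack_from(">HH", payload, 0)
    frames = [struct.unpack_from(">HHH", payload, 4 + 6 * i) for i in range(count)]
    return codebook_id, frames


message = pack_voice_command(7, [(120, 300, 17), (118, 280, 42)])
received_id, received_frames = unpack_voice_command(message)
```

With this layout each frame costs 6 bytes. The receiver uses the ID to select the matching codebook before dequantizing the codewords.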

【００９４】音声応答サーバ１０２では、符号化された
音声コマンドが入力されると（１１０４）、通信システ
ム１０３において符号化された音声のデコード（復号
化）を行ない、もとの音声波形に復元する（１１０
５）。次に、復元した音声コマンドに対して、音声認識
システム１０４を用いて音声の認識を行なう（１１０
６）。音声コマンドの認識については、音声認識システ
ム１０４の説明の箇所で詳しく述べた。音声の認識が終
了すると、アプリケーションシステム１０５は、音声認
識システム１０４より送られたコマンドを受理し、コマ
ンドを実行する（１１０７）。When the encoded voice command is input (1104), the voice response server 102 decodes the encoded voice in the communication system 103 and restores the original voice waveform (1105). Next, the restored voice command is recognized using the voice recognition system 104 (1106). Recognition of voice commands was described in detail in the description of the voice recognition system 104. When recognition is complete, the application system 105 accepts the command sent from the voice recognition system 104 and executes it (1107).

【００９５】アプリケーションシステム１０５の処理が
完了すると、音声合成システム１０６は、アプリケーシ
ョンシステム１０５が実行した処理結果を合成音声に変
換する（１１０８）。音声合成システム１０６で合成さ
れた処理結果の音声は、通信システム１０３で符号化し
（１１０９）、個人情報端末１０１に処理結果を送信す
る（１１１０）。When the processing of the application system 105 is completed, the speech synthesis system 106 converts the processing result executed by the application system 105 into synthetic speech (1108). The speech of the processing result synthesized by the speech synthesis system 106 is encoded by the communication system 103 (1109) and the processing result is transmitted to the personal information terminal 101 (1110).

【００９６】個人情報端末１０１は、音声応答サーバ１
０２から送信された処理結果の符号化音声データを受信
すると（１１１１）、受信した音声データを音声データ
記録部４０３に格納する（１１１２）。音声データを音
声データ記録部４０３に保存することによって、その保
存データが消去されないかぎり、使用者は好きなときに
その音声を再生し、聞くことができる。音声の再生は、
符号化された音声データをコーデック部４０４において
デコードし（１１１３）、Ｄ／Ａ変換部３０８を通し
て、スピーカ３０２より出力する（１１１４）。When the personal information terminal 101 receives the encoded voice data of the processing result transmitted from the voice response server 102 (1111), it stores the received voice data in the voice data recording unit 403 (1112). Because the voice data is kept in the voice data recording unit 403, the user can reproduce and listen to it at any time as long as the stored data is not erased. To reproduce the voice, the encoded voice data is decoded by the codec section 404 (1113) and output from the speaker 302 through the D/A conversion unit 308 (1114).

【００９７】以上より、本実施例によれば、サーバから
符号化されて送信された音声データを符号化したまま音
声データ記録部に記憶することで、大量の音声データを
保存することができ、使用者が好きなときに音声データ
を再生することが可能となる。As described above, according to this embodiment, voice data encoded and transmitted by the server is stored in the voice data recording unit still in encoded form, so a large amount of voice data can be saved and the user can reproduce it at any time.

【００９８】また、通信を開始する時点で、使用するコ
ードブックのＩＤ番号を指定することで、標準化された
単一のコードブックを使用する場合に問題となる盗聴の
問題を回避することができる。Furthermore, by specifying the ID number of the codebook to be used when communication starts, the eavesdropping problem that arises when a single standardized codebook is used can be avoided.

【００９９】次に、図７で説明したコードブックの登録
について説明する。コードブックの登録は、サーバ側に
あるコードブックをクライアント側である個人情報端末
１０１のコードブック格納部４０６に記憶する作業であ
る。Next, registration of the code book described with reference to FIG. 7 will be described. The codebook registration is an operation of storing the codebook on the server side in the codebook storage unit 406 of the personal information terminal 101 on the client side.

【０１００】図１２に、コードブックの登録シーケンス
の一実施例の流れ図を示す。図１２において、個人情報
端末１０１の使用者が登録コマンドによってコードブッ
クの登録を開始しようとすると（１２０１）、システム
は、使用者に対して暗証番号の入力を要求する（１２０
２）。暗証番号の要求は必ずしも必須ではないが、所有
者以外の第三者が勝手にコードブックの登録や削除を行
なうことを防止するために有効である。暗証番号が入力
されると（１２０３）、入力した暗証番号が正しいかど
うかをチェックする（１２０４）。FIG. 12 shows a flow chart of an embodiment of a codebook registration sequence. In FIG. 12, when the user of the personal information terminal 101 starts codebook registration with a registration command (1201), the system asks the user to enter a personal identification number (1202). Requesting the personal identification number is not strictly required, but it is effective in preventing a third party other than the owner from registering or deleting codebooks without permission. When the personal identification number is entered (1203), the system checks whether it is correct (1204).

【０１０１】もし、入力した暗証番号が正しくない場合
には、処理を終了し、コードブックの登録作業は行なわ
ない。暗証番号が正しい場合には、登録するコードブッ
クのＩＤ番号を設定する（１２０５）。ＩＤ番号は、コ
ードブックをユニークに指定する番号を割り当てる必要
があるため、すでに登録してあるＩＤ番号と重複がない
ようにシステムが自動的に決定する。If the entered personal identification number is incorrect, the process ends and the codebook is not registered. If it is correct, the ID number of the codebook to be registered is set (1205). Because the ID number must identify the codebook uniquely, the system determines it automatically so that it does not duplicate any ID number already registered.

【０１０２】ＩＤ番号が決定すると、サーバ側から新た
なコードブックが送信される（１２０６）。個人情報端
末１０１は、送信されたコードブックを受信すると、コ
ードブック格納部８１２にそのコードブックを記録し
（１２０７）、コードブック管理情報に新規登録したコ
ードブックのＩＤと記録領域のアドレスとを追加する
（１２０８）。When the ID number has been determined, a new codebook is transmitted from the server side (1206). Upon receiving the transmitted codebook, the personal information terminal 101 records it in the codebook storage unit 812 (1207) and adds the ID of the newly registered codebook and the address of its recording area to the codebook management information (1208).
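
As a non-limiting illustration, the registration sequence of FIG. 12 may be sketched as follows, with the step numbers shown in comments. The dictionary that stands in for the terminal, the rule of picking the smallest unused ID number, and the callback that fetches a codebook from the server are all hypothetical; the original states only that the system chooses an ID that does not duplicate an existing one.

```python
def register_codebook(terminal, entered_pin, fetch_codebook_from_server):
    """Run the registration steps; return the new ID, or None if the PIN fails."""
    if entered_pin != terminal["pin"]:                    # step 1204
        return None
    used = set(terminal["management"])                    # IDs already registered
    new_id = next(i for i in range(1, len(used) + 2) if i not in used)  # step 1205
    codebook = fetch_codebook_from_server(new_id)         # step 1206
    address = terminal["next_free"]
    terminal["storage"][address] = codebook               # step 1207
    terminal["management"][new_id] = address              # step 1208
    terminal["next_free"] += 1
    return new_id


# Hypothetical terminal state: one codebook (ID 1) already registered at address 0.
terminal = {"pin": "1234", "management": {1: 0}, "storage": {0: "codebook-1"}, "next_free": 1}
rejected = register_codebook(terminal, "0000", lambda cid: "codebook-%d" % cid)
accepted = register_codebook(terminal, "1234", lambda cid: "codebook-%d" % cid)
```

A failed check leaves the terminal state untouched, which corresponds to ending the process without registering anything.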

【０１０３】以上の処理がすべて終了することで、新た
なコードブックの登録が完了する。The registration of a new codebook is completed when all the above processes are completed.

【０１０４】ところで、携帯型の個人情報端末は、所有
者の移動に伴ってどこへでも持ち運ぶことができる。こ
のことは、どこにいても使用できるといった利点がある
半面、紛失や盗難などにより他人に無断使用される危険
が伴うということである。もちろん、暗証番号を設定し
ておけば、このような不正使用を防止することは可能で
ある。しかし、所有者本人までもが使用する度に暗証番
号を入力するのは非常に煩わしい。By the way, the portable personal information terminal can be carried anywhere as the owner moves. This has the advantage that it can be used anywhere, but it also entails the risk of unauthorized use by others due to loss or theft. Of course, it is possible to prevent such unauthorized use by setting a personal identification number. However, it is very troublesome for the owner to enter the personal identification number each time it is used.

【０１０５】そこで、紛失時のセキュリティの保護のた
めにコードブックを使用する。すでに説明したように、
音声データ記録部４０３には、符号化された音声データ
が保存されているため、音声を再生するには、コーデッ
ク部４０４でデコードする必要がある。つまり、記録し
てある音声を消去しなくとも、コードブックさえ消去し
てしまえば、記録してある音声を再生することは不可能
である。このことを利用して、紛失や盗難などにより他
人に無断使用される危険を回避する。以下、具体的に説
明する。Therefore, the codebook is used to protect security when the terminal is lost. As already explained, the voice data recording unit 403 holds encoded voice data, so reproducing the voice requires decoding by the codec section 404. In other words, even if the recorded voice is not erased, it cannot be reproduced once the codebook has been erased. This property is used to avoid the risk of unauthorized use by others after loss or theft. A specific description follows.

【０１０６】図１３は、この紛失時のセキュリティの保
護のためのコードブック消去シーケンスを示す。まず、
自分の個人情報端末１０１を紛失したことに気づいた所
有者は、音声応答サーバ１０２に自分の個人情報端末１
０１のコードブックを消去するように要求する（１３０
１）。もちろん、このサーバとの交信には、自分の個人
情報端末１０１は使えないため、サーバとの通信が可能
な他の端末装置を使って行なうことになる。また、音声
応答サーバ１０２に電話回線インターフェースを用意し
ておけば、電話を使ってアクセスすることも可能であ
る。FIG. 13 shows the codebook erasing sequence for protecting security when the terminal is lost. First, an owner who notices that his or her personal information terminal 101 has been lost requests the voice response server 102 to erase the codebook of that personal information terminal 101 (1301). Of course, the owner's own personal information terminal 101 cannot be used for this exchange with the server, so another terminal device capable of communicating with the server is used. If the voice response server 102 is provided with a telephone line interface, it can also be accessed by telephone.

【０１０７】音声応答サーバ１０２は、コードブック消
去要求を受け取ると、コードブック登録シーケンスと同
様に、要求者に暗証番号の入力を求める（１３０２）。
要求者から暗証番号が入力されると（１３０３）、入力
した暗証番号が正しいかどうかチェックする（１３０
４）。もし、入力した暗証番号が正しくない場合には、
処理を終了し、コードブックの消去要求は却下される。
暗証番号が正しい場合には、音声応答サーバ１０２は、
消去先の個人情報端末１０１に対して消去信号を送信す
る（１３０５）。Upon receiving the codebook erasing request, the voice response server 102 asks the requester to enter the personal identification number, as in the codebook registration sequence (1302). When the requester enters the personal identification number (1303), the server checks whether it is correct (1304). If it is incorrect, the process ends and the codebook erasing request is rejected. If it is correct, the voice response server 102 transmits an erase signal to the personal information terminal 101 to be erased (1305).

【０１０８】消去信号を受信した個人情報端末１０１
は、現在の使用状態（電源オフ、通信中など）の如何に
かかわらず、コードブック格納部４０６に格納されてい
るコードブックを消去し（１３０６）、コードブック管
理情報にコードブックを消去したことを記録する（１３
０７）。On receiving the erase signal, the personal information terminal 101 erases the codebook stored in the codebook storage unit 406 regardless of its current state of use (powered off, communicating, and so on) (1306), and records in the codebook management information that the codebook has been erased (1307).
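
As a non-limiting illustration, the erasing sequence of FIG. 13 and its effect on playback may be sketched as follows. The dictionary that stands in for the terminal, the function names, and the use of None to mark an erased codebook are hypothetical. The point shown is that the recorded codewords remain in place but can no longer be decoded once the codebook is gone.

```python
def server_handle_erase_request(entered_pin, registered_pin, send_erase_signal):
    """Server side: check the PIN (1304) and send the erase signal (1305)."""
    if entered_pin != registered_pin:
        return False
    send_erase_signal()
    return True


def handle_erase_signal(terminal):
    """Terminal side: erase codebooks (1306) and record the erasure (1307)."""
    terminal["storage"].clear()
    for codebook_id in terminal["management"]:
        terminal["management"][codebook_id] = None  # marked as erased


def play(terminal, recording):
    """Decode a stored recording of (codebook ID, codewords) into code vectors."""
    codebook_id, codewords = recording
    address = terminal["management"].get(codebook_id)
    if address is None:
        raise LookupError("codebook erased; recording cannot be decoded")
    codebook = terminal["storage"][address]
    return [codebook[codeword] for codeword in codewords]


# Hypothetical lost terminal holding one codebook (ID 7) and one recording.
lost = {"storage": {0: {1: [0.1], 2: [0.2]}}, "management": {7: 0}}
recording = (7, [2, 1])
```

Calling `server_handle_erase_request` with the correct number and a callback that invokes `handle_erase_signal(lost)` makes every later call to `play` fail, while the recording itself is left untouched.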

【０１０９】以上のシーケンスによって、個人情報端末
１０１に記憶されていたコードブックの消去を完了す
る。以後、音声データ記録部４０３に記録してある音声
を再生することは不可能となる。もちろん、新たに音声
応答サーバとの通信も行なうことはできない。また、本
来の所有者が、消去したコードブックを登録し直せば
（暗証番号によるセキュリティチェックが必要とな
る）、再び、記録してある音声を再生することが可能と
なる。By the sequence described above, the erasing of the codebook stored in the personal information terminal 101 is completed. After that, it becomes impossible to reproduce the voice recorded in the voice data recording unit 403. Of course, it is not possible to newly communicate with the voice response server. Further, if the original owner re-registers the deleted codebook (the security check with the personal identification number is required), the recorded voice can be reproduced again.

【０１１０】以上より、本実施例によれば、個人情報端
末を紛失した際も、取得した第三者による不正使用を防
止することができる。As described above, according to this embodiment, even if the personal information terminal is lost, unauthorized use by a third party who obtains it can be prevented.

【０１１１】次に、個人情報端末１０１の第二の実施例
として、メモリ部３１６の一部にフラッシュメモリを用
いた場合について説明する。すでに説明したように、Ｄ
ＲＡＭやＳＲＡＭではデータの保存にバッテリが必要と
なるため、携帯を前提とした個人情報端末１０１にＤＲ
ＡＭやＳＲＡＭを用いてコードブックや音声データのよ
うな大量のデータを長時間保存することはできない。そ
こで、個人情報端末１０１のメモリ部の一部をフラッシ
ュメモリに代表される不揮発性メモリに置き換えれば、
バッテリバックアップの必要なしにコードブックや音声
データを長時間保持しておくことが可能となる。Next, as a second embodiment of the personal information terminal 101, a case where flash memory is used for part of the memory unit 316 is described. As already explained, DRAM and SRAM need a battery to retain data, so a personal information terminal 101 designed to be carried cannot use DRAM or SRAM to hold large amounts of data such as codebooks and voice data for a long time. If part of the memory unit of the personal information terminal 101 is replaced with non-volatile memory, typified by flash memory, codebooks and voice data can be retained for a long time without battery backup.

【０１１２】図１４は、メモリの一部をフラッシュメモ
リに置き換えたメモリ部の一実施例である。図１４にお
いて、１４０１はメモリコントローラ、１４０２はＲＯ
Ｍ、１４０３はＤＲＡＭ、１４０４はフラッシュメモ
リ、１４０５はアドレス線、１４０６はデータ線、１４
０７は制御信号線である。FIG. 14 shows an embodiment of a memory unit in which part of the memory is replaced with flash memory. In FIG. 14, 1401 is a memory controller, 1402 is a ROM, 1403 is a DRAM, 1404 is a flash memory, 1405 is an address line, 1406 is a data line, and 1407 is a control signal line.

[0113] As is clear from a comparison of FIG. 3 and FIG. 14, the memory unit of FIG. 14 simply replaces part of the RAM 318 of FIG. 3 with the flash memory 1404 and uses the DRAM 1403 for the remainder. The function of each part is therefore the same as in FIG. 3. However, flash memory has the restrictions that writing data always involves an erase process and that erasure is performed in block units (for example, 512 bytes), so the control performed by the memory controller 1401 differs somewhat.

[0114] FIG. 15 shows an embodiment of a memory map for the case where flash memory is used for the codebook storage unit and the voice data recording unit.

[0115] The flash memory holds the data (1501) stored in the codebook storage unit 406 and the voice data (1502) stored in the voice data recording unit 403. As already explained, flash memory has the restrictions that writing data always involves an erase process and that erasure is performed in block units (for example, 512 bytes). Therefore, if a block containing data to be updated also contains data that is not to be updated, memory management becomes complicated.
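To see why mixed blocks complicate management, the following minimal sketch models a flash device in which any update must read the whole erase block, modify the copy, erase the block, and write the copy back so that the data not being updated survives. The `FlashModel` class and its in-memory `bytearray` are assumptions made for illustration; only the 512-byte erase unit comes from the text.

```python
BLOCK_SIZE = 512  # erase unit in bytes (the example value given in the text)


class FlashModel:
    """Hypothetical in-memory stand-in for a block-erasable flash device."""

    def __init__(self, num_blocks):
        self.data = bytearray(b"\xff" * (num_blocks * BLOCK_SIZE))
        self.erase_count = 0

    def erase_block(self, block):
        start = block * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = b"\xff" * BLOCK_SIZE
        self.erase_count += 1

    def update(self, offset, payload):
        """Write a non-empty payload at offset, preserving neighbouring data."""
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            start = block * BLOCK_SIZE
            # Read: copy the whole block so untouched bytes are not lost.
            saved = bytearray(self.data[start:start + BLOCK_SIZE])
            # Modify: patch only the part of the payload that falls in this block.
            lo = max(offset, start)
            hi = min(offset + len(payload), start + BLOCK_SIZE)
            saved[lo - start:hi - start] = payload[lo - offset:hi - offset]
            # Erase, then rewrite the full block.
            self.erase_block(block)
            self.data[start:start + BLOCK_SIZE] = saved
```

Every update in this model costs a full block copy per touched block, which is the bookkeeping that a block-aligned layout avoids.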

[0116] Therefore, taking advantage of the fact that all codebooks have the same size ((m dimensions) × (N vectors) words in this embodiment), the area for storing each codebook is fixed in advance and aligned with the erase blocks of the flash memory. For example, in the memory map shown in FIG. 16, the memory area for codebook 1 is fixed to erase blocks block1 through block3 of the flash memory, and the memory area for codebook 2 is fixed to block4 through block6.

[0117] With the settings of FIG. 16, changing codebook 1 requires only erasing block1, block2, and block3 in sequence and then writing the new codebook to block1 through block3. Memory management is easier than when a block contains a mixture of data to be erased and data to be kept.
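The block-aligned layout of FIG. 16 can be sketched as follows. The three-blocks-per-codebook figure and the 1-based block numbers follow the text; the two-byte word size, the example values m = 3 and N = 256, and all function names are assumptions made only for illustration. The flash device is modelled as a plain dictionary from block number to block contents.

```python
BLOCK_SIZE = 512  # erase unit in bytes (the example value given in the text)


def blocks_per_codebook(m, n, word_bytes=2):
    """Erase blocks needed for an (m dimensions) x (N vectors) codebook.

    word_bytes=2 is an assumed word size, not taken from the source.
    """
    total = m * n * word_bytes
    return (total + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling


def codebook_blocks(index, blocks_each):
    """1-based erase-block numbers reserved for codebook `index` (1-based)."""
    first = (index - 1) * blocks_each + 1
    return list(range(first, first + blocks_each))


def replace_codebook(flash, index, blocks_each, new_bytes):
    """Erase the codebook's fixed blocks in order, then write the new codebook."""
    if len(new_bytes) > blocks_each * BLOCK_SIZE:
        raise ValueError("codebook does not fit in its fixed region")
    blocks = codebook_blocks(index, blocks_each)
    for blk in blocks:  # erase pass: no other data lives in these blocks
        flash[blk] = b"\xff" * BLOCK_SIZE
    for i, blk in enumerate(blocks):  # write pass
        chunk = new_bytes[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        flash[blk] = chunk + b"\xff" * (BLOCK_SIZE - len(chunk))
```

Because each codebook owns whole erase blocks, no read-and-restore step is needed and the neighbouring codebook's blocks are never touched.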

[0118] Using the second embodiment described above, codebooks and voice data can be stored for a long time without battery backup.

[0119]

[Effects of the Invention] As described above, according to the present invention, voice data that is encoded and transmitted from the server is stored in the voice data recording unit in its encoded form, so that a large amount of voice data can be saved and the user can reproduce the voice data whenever desired. In addition, an identifier specifying the codebook to be used between the server and the personal information terminal device is exchanged when communication starts, and the codebook with that identifier is used, so the possibility of eavesdropping can be reduced. Furthermore, when the personal information terminal device is stolen or lost, an erase signal can be sent to erase all codebooks, so unauthorized use by a third party can be prevented.
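The storage saving rests on keeping only codewords and looking up code vectors at playback time. A minimal sketch of that codebook-based coding follows; it assumes that each codeword is simply the index of a code vector and that encoding picks the nearest code vector by squared Euclidean distance, which is the usual vector quantization rule but is not stated in the source. The function names and sample values are hypothetical.

```python
def encode(codebook, vectors):
    """Map each feature vector to the codeword (index) of its nearest code vector."""

    def dist2(a, b):
        # Squared Euclidean distance (assumed distance measure).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: dist2(codebook[i], v))
        for v in vectors
    ]


def decode(codebook, codewords):
    """Recover an approximation of the feature vectors by table lookup."""
    return [codebook[c] for c in codewords]


# N = 3 code vectors of m = 2 dimensions (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
features = [[0.9, 1.2], [0.1, -0.1], [-0.8, 0.7]]

stored = encode(codebook, features)   # one small integer per vector is stored
restored = decode(codebook, stored)   # playback requires the same codebook
```

Only `stored` needs to be kept in the voice data recording unit; without the matching codebook the integers cannot be turned back into voice features.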

[Brief description of drawings]

FIG. 1 is a diagram for explaining an embodiment of a client-server voice response system.

FIG. 2 is an external view of an embodiment of a personal information terminal.

FIG. 3 is a diagram for explaining the hardware configuration of an embodiment of the personal information terminal.

FIG. 4 is a block diagram for explaining the functions of an embodiment of the personal information terminal.

FIG. 5 is a diagram for explaining an embodiment of a codec unit.

FIG. 6 is a diagram for explaining an embodiment of a codebook.

FIG. 7 is a diagram for explaining an embodiment of a codebook storage unit.

FIG. 8 is a diagram for explaining an embodiment of a communication system.

FIG. 9 is a diagram for explaining an embodiment of a voice recognition system.

FIG. 10 is a diagram for explaining an embodiment of a voice synthesis system.

FIG. 11 is a diagram for explaining the processing sequence of an embodiment of the client-server voice response system.

FIG. 12 is a diagram for explaining an embodiment of a codebook registration sequence.

FIG. 13 is a diagram for explaining an embodiment of a codebook erasing sequence.

FIG. 14 is a diagram for explaining an embodiment of a memory unit.

FIG. 15 is a diagram for explaining an embodiment of a memory map of a flash memory.

FIG. 16 is a diagram for explaining an embodiment of a codebook storage unit using a flash memory.

[Explanation of symbols]

101 ... Personal information terminal, 102 ... Voice response server, 401 ... Control unit, 402 ... Input/output unit, 403 ... Voice data recording unit, 404 ... Codec unit, 405 ... Codebook management unit, 406 ... Codebook storage unit, 407 ... Program storage unit, 408 ... Data storage unit.

Claims

[Claims]

1. A personal information terminal device that transmits to a server a command designating requested processing and receives encoded voice data transmitted from the server as a response message, the device comprising: communication means for transmitting and receiving data to and from the server; voice data recording means for storing the encoded voice data received from the server; codebook storage means for storing a codebook composed of code vectors, which are vectors representing feature quantities of voice expressed as multidimensional vectors, and codewords, which are one-dimensional values corresponding to the respective code vectors; and codec means for encoding and decoding voice using the codebook stored in the codebook storage means, wherein the encoded voice data received from the server is stored in the voice data recording means and, when necessary, is decoded by the codec means using the codebook stored in the codebook storage means and is reproduced and output.

2. A personal information terminal device that transmits to a server a command designating requested processing and receives encoded voice data transmitted from the server as a response message, the device comprising: communication means for transmitting and receiving data to and from the server; voice data recording means for storing the encoded voice data received from the server; codebook storage means for storing one or more codebooks, each composed of code vectors, which are vectors representing feature quantities of voice expressed as multidimensional vectors, and codewords, which are one-dimensional values corresponding to the respective code vectors; codec means for encoding and decoding voice using a codebook stored in the codebook storage means; and codebook management means for managing the codebooks stored in the codebook storage means using an identifier that identifies each codebook, wherein, when a command is transmitted from the personal information terminal device to the server, the identifier of the codebook to be used is also transmitted to the server, the server encodes voice data using the codebook of the designated identifier and transmits it as a response message, and the personal information terminal device temporarily stores the encoded voice data received from the server in the voice data recording means and, when necessary, decodes the voice data using the codebook of the designated identifier and reproduces and outputs it.

3. The personal information terminal device according to claim 1 or 2, further comprising: means for inputting, by voice, a command designating the processing requested of the server; and means for encoding the voice data representing the command and transmitting the resulting coded voice data to the server.

4. The personal information terminal device according to claim 1, wherein a nonvolatile memory is used as the voice data recording means.

5. The personal information terminal device according to claim 4, wherein a flash memory is used as the voice data recording means.

6. The personal information terminal device according to claim 1, wherein a nonvolatile memory is used as the codebook storage means.

7. The personal information terminal device according to claim 6, wherein a flash memory is used as the codebook storage means.

8. The personal information terminal device according to claim 7, wherein in the codebook storage means, a memory area for recording each codebook is preset according to an erasing unit of the flash memory.

9. The personal information terminal device according to claim 2, wherein the codebook management means has a function of receiving the codebook from the server and storing the received codebook in the codebook storage means.

10. The personal information terminal device according to claim 2, wherein said codebook managing means has a function of erasing the codebook recorded in said codebook storing means when receiving the codebook erasing signal.

11. The personal information terminal device according to claim 2, wherein the codebook management means has a function of permitting modification of the contents of the codebook stored in the codebook storage means only when a correct personal identification number is input.

12. A voice response system comprising a server and a personal information terminal device, in which the personal information terminal device transmits a command to the server to request various kinds of processing, the server performs processing according to the command and transmits encoded voice data representing the result to the personal information terminal device as a response message, and the personal information terminal device receives and reproduces the encoded voice data, wherein the server comprises: receiving means for receiving the command transmitted from the personal information terminal device; application processing means for performing processing according to the command transmitted from the personal information terminal device and outputting a processing result; response sentence generating means for generating a response sentence from the processing result of the application processing means; voice synthesizing means for synthesizing voice according to the generated response sentence; and transmitting means for encoding the synthesized voice and transmitting it to the personal information terminal device; the personal information terminal device comprises: transmitting means for transmitting a command to the server; receiving means for receiving the encoded voice data transmitted from the server; voice data recording means for storing the received encoded voice data; codebook storage means for storing a codebook composed of code vectors, which are vectors representing feature quantities of voice expressed as multidimensional vectors, and codewords, which are one-dimensional values corresponding to the respective code vectors; and codec means for encoding and decoding voice using the codebook stored in the codebook storage means; and the personal information terminal device temporarily stores the encoded voice data received from the server in the voice data recording means and, when necessary, decodes the voice data by the codec means using the codebook stored in the codebook storage means and reproduces and outputs it.

13. A voice response system comprising a server and a personal information terminal device, in which the personal information terminal device transmits a command to the server to request various kinds of processing, the server performs processing according to the command and transmits encoded voice data representing the result to the personal information terminal device as a response message, and the personal information terminal device receives and reproduces the encoded voice data, wherein the personal information terminal device comprises: codebook storage means for storing one or more codebooks, each composed of code vectors, which are vectors representing feature quantities of voice expressed as multidimensional vectors, and codewords, which are one-dimensional values corresponding to the respective code vectors; codec means for encoding and decoding voice using a codebook stored in the codebook storage means; codebook management means for managing the codebooks stored in the codebook storage means using an identifier that identifies each codebook; transmitting means for transmitting a command to the server and, at that time, transmitting the identifier of the codebook to be used; receiving means for receiving the encoded voice data transmitted from the server; and voice data recording means for storing the received encoded voice data; the server comprises: receiving means for receiving the command and the codebook identifier transmitted from the personal information terminal device; application processing means for performing processing according to the command transmitted from the personal information terminal device and outputting a processing result; response sentence generating means for generating a response sentence from the processing result of the application processing means; voice synthesizing means for synthesizing voice according to the generated response sentence; and transmitting means for encoding the synthesized voice using the codebook having the identifier designated by the personal information terminal device and transmitting it to the personal information terminal device; and, at the start of communication, the personal information terminal device and the server exchange the identifier of the codebook to be used so that the codebooks they use coincide.

14. The voice response system according to claim 13, wherein the command transmitted from the personal information terminal device to the server is a voice command.

15. The voice response system according to claim 13, wherein a flash memory is used as the voice data recording means.

16. The voice response system according to claim 13, wherein a flash memory is used as the codebook storage means.