JP2002033851A - Mobile phone

Info

Publication number: JP2002033851A
Application number: JP2000213313A
Authority: JP
Inventors: Takashi Matsumura; 隆司松村; Katsuhiko Shimizu; 克彦清水
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2000-07-13
Filing date: 2000-07-13
Publication date: 2002-01-31

Abstract

PROBLEM TO BE SOLVED: To provide a mobile phone that reduces the consumption of channel resources and the charges billed to users while offering usability close to that of a conventional circuit-switched voice call. SOLUTION: A voice recognition section 12 performs speaker-dependent recognition on the voice received from a microphone 11 and converts the recognition result into character data; a packet assembly section 15 assembles the character data into packets and transmits them to a designated call partner. When recognition fails, a voice coding section 13 converts the portion that failed recognition into coded voice data, and the packet assembly section 15 assembles the character data and the coded voice data into packets and transmits them to the designated call partner. On the receiving side, a voice read section 22 reproduces as a voice signal the character data sent by packet transmission from a specific call partner, and a voice decoding reproduction section 23 decodes and reproduces the coded voice data as appropriate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a mobile phone, and in particular to a mobile phone that supports data communication by packet switching. In this specification, "mobile phone" covers not only so-called mobile phones but also PHS (Personal Handyphone System) terminals and the like.

[0002]

[Prior Art] In recent years, as the conventional telephone network has been changing from a voice-only circuit-switched network into a packet-switched network that integrates data communication, the functions required of mobile phones have been moving from those of a simple voice call terminal toward those of a multifunctional, sophisticated multimedia terminal that also handles text transmission, data transmission and the like. At the same time, the CPUs and DSPs mounted in mobile phones have become more powerful, and additional CPUs and DSPs tend to be added to serve applications other than the telephone function. Functions such as image compression and decompression and voice recognition, which were previously unrealistic in terms of computational load and power consumption, are therefore becoming practical.

[0003] On the other hand, the function as a conventional voice terminal, while still an important function of a mobile phone, has not changed much, and there is room for improvement in its usability. A feature of the packet-switched network, compared with the conventional circuit-switched network in which a line is occupied and charges accrue by time, is that charges are based on the amount of data transmitted: only the amount of data actually transferred is charged, regardless of how long the connection lasts. As a result, in applications where data transmission occurs intermittently, such as staying connected to an information service and browsing only the necessary information from time to time, there is no longer any need to connect each time a data request arises and disconnect as soon as the communication ends. This has proven effective in making efficient use of line resources and in reducing the charges to users.

[0004] In voice calls, however, if a line is left connected for a long time, for example, charges accrue by time and the financial burden on the user becomes large. If the user instead calls, talks and disconnects each time it is necessary, there is the trouble of the calling and disconnecting operations, and each connection takes time while the mobile station, the base station, the other party's mobile station and so on carry out the connection procedure with one another. These problems have remained unsolved. Uses in which a telephone line is kept connected for a long time are expected to include nursing care and business contact, where a state in which the immediacy of a call matters continues for a long period.

[0005] Mobile phones equipped with a voice recognition function are described in Japanese Patent Application Laid-Open Nos. 11-168552 and 10-240283. Specifically, Japanese Patent Application Laid-Open No. 11-168552 proposes a mobile phone device that, when recording the content of a call in memory, uses voice recognition to convert the voice data into text data before recording it, thereby extending the recording time. Japanese Patent Application Laid-Open No. 10-240283 proposes a voice processing device and a telephone device that, in a voice call, aim to reduce ambient noise by recognizing the speaker's voice at the transmitting stage and then synthesizing speech from the result, so that the signal is transmitted with the ambient noise removed (the transmitted content remains voice data or coded voice data).

[0006]

[Problems to Be Solved by the Invention] In the voice processing device and telephone device described in Japanese Patent Application Laid-Open No. 10-240283, the voice is first converted into character information by the voice recognition means, but it is then converted back into a voice signal by voice synthesis and transmitted over a circuit-switched network, so the approach does not contribute to reducing line resource usage or charges. The mobile phone device described in Japanese Patent Application Laid-Open No. 11-168552 uses voice recognition not for transmission but to reduce the amount of data recorded. In that case, not only the near-end speaker's voice but also the other party's voice must be recognized, which requires speaker-independent recognition, generally considered disadvantageous in computational load and recognition rate. Moreover, the fact that the recognition rate is currently not 100% is not taken into account.

[0007] Transmitting coded voice in packets uses no line resources during intervals in which no voice is present (with circuit switching, for example, the cdmaOne variable-rate voice coding scheme EVRC still transmits at the lowest rate, 1/8, even during intervals without voice), so it may improve line resource usage. However, the amount of coded voice data transmitted does not change much, so the effect is small. Furthermore, under the current tariff structures of time-based charging for circuit switching (for example, 10 yen per 10 seconds) and volume-based charging for packet switching (for example, 0.1 yen per 128 octets), there is almost no reduction in charges, and the cost may even be higher.

[0008] For example, suppose a sentence of about 100 katakana characters is read aloud in about 10 seconds without pausing for breath. With circuit switching the charge is 10 yen for 10 seconds. With packets the charge is 8000 bps × 10 seconds / (128 octets × 8 bits/octet) × 0.1 yen = 7.8 yen; however, because packets are charged on both transmission and reception, the total is double that, 15.6 yen. If the content is sent as characters at 2 bytes per character, 100 characters cost 100 characters × 2 bytes × 8 bits/byte / (128 octets × 8 bits/octet) × 0.1 yen = 0.16 yen, a far lower charge. The present invention has been made in view of these circumstances. Its first object is to provide a mobile phone that reduces line resource usage and charges to the user while offering usability close to that of a conventional circuit-switched voice call. Its second object is to provide a mobile phone with improved convenience.
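The charge comparison in paragraph [0008] can be reproduced with a few lines of arithmetic. The following is a minimal sketch: the tariff figures (10 yen per 10 seconds, 0.1 yen per 128 octets), the 8000 bps voice rate and the 2 bytes per character are the example values from the text, while the function and constant names are illustrative assumptions.

```python
# Worked example of the charge comparison in paragraph [0008].
# All names are illustrative; the numbers are the example values from the text.

PACKET_OCTETS = 128          # billing unit for packet switching
YEN_PER_PACKET = 0.1         # charge per 128-octet unit
YEN_PER_10_SECONDS = 10.0    # circuit-switched time charge


def circuit_charge(seconds: float) -> float:
    """Time-based charge for a circuit-switched call."""
    return seconds / 10.0 * YEN_PER_10_SECONDS


def packet_charge(bits: float) -> float:
    """Volume-based charge for one direction of packet transmission."""
    return bits / (PACKET_OCTETS * 8) * YEN_PER_PACKET


def coded_voice_bits(bit_rate_bps: float, seconds: float) -> float:
    """Amount of coded voice produced at a constant bit rate."""
    return bit_rate_bps * seconds


def text_bits(characters: int, bytes_per_char: int = 2) -> int:
    """Amount of character data for a given number of characters."""
    return characters * bytes_per_char * 8


if __name__ == "__main__":
    voice_one_way = packet_charge(coded_voice_bits(8000, 10))
    print(f"circuit switched, 10 s: {circuit_charge(10):.2f} yen")
    print(f"coded voice packets, one way: {voice_one_way:.2f} yen")
    print(f"coded voice packets, sent and received: {2 * voice_one_way:.2f} yen")
    print(f"100 characters as text: {packet_charge(text_bits(100)):.2f} yen")
```

The sketch follows the text in charging fractional 128-octet units. Whether a real tariff would round up to whole units is not stated in the source.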

[0009]

[Means for Solving the Problems] To achieve the first object, the invention according to claim 1 is a mobile phone that supports packet transmission, characterized in that: during a call, it performs speaker-dependent voice recognition on the voice at the transmitting stage, converts the recognition result into character data, and packet-transmits the character data to a designated call partner; when the voice recognition fails, it converts the portion that failed recognition into coded voice data and packet-transmits the character data together with the converted coded voice data to the designated call partner; and it reads aloud character data, and decodes and reproduces coded voice data, sent by packet transmission from a specific call partner, as appropriate.

[0010] According to the invention of claim 1, during a call the voice at the transmitting stage undergoes speaker-dependent voice recognition, the recognition result is converted into character data, and the character data is packet-transmitted to a designated call partner. When the voice recognition fails, the portion that failed recognition is converted into coded voice data, and the character data and the converted coded voice data are packet-transmitted to the designated call partner. Character data sent by packet transmission from a specific call partner is read aloud, and coded voice data is decoded and reproduced, as appropriate. Line resource usage and charges to the user can therefore be reduced, and a mobile phone with usability close to that of a conventional circuit-switched voice call can be realized.

[0011] To achieve the second object, the invention according to claim 2 is the mobile phone according to claim 1, characterized in that character data transmitted and received by packet transmission is displayed on a screen.

[0012] According to the invention of claim 2, in the mobile phone according to claim 1, character data transmitted and received by packet transmission is displayed on a screen, so the content of a call is easy to check and convenience is improved.

[0013] To achieve the first object, the invention according to claim 3 is a mobile phone having packetizing means that adds the address of the call partner to character data or coded voice data to be transmitted, converts it into packets and sends them to a packet-switched network, and packet restoring means that restores received packets into character data or coded voice data, the mobile phone being characterized by comprising: voice recognition means that recognizes a voice signal input at the transmitting stage during a voice call and converts it into character data; voice coding means that converts the voice signal in a portion where the voice recognition means failed in recognition into coded voice data; selection means that selectively outputs to the packetizing means the character data output from the voice recognition means and the coded voice data output from the voice coding means; data distribution means that distributes data restored by the packet restoring means to voice reading means or voice decoding/reproducing means according to the type of the data; the voice reading means, which reads aloud as voice the character data input from the data distribution means; the voice decoding/reproducing means, which decodes and reproduces the coded voice data input from the data distribution means; and synthesizing means that combines the voice signal output from the voice reading means with the voice signal output from the voice decoding/reproducing means and outputs the result to voice output means.

[0014] According to the invention of claim 3, the mobile phone has: voice recognition means that recognizes a voice signal input at the transmitting stage during a voice call and converts it into character data; voice coding means that converts the voice signal in a portion where the voice recognition means failed in recognition into coded voice data; selection means that selectively outputs to the packetizing means the character data output from the voice recognition means and the coded voice data output from the voice coding means; data distribution means that distributes data restored by the packet restoring means to voice reading means or voice decoding/reproducing means according to the type of the data; the voice reading means, which reads aloud as voice the character data input from the data distribution means; the voice decoding/reproducing means, which decodes and reproduces the coded voice data input from the data distribution means; and synthesizing means that combines the voice signal output from the voice reading means with the voice signal output from the voice decoding/reproducing means and outputs the result to voice output means. Line resource usage and charges to the user can therefore be reduced, and a mobile phone with usability close to that of a conventional circuit-switched voice call can be realized.

[0015] The invention according to claim 4 is the mobile phone according to claim 3, characterized by having display means that displays on a screen the character data transmitted and received by packet transmission.

[0016] According to the invention of claim 4, the mobile phone according to claim 3 has display means that displays on a screen the character data transmitted and received by packet transmission, so the content of a call is easy to check and convenience is improved.

[0017] The invention according to claim 5 is the mobile phone according to claim 3 or 4, characterized in that the voice recognition means is speaker-dependent voice recognition means trained in advance on the voice of a specific user.

[0018] According to the invention of claim 5, in the mobile phone according to claim 3 or 4, speaker-dependent voice recognition means trained in advance on the voice of a specific user is used as the voice recognition means, so the recognition rate is high and the amount of computation and the computing performance of the CPU or DSP needed to implement the recognition means can be kept low. In addition, the portions where recognition fails are transmitted as coded voice data, so loss of conversation content is suppressed even when the recognition rate is not 100%, and a practical voice call can be achieved even with realistic recognition performance.

[0019] To achieve the second object, the invention according to claim 6 is the mobile phone according to claim 4 or 5, further having character input means for inputting character data at the transmitting stage, characterized in that the selection means selectively outputs to the packetizing means any one of the character data output from the character input means, the character data output from the voice recognition means, and the coded voice data output from the voice coding means.

[0020] According to the invention of claim 6, the mobile phone according to claim 4 or 5 further has character input means for inputting character data at the transmitting stage, and the selection means selectively outputs to the packetizing means any one of the character data output from the character input means, the character data output from the voice recognition means, and the coded voice data output from the voice coding means. The content of a call can therefore be converted to text and displayed on the screen while character data is entered with the character input means. As a result, even a person with a speech or hearing impairment can converse with a party who makes calls by voice.

[0021] The invention according to claim 7 is the mobile phone according to claim 6, further having storage means that stores character data transmitted and received by packet transmission and operation means for designating various operations, characterized in that character data stored in the storage means is read out by operating the operation means and displayed on the display means.

[0022] According to the invention of claim 7, the mobile phone according to claim 6 further has storage means that stores character data transmitted and received by packet transmission and operation means for designating various operations, and character data stored in the storage means is read out by operating the operation means and displayed on the display means. The content of a call can thus be recorded as character information, which is smaller in data amount than coded voice data, so a long conversation can be recorded in storage means of small capacity and can also be reproduced as voice.
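To give a feel for the storage saving claimed here, the following rough sketch compares the size of one utterance stored as text with the same utterance stored as coded voice. It reuses the example figures from paragraph [0008] (8000 bps coded voice, about 100 characters spoken in 10 seconds, 2 bytes per character) as assumptions; the function names are illustrative and no actual codec or memory layout is implied.

```python
# Rough storage comparison for one recorded utterance: text versus coded voice.
# Assumed figures, taken from the example in paragraph [0008]:
# 8000 bps coded voice, about 100 characters in 10 seconds, 2 bytes per character.

def coded_voice_bytes(seconds: float, bit_rate_bps: int = 8000) -> float:
    """Bytes needed to store coded voice at a constant bit rate."""
    return seconds * bit_rate_bps / 8


def text_bytes(characters: int, bytes_per_char: int = 2) -> int:
    """Bytes needed to store the same utterance as character data."""
    return characters * bytes_per_char


if __name__ == "__main__":
    voice = coded_voice_bytes(10)
    text = text_bytes(100)
    print(f"coded voice: {voice:.0f} bytes, text: {text} bytes, ratio: {voice / text:.0f}x")
```

Under these assumed figures the text form is about one fiftieth the size of the coded voice. The actual ratio would depend on the codec rate and speaking speed.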

[0023] The invention according to claim 8 is the mobile phone according to claim 7, characterized in that the storage means stores coded voice data in addition to the character data transmitted and received by packet transmission, and that the character data or coded voice data stored in the storage means is read out by operating the operation means and reproduced as a voice signal by the voice reading means or the voice decoding/reproducing means.

[0024] According to the invention of claim 8, coded voice data is stored in the storage means in addition to character data, and these data are read out by operating the operation means and reproduced as a voice signal by the voice reading means or the voice decoding/reproducing means. The character information can therefore not only be checked on the screen but also listened to as voice.

[0025]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 shows the configuration of a mobile phone according to an embodiment of the present invention. In the figure, the mobile phone according to this embodiment supports packet transmission and has an antenna 1, an RF unit 2 that demodulates a signal received via the antenna 1 into a baseband signal, or modulates a carrier wave with a baseband signal and transmits it from the antenna, and a baseband processing unit 3. The baseband processing unit 3 converts packets output from the packetizing unit 15 into a baseband signal and outputs it to the RF unit 2, or extracts packets from the baseband signal received from the RF unit 2 and outputs them to the packet restoring unit 20.

[0026] The mobile phone according to this embodiment also has an input unit 10, a microphone 11, a voice recognition unit 12, a voice coding unit 13 and a selector 14. The input unit 10 has a start key used when making a call, an end key used to end a call, a keypad consisting of numeric keys (which double as character keys), code keys and the like, function keys used when setting various functions, a power key, a clear key for cancelling various settings, and so on. The input unit 10 corresponds to the character input means and the operation means of the present invention and is used for entering character data and for various operations.

[0027] The voice recognition unit 12 has the function of recognizing the voice signal input during a voice call from the microphone 11, which constitutes the transmitting stage, and converting it into character data. In this embodiment, speaker-dependent voice recognition means trained in advance on the voice of a specific user is used as the voice recognition unit 12. The voice recognition unit 12 corresponds to the voice recognition means of the present invention.

[0028] As the voice recognition means, instead of speaker-independent recognition, which generally requires a relatively large amount of computation and has a low recognition rate, speaker-dependent voice recognition means trained in advance on the voice of the owner or user of the mobile phone can be used. If the content of a call on a mobile phone were to be converted to text by voice recognition means, voice in two directions would have to be recognized: the voice from the user to the other party at the transmitting stage and the voice from the other party to the user at the receiving stage. Since the other party is usually not one specific person but an unspecified large number of people, recognizing their voices would require speaker-independent recognition means. In this embodiment, however, voice recognition is performed only at the transmitting stage, so the only person whose voice needs to be recognized is the user of that mobile phone.

[0029] When a voice recognition device has one user or a small number of users, it is known that a higher recognition rate than that of speaker-independent recognition can be obtained by training the codebook or templates used for recognition in advance on each user's voice. A mobile phone today can generally be assumed to be used by a specific user. In that case, using speaker-dependent voice recognition means makes it possible to obtain a sufficient recognition rate with an amount of computation achievable by the CPUs and DSPs that can be mounted in a mobile phone.

[0030] The voice coding unit 13 has the function of converting the voice signal in a portion where the voice recognition unit 12 failed in recognition into coded voice data. The selector 14 has the function of selectively outputting to the packetizing unit 15 any one of the character data output from the input unit 10, the character data output from the voice recognition unit 12, and the coded voice data output from the voice coding unit 13. The voice coding unit 13 corresponds to the voice coding means of the present invention, and the selector 14 to the selection means of the present invention.
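The transmit-side behaviour described for the voice recognition unit 12, the voice coding unit 13 and the selector 14 can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify how speech is segmented or how a recognition failure is signalled, so the sketch assumes pre-segmented input and a recognizer that returns None on failure. All names (Payload, select_for_packetizer, typed_text) and the toy recognizer and codec are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Payload:
    kind: str    # "text" or "voice"
    data: bytes


def select_for_packetizer(
    segments: Iterable[bytes],
    recognize: Callable[[bytes], Optional[str]],
    encode: Callable[[bytes], bytes],
) -> List[Payload]:
    """Emit text where recognition succeeds, coded voice where it fails."""
    out: List[Payload] = []
    for segment in segments:
        text = recognize(segment)          # assumption: None signals a failure
        if text is not None:
            out.append(Payload("text", text.encode("utf-8")))
        else:
            out.append(Payload("voice", encode(segment)))
    return out


def typed_text(text: str) -> Payload:
    """Character data entered on the keypad (input unit 10) takes the text path."""
    return Payload("text", text.encode("utf-8"))


if __name__ == "__main__":
    # Toy stand-ins: a lookup table as the "recognizer", a tagging function as the "codec".
    vocabulary = {b"seg-a": "konnichiwa", b"seg-c": "sayonara"}
    toy_encode = lambda pcm: b"CODED:" + pcm
    segments = [b"seg-a", b"seg-b", b"seg-c"]
    for payload in select_for_packetizer(segments, vocabulary.get, toy_encode):
        print(payload.kind, payload.data)
```

Keeping the recognizer and codec as injected callables mirrors the block structure of the embodiment, where the selector only chooses between outputs and does not itself recognize or encode.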

[0031] The mobile phone according to this embodiment further has a packetizing unit 15, a packet restoring unit 20, a data distribution unit 21, a voice reading unit 22, a voice decoding/reproducing unit 23, a synthesizing unit 24 and a speaker 25. The packetizing unit 15 has the function of adding the address of the call partner to the character data or coded voice data to be transmitted, converting it into packets, and sending them to the packet-switched network via the baseband processing unit 3, the RF unit 2 and the antenna 1. The packet restoring unit 20 has the function of restoring received packets into character data or coded voice data. The packetizing unit 15 corresponds to the packetizing means of the present invention, and the packet restoring unit 20 to the packet restoring means of the present invention.
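As an illustration of what the packetizing unit 15 and the packet restoring unit 20 do, the sketch below attaches a destination address and a data type to each chunk of payload and reassembles the original data on the receiving side. The packet layout (a 4-byte header holding type, address length and payload length), the 128-octet payload limit and the example address are all assumptions made for this sketch. The source states only that the partner's address is added and that the restored data is either character data or coded voice data.

```python
import struct
from typing import List, Tuple

TYPE_TEXT, TYPE_VOICE = 0, 1
HEADER = struct.Struct("!BBH")   # hypothetical header: data type, address length, payload length


def packetize(kind: int, address: str, payload: bytes, max_payload: int = 128) -> List[bytes]:
    """Split a payload into packets, each carrying the destination address."""
    addr = address.encode("ascii")
    packets = []
    for i in range(0, len(payload), max_payload):
        chunk = payload[i:i + max_payload]
        packets.append(HEADER.pack(kind, len(addr), len(chunk)) + addr + chunk)
    return packets


def restore(packets: List[bytes]) -> Tuple[int, str, bytes]:
    """Reassemble in-order packets of one message into (type, address, data)."""
    if not packets:
        raise ValueError("no packets to restore")
    kind, addr_len, _ = HEADER.unpack_from(packets[0])
    address = packets[0][HEADER.size:HEADER.size + addr_len].decode("ascii")
    data = b""
    for packet in packets:
        _, alen, plen = HEADER.unpack_from(packet)
        start = HEADER.size + alen
        data += packet[start:start + plen]
    return kind, address, data


if __name__ == "__main__":
    message = "コンニチハ".encode("utf-8")
    packets = packetize(TYPE_TEXT, "09012345678", message)   # made-up address
    print(len(packets), restore(packets))
```

The sketch assumes packets arrive complete and in order. Loss, reordering and interleaving of messages are outside what the source describes and are not handled.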

[0032] The data distribution unit 21 takes in the character data or coded voice data restored by the packet restoring unit 20 and routes the character data to the display unit 30 and the voice reading unit 22, and the coded voice data to the voice decoding/reproducing unit 23. The voice reading unit 22 has the function of outputting a voice signal for reading aloud the character data input from the data distribution unit 21, and the voice decoding/reproducing unit 23 has the function of decoding and reproducing the coded voice data input from the data distribution unit 21. The synthesizing unit 24 combines the voice signal output from the voice reading unit 22 with the voice signal output from the voice decoding/reproducing unit 23 and outputs the result to the speaker 25, which serves as the voice output means. The data distribution unit 21 corresponds to the data distribution means of the present invention, the voice reading unit 22 to the voice reading means, and the voice decoding/reproducing unit 23 to the voice decoding/reproducing means.
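The receive-side routing performed by the data distribution unit 21 and the combining done by the synthesizing unit 24 can be sketched as below. This is a simplified illustration: the data-type tags, the callables standing in for the display, the voice reading unit and the decoder, and the choice to join the audio chunks in arrival order are all assumptions. The source says only that the two voice signals are combined and output to the speaker.

```python
from typing import Callable, Iterable, List, Tuple


def distribute(
    items: Iterable[Tuple[str, bytes]],
    display: Callable[[str], None],
    read_aloud: Callable[[str], bytes],
    decode: Callable[[bytes], bytes],
) -> bytes:
    """Route restored data by type and return the combined audio for the speaker."""
    chunks: List[bytes] = []
    for kind, data in items:
        if kind == "text":
            text = data.decode("utf-8")
            display(text)                     # character data also goes to the display
            chunks.append(read_aloud(text))   # voice reading path
        elif kind == "voice":
            chunks.append(decode(data))       # decoding/reproducing path
        else:
            raise ValueError(f"unknown data type: {kind!r}")
    # Assumption: combining is modelled as joining chunks in arrival order.
    return b"".join(chunks)


if __name__ == "__main__":
    shown: List[str] = []
    audio = distribute(
        [("text", b"hello"), ("voice", b"\x01\x02")],
        display=shown.append,
        read_aloud=lambda t: b"TTS:" + t.encode("utf-8"),   # toy text-to-speech
        decode=lambda d: b"PCM:" + d,                       # toy decoder
    )
    print(shown, audio)
```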

[0033] The mobile phone according to the present embodiment also has a display unit 30, a memory control unit 31, and a memory 32. The display unit 30 displays, on a screen, the character data transmitted and received by packet transmission. It shows the character data output from the voice recognition unit 12, the character data output from the data distribution unit 21, and character data read from the memory. The display unit 30 corresponds to the display means of the present invention.

[0034] The memory control unit 31 controls the writing and reading of data in the memory 32. The character data output from the voice recognition unit 12 and the data distribution unit 21, and the encoded voice data output from the voice encoding unit 13 and the data distribution unit 21, are written to the memory 32 under the control of the memory control unit 31. In response to operation of the input unit 10, the data is read out and displayed on the screen of the display unit 30. The memory 32 corresponds to the storage means of the present invention.

[0035] In the above configuration, during a voice call the voice recognition unit 12 performs voice recognition on the voice signal input from the microphone 11 in the transmitting stage. When recognition succeeds, the voice recognition unit 12 outputs the recognition result as character data to the display unit 30, which displays it, and also passes the character data to the packetizing unit 15 via the selector 14. When recognition fails, the voice recognition unit 12 outputs the voice signal of the portion that could not be recognized to the voice encoding unit 13. The voice encoding unit 13 converts the input voice signal into encoded voice data and outputs it to the packetizing unit 15 via the selector 14.
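The transmit-side routing described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in the patent: `Segment`, `encode_voice`, and `route_segments` are hypothetical names, the recognizer is assumed to signal failure by returning no text, and the codec is a pass-through placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """One stretch of speech from the microphone (hypothetical structure)."""
    audio: bytes          # raw audio samples for this stretch
    text: Optional[str]   # recognizer output, or None if recognition failed


def encode_voice(audio: bytes) -> bytes:
    """Placeholder for the voice encoding unit; a real handset would run a speech codec."""
    return audio


def route_segments(segments: List[Segment]) -> List[Tuple[str, bytes]]:
    """Mimic the selector: recognized speech goes out as text, the rest as encoded voice."""
    routed = []
    for seg in segments:
        if seg.text is not None:
            # Recognition succeeded: forward the character data.
            routed.append(("text", seg.text.encode("utf-8")))
        else:
            # Recognition failed: forward the encoded voice for this portion only.
            routed.append(("voice", encode_voice(seg.audio)))
    return routed
```

The output keeps the speaking order, so the receiving side can interleave read-aloud text and decoded voice in the sequence in which they were spoken.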

[0036] The packetizing unit 15 generates a packet to which it adds identification information indicating the data type, that is, whether the data is character data or encoded voice data, together with the address of the other party, and outputs the packet to the baseband processing unit 3. The baseband processing unit 3 converts the packet data received from the packetizing unit 15 into a baseband signal and outputs it to the RF unit 2. The RF unit 2 modulates a carrier wave with the baseband signal and transmits it via the antenna 1 to a base station (not shown) that forms part of the packet switching network.
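A packet carrying a data-type identifier and the other party's address, as described above, could be laid out as in the sketch below. The patent does not specify a wire format, so the header layout (1-byte type, 1-byte address length, 2-byte payload length), the type codes, and the name `build_packet` are all assumptions made for illustration.

```python
import struct

# Hypothetical type codes; the patent only says the packet carries a data-type identifier.
TYPE_TEXT = 0x01
TYPE_VOICE = 0x02

# Assumed header: data type, address length, payload length (network byte order).
_HEADER = struct.Struct("!BBH")


def build_packet(kind: int, address: str, payload: bytes) -> bytes:
    """Prefix the payload with its data type and the other party's address."""
    if kind not in (TYPE_TEXT, TYPE_VOICE):
        raise ValueError("unknown data type")
    addr = address.encode("ascii")
    if len(addr) > 0xFF or len(payload) > 0xFFFF:
        raise ValueError("address or payload too long for this header")
    return _HEADER.pack(kind, len(addr), len(payload)) + addr + payload
```

For example, `build_packet(TYPE_TEXT, "0901234", b"hi")` yields a 13-byte packet: a 4-byte header, the 7-byte address, and the 2-byte payload.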

[0037] On the receiving side, when the RF unit 2 receives, via the antenna 1, a packet from the other party sent by a base station of the packet switching network, the RF unit 2 converts it into a baseband signal and outputs it to the baseband processing unit 3. The baseband processing unit 3 converts the baseband signal into packet data and outputs it to the packet restoring unit 20. The packet restoring unit 20 restores the received packet to character data or encoded voice data according to the identification information in the packet that indicates the data type, and outputs the data to the data distribution unit 21. If the input data is character data, the data distribution unit 21 outputs it to the display unit 30 and the voice reading unit 22; if the input data is encoded voice data, it outputs it to the voice decoding/reproducing unit 23.

[0038] When the data input to the data distribution unit 21 is character data, the character data of the received packet is displayed on the screen of the display unit 30, and the voice reading unit 22 converts the character data into a voice signal for reading it aloud and outputs that voice signal to the synthesizing unit 24. When the data input to the data distribution unit 21 is encoded voice data, the voice decoding/reproducing unit 23 decodes the encoded voice data, reproduces it as a voice signal, and outputs it to the synthesizing unit 24. The synthesizing unit 24 combines the voice signals output from the voice reading unit 22 and the voice decoding/reproducing unit 23 and outputs the result to the speaker 25.

[0039] As described above, the display unit 30 displays the transmitted and received character data on the screen when packets are sent and received. When new character data arrives while earlier character data is already on the screen, operating the input unit 10 scrolls the earlier character data so that the latest character data is displayed. For a portion where the voice recognition unit 12 failed to recognize the voice, that is, a section sent as encoded voice data, a character or symbol indicating this may be displayed.

[0040] In the mobile phone according to the present embodiment, characters can be entered in the transmitting stage not only through the speaker's utterance and voice recognition but also by inputting character data with the input unit 10 (for example, a keyboard), and this character data can be transmitted in packets. This accommodates users with a speech impairment. In the receiving stage, displaying characters on the display unit 30 accommodates users with a hearing impairment.

[0041] The transmitted and received character data is not only displayed on the screen of the display unit 30 but is at the same time stored in the memory 32 under the control of the memory control unit 31, in order from the most recent data, up to a predetermined amount. During a call, the user can operate the input unit 10 to call up desired character data on the screen and check its content. The display unit 30 may, for example, divide the screen into several areas, one showing the latest information and another showing past transmitted and received content, and the user can scroll the screen by operating the input unit 10.
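Keeping only the most recent character data up to a predetermined amount, as described above, can be sketched with a bounded queue. The class name `CallHistory`, the entry format, and the choice to count entries rather than bytes are assumptions for illustration.

```python
from collections import deque
from typing import List, Tuple


class CallHistory:
    """Bounded store of sent and received text; the oldest entries are dropped first."""

    def __init__(self, max_entries: int) -> None:
        # deque(maxlen=...) discards the oldest item once the limit is reached.
        self._entries = deque(maxlen=max_entries)

    def add(self, direction: str, text: str) -> None:
        """Record one message; direction is 'sent' or 'received' (hypothetical labels)."""
        self._entries.append((direction, text))

    def entries(self) -> List[Tuple[str, str]]:
        """Return all retained messages, oldest first, for scrolling on the display."""
        return list(self._entries)

    def latest(self, count: int = 1) -> List[Tuple[str, str]]:
        """Return the newest messages, for the area of the screen showing current data."""
        if count <= 0:
            return []
        return list(self._entries)[-count:]

    def clear(self) -> None:
        """Erase the history, for example when the call ends."""
        self._entries.clear()
```

A split-screen display could draw `latest()` in one area and page through `entries()` in the other.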

[0042] The character data stored in the memory 32 may be erased automatically when the call ends, or may be retained up to a predetermined amount so that the call content can be checked after the call. In the present embodiment, encoded voice data is stored in the memory 32 together with the character data, under the control of the memory control unit 31, when packets are sent and received. By operating the input unit 10, the user can read desired character data and encoded voice data from the memory 32 and output them to the voice reading unit 22 and the voice decoding/reproducing unit 23, so that they are reproduced as sound by the speaker 25 via the synthesizing unit 24. Thus, in the present embodiment, the call content can be not only checked on the screen of the display unit 30 but also reproduced as sound.

[0043] The present embodiment assumes a call between mobile phones that each have a voice recognition unit in the transmitting stage. When the other party uses a mobile phone of a different type or a telephone connected to a fixed-line network, a corresponding speaker-independent voice recognition means may be provided in the base station, or a gateway offering voice recognition and voice reading functions may be connected between the circuit switching network and the packet switching network.

[0044] As described above, the mobile phone according to the present embodiment transmits the voice signal input from the microphone in the transmitting stage as character data, which is smaller in volume than encoded voice data, so it reduces the use of line resources and the charges billed to the user. In addition, the voice recognition unit required is a specific-speaker (speaker-dependent) voice recognition means, which is easier to implement than a speaker-independent one, so the recognition rate is high and the computational load and performance required of the CPU or DSP can be kept low.

[0045] Furthermore, in the mobile phone according to the present embodiment, portions where the voice recognition unit fails to recognize the voice are transmitted as encoded voice data. Even if the recognition rate is below 100%, this limits the loss of conversation content and allows a practical voice call with realistic voice recognition performance. In addition, the history of the call content is displayed on the screen and can be consulted through user operation, so the conversation can easily be checked during the call and an easy-to-use voice call means is provided.

[0046] With the mobile phone according to the present embodiment, call content can also be recorded as character information, which is smaller in volume than encoded voice data, so a long conversation can be recorded in a memory of small capacity and the call content can be reproduced by reading it from the memory. Further, the content of a voice call is converted to text and shown on the screen, and character data can be entered through an input unit such as a keyboard and transmitted in packets, so a person with a speech or hearing impairment can converse with a party who is using voice (one party can converse by text while the other converses by ordinary voice).

[0047]

[Effects of the Invention] According to the invention of claim 1, during a call the voice in the transmitting stage is recognized by specific-speaker voice recognition, the recognition result is converted into character data, and the character data is packet-transmitted to the designated other party. When the voice recognition fails, the portion that failed recognition is converted into encoded voice data, and the character data and the converted encoded voice data are packet-transmitted to the designated other party. Meanwhile, character data sent by packet transmission from a specific other party is read aloud, and encoded voice data is decoded and reproduced, as appropriate. This reduces line resource usage and the charges to the user, and realizes a mobile phone whose ease of use is close to that of a conventional circuit-switched voice call.

[0048] According to the invention of claim 2, in the mobile phone of claim 1, the character data transmitted and received by packet transmission is displayed on a screen, so the call content can be checked easily and convenience is improved.

[0049] According to the invention of claim 3, the mobile phone has: voice recognition means that recognizes the voice signal input in the transmitting stage during a voice call and converts it into character data; voice encoding means that converts the voice signal of a portion for which the voice recognition means failed into encoded voice data; selecting means that selectively outputs the character data from the voice recognition means and the encoded voice data from the voice encoding means to the packetizing means; data distribution means that distributes data restored by the packet restoring means to the voice reading means or the voice decoding/reproducing means according to the type of the data; voice reading means that reads aloud, as voice, the character data input from the data distribution means; voice decoding/reproducing means that decodes and reproduces the encoded voice data input from the data distribution means; and synthesizing means that combines the voice signal output from the voice reading means with the voice signal output from the voice decoding/reproducing means and outputs the result to the voice output means. This reduces line resource usage and the charges to the user, and realizes a mobile phone whose ease of use is close to that of a conventional circuit-switched voice call.

[0050] According to the invention of claim 4, the mobile phone of claim 3 has display means for displaying, on a screen, the character data transmitted and received by packet transmission, so the call content can be checked easily and convenience is improved.

[0051] According to the invention of claim 5, in the mobile phone of claim 3 or 4, a specific-speaker voice recognition means trained in advance on the voice of a specific user is used as the voice recognition means, so the recognition rate is high and the computational load and performance required of the CPU or DSP to implement it can be kept low. In addition, portions where voice recognition fails are transmitted as encoded voice data, so even if the recognition rate is below 100%, loss of conversation content is limited and a practical voice call is possible with realistic voice recognition performance.

[0052] According to the invention of claim 6, the mobile phone of claim 4 or 5 further has, in the transmitting stage, character input means for inputting character data, and the selecting means selectively outputs to the packetizing means one of the character data output from the character input means, the character data output from the voice recognition means, and the encoded voice data output from the voice encoding means. The call content is therefore converted to text and displayed on the screen while character data can also be entered through the character input means, so a person with a speech or hearing impairment can converse with a party who is using voice.

[0053] According to the invention of claim 7, the mobile phone of claim 6 further has storage means for storing the character data transmitted and received by packet transmission and operation means for designating various operations, and the character data stored in the storage means is read out by operating the operation means and displayed on the display means. Because call content is recorded as character information, which is smaller in volume than encoded voice data, a long conversation can be recorded in storage means of small capacity and can be reproduced as voice.

[0054] According to the invention of claim 8, the storage means stores encoded voice data in addition to character data, and these data are read out by operating the operation means and reproduced as a voice signal by the voice reading means or the voice decoding/reproducing means, so the character information can be not only checked on the screen but also listened to as voice.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a mobile phone according to an embodiment of the present invention.

[Explanation of symbols]

1 antenna
2 RF unit
3 baseband processing unit
10 input unit
11 microphone
12 voice recognition unit
13 voice encoding unit
14 selector
15 packetizing unit
20 packet restoring unit
21 data distribution unit
22 voice reading unit
23 voice decoding/reproducing unit
24 synthesizing unit
25 speaker
30 display unit
31 memory control unit
32 memory

Continuation of front page: F-terms (reference) 5K027 AA11 HH20 5K067 AA34 AA41 BB04 BB21 CC08 DD53 DD54 EE02 EE16 EE25 FF02 FF23 FF26 GG01 GG11 HH23 5K101 LL12 NN07 NN08 NN18 SS07 SS08

Claims

1. A mobile phone supporting packet transmission, wherein, during a call, the voice in the transmitting stage is recognized by specific-speaker voice recognition, the recognition result is converted into character data, and the character data is packet-transmitted to a designated other party; when the voice recognition fails, the portion that failed recognition is converted into encoded voice data, and the character data and the converted encoded voice data are packet-transmitted to the designated other party; and character data sent by packet transmission from a specific other party is read aloud, and encoded voice data is decoded and reproduced, as appropriate.

2. The mobile phone according to claim 1, wherein character data transmitted and received by packet transmission is displayed on a screen.

3. A mobile phone having packetizing means for adding the address of the other party to character data or encoded voice data to be transmitted, converting the data into packets, and sending the packets to a packet switching network, and packet restoring means for restoring received packets to character data or encoded voice data, the mobile phone comprising: voice recognition means for recognizing a voice signal input in a transmitting stage during a voice call and converting it into character data; voice encoding means for converting the voice signal of a portion for which the voice recognition means failed into encoded voice data; selecting means for selectively outputting the character data output from the voice recognition means and the encoded voice data output from the voice encoding means to the packetizing means; data distribution means for distributing data restored by the packet restoring means to voice reading means or voice decoding/reproducing means according to the type of the data; the voice reading means, which reads aloud as voice the character data input from the data distribution means; the voice decoding/reproducing means, which decodes and reproduces the encoded voice data input from the data distribution means; and synthesizing means for combining the voice signal output from the voice reading means with the voice signal output from the voice decoding/reproducing means and outputting the result to voice output means.

4. The mobile phone according to claim 3, further comprising display means for displaying character data transmitted and received by packet transmission on a screen.

5. The mobile phone according to claim 3, wherein the voice recognition means is a specific-speaker voice recognition means trained in advance on the voice of a specific user.

6. The mobile phone according to claim 4, further comprising, in the transmitting stage, character input means for inputting character data, wherein the selecting means selectively outputs to the packetizing means one of the character data output from the character input means, the character data output from the voice recognition means, and the encoded voice data output from the voice encoding means.

7. The mobile phone according to claim 6, further comprising storage means for storing the character data transmitted and received by packet transmission, and operation means for designating various operations, wherein the character data stored in the storage means is read out by operating the operation means and displayed on the display means.

8. The mobile phone according to claim 7, wherein the storage means stores encoded voice data in addition to the character data transmitted and received by packet transmission, and the character data or the encoded voice data stored in the storage means is read out by operating the operation means and reproduced as a voice signal by the voice reading means or the voice decoding/reproducing means.