JP4232453B2

JP4232453B2 - Call voice text conversion system

Info

Publication number: JP4232453B2
Application number: JP2002366514A
Authority: JP
Inventors: 真広後藤
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2002-12-18
Filing date: 2002-12-18
Publication date: 2009-03-04
Anticipated expiration: 2022-12-18
Also published as: JP2004200985A

Description

[0001]
[Technical field of the invention]
The present invention relates to a call voice text conversion system that improves the accuracy of information transfer during a telephone call.
[0002]
[Problems to be solved by the invention]
When making a call on a mobile phone or similar device, the other party's voice can be hard to hear depending on the usage environment or location, and communication can suffer, for example because the listener has to ask the other party to repeat things several times to grasp the content of the call. Improvements in communication technology can be expected to move toward solving such problems, but at present they cannot always be solved right away.
[0003]
In addition, for elderly people and people with hearing impairments, the difficulty is not only a matter of communication technology: communication can be hard even when the voice information is transmitted normally. Furthermore, a speaker may inadvertently forget what was discussed on the phone, or may no longer accurately remember what impression the conversation gave, so part of the problem cannot be attributed to communication technology alone.
[0004]
The present invention has been made in view of the above circumstances, and its object is to provide a call voice text conversion system that makes it easier to grasp what the other party is saying during a telephone call and makes communication easier.
[0005]
[Means for Solving the Problems]
According to the invention of claim 1, when a speaker inputs call voice during a conversation over a telephone, the call voice is conveyed to the other party and, at the same time, the text conversion means converts the call voice into text data through voice recognition processing, and the volume information adding means adds information specifying a character size according to the volume of the call voice. By displaying that information on the display means, the speaker can view text rendered in a character size proportional to the other party's call volume. As a result, even when the voice is hard to hear or something is missed, the text provides more information for recognizing the state of the other party's speech, and communication becomes easier.
In addition, the text data transmission means can, upon request, send the text data of the call as an e-mail after the call, so the content of the call can be kept as a record, and the circumstances of the call can be understood with the help of the character sizes, emotion information, and so on.
[0006]
According to the invention of claim 2, in the invention of claim 1, the emotion estimation means obtains the speaker's emotion to the extent that it can be analyzed from the call voice, and the emotion information adding means adds emotion information to the text data. By displaying that information, the speaker can understand and respond to the emotion that appears in the other party's call voice. Even when the voice is hard to hear or something is missed, the text provides more information for recognizing the state of the other party's speech, and communication becomes easier.
[0007]
According to the invention of claim 3, in each of the above inventions, the speaker information adding means adds to the text data speaker display information that causes the converted text data to be displayed separately for each speaker. Even when the voice is hard to hear or something is missed, the text provides more information for recognizing the state of the other party's speech, and communication becomes easier.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment in which the present invention is applied to a system using a mobile phone will be described with reference to the drawings.
FIG. 1 schematically shows the overall configuration of the system. Mobile phones 1a and 1b can connect to a telephone network 3 via base stations 2a, 2b, and so on, and are configured so that calls can be made between mobile phones, or with a wired telephone, via an exchange 4.
[0010]
A server 5 serving as a text data conversion device is connected to the exchange 4. The server 5 combines the functions of the volume information adding means, emotion information adding means, speaker information adding means, and text data transmission means referred to in the present invention. The functions of these means are realized by executing, on the server 5, the software described later.
[0011]
The server 5 can access mail servers 7a, 7b, and so on, operated by various providers, via a network such as the Internet 6. The mail servers 7a and 7b can be accessed via the Internet 6 from personal computers 8a and 8b owned by the users of the mobile phones 1a, 1b, and so on.
[0012]
Next, the operation of the present embodiment will be described with reference also to FIGS. 2 to 4.
FIG. 2 shows a basic operation when voice information is converted into text data and displayed. An example will be described in which a speaker A who uses the mobile phone 1a calls a speaker B who uses the mobile phone 1b.
[0013]
When user A uses the mobile phone 1a to call the mobile phone 1b of user B, a call with the mobile phone 1b becomes possible by way of the base station 2a, the telephone network 3, the exchange 4, and the base station 2b. In this state, when speaker A first says "Hello" by voice input (step A1), the voice information is conveyed to the mobile phone 1b and output as sound (step A2), and is also input from the exchange 4 to the server 5, where it is converted into text data.
[0014]
The server 5 first performs voice recognition processing on the received voice input signal (step A3) and converts it into text data (step A4). At this point, the server 5 also performs the input voice analysis processing described later on the received signal and adds data on volume, emotion, and so on (see FIG. 3). The text data and additional information obtained through this analysis are then sent both to the mobile phone 1b of speaker B, the listener, and to the mobile phone 1a of speaker A, who spoke (step A4).
[0015]
As a result, the mobile phones 1a and 1b each display the received text data, the word "Hello," on their display units 9a and 9b (see steps A5 and A6 in FIG. 2 and the display examples). On the mobile phone 1a, the text is what the user himself or herself said, so the character 自 ("self") is displayed at the head of the text data; on the mobile phone 1b, the text is what the other party said, so the character 相 ("other party") is displayed at the head of the text data.
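The labeling in steps A4 to A6 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: the `Utterance` type, the `label_for` and `fan_out` helpers, and the use of the reference numerals "1a" and "1b" as phone identifiers are all hypothetical, and the text is assumed to have already been produced by voice recognition.

```python
from dataclasses import dataclass
from typing import Dict, List

# Labels taken from the description: 自 marks the user's own words,
# 相 marks the other party's words.
SELF_LABEL = "自"
OTHER_LABEL = "相"


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for one recognized utterance."""
    speaker: str  # phone that produced the voice input, e.g. "1a"
    text: str     # text data assumed to come from voice recognition


def label_for(utterance: Utterance, recipient: str) -> str:
    """Pick the label shown at the head of the text on the recipient's phone."""
    return SELF_LABEL if utterance.speaker == recipient else OTHER_LABEL


def fan_out(utterance: Utterance, phones: List[str]) -> Dict[str, str]:
    """Return the line each phone in the call would display for this utterance."""
    return {p: f"{label_for(utterance, p)}:{utterance.text}" for p in phones}
```

For example, `fan_out(Utterance("1a", "Hello"), ["1a", "1b"])` yields one line per phone, labeled 自 on phone 1a and 相 on phone 1b, because the same text is sent to both the speaker and the listener.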
[0016]
Next, when speaker B replies by speaking, for example, "Likewise" into the mobile phone 1b (step A7), the voice information is conveyed from the mobile phone 1b to the mobile phone 1a in the same way as above and output as sound (step A8), and is converted into text data by the server 5.
[0017]
The server 5 performs voice recognition processing and text data conversion processing (steps A9 and A10) and sends the resulting text data and additional information both to the mobile phone 1a of speaker A, the listener, and to the mobile phone 1b of speaker B, who spoke (step A10).
[0018]
The mobile phones 1a and 1b each display the text data, the word "Likewise," on their display units (see steps A11 and A12 in FIG. 2 and the display examples). On the mobile phones 1a and 1b, the characters 相 ("other party") and 自 ("self"), respectively, are displayed at the head of the text data.
[0019]
Next, the processing in which the server 5, as part of the text conversion processing executed in steps A4 and A10 above, analyzes the input voice and adds data on volume, emotion, and so on will be described with reference to FIG. 3. In this analysis processing, the server 5 first performs volume analysis processing on the input voice data (step B1).
[0020]
Here, the volume level of the input voice is calculated and compared against predetermined thresholds to classify the volume into one of three levels: small, medium, or large. According to the result, the character size of the corresponding text is set to small, medium, or large as additional information attached to the text data (steps B2 to B5).
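The threshold comparison in steps B1 to B5 might look like the sketch below. The description only says the thresholds are predetermined, so the decibel values, the RMS-based level measure, and the function names here are all assumptions chosen for illustration.

```python
import math
from typing import Sequence

# Assumed thresholds in dB relative to full scale (dBFS); the source does
# not give concrete values or say how the volume level is measured.
QUIET_BELOW_DB = -30.0
LOUD_ABOVE_DB = -12.0


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples in the range [-1.0, 1.0], expressed in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def size_for_volume(level_db: float) -> str:
    """Map a volume level to the character size attached to the text data."""
    if level_db < QUIET_BELOW_DB:
        return "small"
    if level_db > LOUD_ABOVE_DB:
        return "large"
    return "medium"  # normal conversational level
```

With these assumed thresholds, a full-scale signal maps to "large" and silence maps to "small"; a real system would presumably calibrate the thresholds against typical call audio.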
[0021]
When this additional information is set in the text data, a display unit that is able to set the character size can display the text in the character size corresponding to the additional information. Here, the "medium" character size corresponds to a normal conversational level.
[0022]
Next, the server 5 performs emotion analysis processing (step B6). This analyzes the tone of the input voice in order to estimate the speaker's emotion. Specifically, the emotion can be estimated by analyzing, for example, words that typify emotions such as joy, anger, sorrow, and pleasure, the intonation patterns of the conversation, the waveform patterns of the voice, or differences in voice pitch relative to the speaker's normal state.
[0023]
Here, the server 5 classifies the estimated emotion into five categories: "joy," "anger," "sorrow," "pleasure," and "normal" (step B7). Correspondingly, information is added that sets the character color to yellow, red, blue, green, or black, respectively (steps B8 to B12).
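The emotion-to-color assignment in steps B7 to B12 amounts to a lookup table, sketched below. The five categories and their colors come from the description (the fourth category, 楽, is rendered here as "pleasure"); the `Emotion` enum and `color_for` function are hypothetical names, and the emotion estimation itself is not shown.

```python
from enum import Enum


class Emotion(Enum):
    """The five categories the server distinguishes (step B7)."""
    JOY = "joy"
    ANGER = "anger"
    SORROW = "sorrow"
    PLEASURE = "pleasure"
    NORMAL = "normal"


# Character color attached to the text data for each category (steps B8 to B12).
EMOTION_COLORS = {
    Emotion.JOY: "yellow",
    Emotion.ANGER: "red",
    Emotion.SORROW: "blue",
    Emotion.PLEASURE: "green",
    Emotion.NORMAL: "black",
}


def color_for(emotion: Emotion) -> str:
    """Return the character color for an estimated emotion."""
    # Falling back to black for an unknown value is an assumption, not from the source.
    return EMOTION_COLORS.get(emotion, "black")
```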
[0024]
The results of this analysis are attached to the text data as additional information. A display unit 9a that is capable of displaying this additional information performs a display operation as shown, for example, in FIG. 4. On a display unit that cannot display the additional information, the text is displayed in characters of the normal size.
[0025]
FIG. 4(a) shows a case in which the other party speaks in a troubled tone (corresponding to "sorrow") and the user replies in an angry tone (corresponding to "anger"). After the "Other:" (相：) label S1 indicating the other party, the surprised utterance "What?" (S2) is displayed in large black characters, black indicating "normal." It is followed by "That's a problem" (S3) in medium-size blue characters, blue indicating "sorrow."
[0026]
In response, following the "Self:" (自：) label S4 indicating the user, "Even if you say that," (S5) is displayed in red characters indicating an angry tone, followed by "no way" (S6) in large red characters and the sentence ending "dayo" (S7) in medium-size red characters. Because the color changes with the emotion and the character size changes with the loudness of the voice, the tone in which the other party is speaking is easy to understand visually as well.
[0027]
On the other hand, FIG. 4(b) shows a conversation spoken in a joyful tone. After the "Self:" (自：) label S8, "I passed" (S9) is displayed in large yellow characters indicating "joy." In reply, after the "Other:" (相：) label S10, "That's great" (S11) is displayed in medium-size yellow characters, again indicating "joy."
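One way to picture the display examples of FIG. 4 is as a list of text segments, each carrying a size and a color, rendered differently depending on what the display unit supports. The sketch below renders to HTML-like markup purely as an assumption; the description does not say how the additional information is encoded, and `Segment`, `render_line`, and the font-size values are hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import List

# Assumed mapping from the three size levels to concrete font sizes.
FONT_SIZES = {"small": "0.8em", "medium": "1em", "large": "1.5em"}


@dataclass(frozen=True)
class Segment:
    """One run of text with its additional information."""
    text: str
    size: str = "medium"   # "small", "medium", or "large"
    color: str = "black"   # color chosen from the estimated emotion


def render_line(label: str, segments: List[Segment], rich: bool = True) -> str:
    """Render one speaker's line; fall back to plain text if rich is False."""
    if not rich:
        # Display units that cannot show the additional information
        # simply show the text in the normal size.
        return label + "".join(s.text for s in segments)
    spans = "".join(
        f'<span style="color:{s.color};font-size:{FONT_SIZES[s.size]}">'
        f"{escape(s.text)}</span>"
        for s in segments
    )
    return escape(label) + spans
```

A line such as the other party's turn in FIG. 4(a) would then be two segments, a large black one followed by a medium blue one, and the same data degrades to plain text on a display that cannot show sizes or colors.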
[0028]
The server 5 is configured to provide a mail delivery service in addition to the real-time processing described above. The service is performed when, for example, speakers A and B, the users of the mobile phones 1a and 1b, have requested the mail service in advance, or when a request for mail delivery, which can be set for each call, has been received.
[0029]
In this service, the call voice described above is voice-recognized and converted into text data, speaker information is added, and the result is sent as an ordinary e-mail. The volume information and emotion information can also be left attached to the text data when it is sent. The mail is delivered via the Internet 6 to, for example, the mail servers 7a and 7b of the providers with which speakers A and B have contracts, and can be read by downloading it from the PCs 8a and 8b.
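Composing such a transcript mail could look roughly like the following, using Python's standard `email.message.EmailMessage`. The function name, the subject line, the addresses, and the plain "speaker: text" body layout are assumptions for illustration; actual delivery to a mail server is left out.

```python
from email.message import EmailMessage
from typing import List, Tuple


def build_transcript_mail(
    sender: str, recipient: str, lines: List[Tuple[str, str]]
) -> EmailMessage:
    """Build an ordinary e-mail whose body lists each utterance with its speaker.

    `lines` is a list of (speaker, text) pairs in the order they were spoken.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Call transcript"  # assumed subject line
    body = "\n".join(f"{speaker}: {text}" for speaker, text in lines)
    msg.set_content(body)
    return msg


# Example with placeholder addresses:
mail = build_transcript_mail(
    "server@example.com",
    "user-a@example.com",
    [("A", "Hello"), ("B", "Likewise")],
)
```

Keeping the volume and emotion information in the mail would require a richer body format than this plain-text sketch, for instance an HTML alternative part.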
[0030]
This allows the user to read the content of the call while checking the loudness of the voice and the emotions at the time of the actual call, making it easier to grasp the atmosphere and circumstances of the conversation. Also, even if no notes were taken during the call, the content can be checked later in the received mail, so the communicated information can be kept accurate.
[0031]
The mail delivery destination is not limited to the PCs 8a and 8b, but may be set to, for example, a PDA that is a mobile device or a car navigation device that can access the Internet. Furthermore, it may be delivered by e-mail to the mobile phones 1a and 1b themselves.
[0032]
According to the present embodiment, the server 5 connected to the exchange 4 performs voice recognition processing and text data conversion processing, so the content of the conversation with the other party can be obtained as text. Even when the voice is hard to hear, the content of the call can be reliably grasped by referring to the text. In addition, since the size and color of the characters can be changed according to the volume and emotion of the call voice, it becomes easier to grasp the other party's situation during the call, and communication becomes easier.
[0033]
In addition, since the content of the call can be checked later as text in an e-mail, keeping a call record becomes more reliable, and since there is no longer any need to take notes, the user can concentrate on the conversation.
[0034]
The present invention is not limited to the embodiment described above and shown in the drawings, and can be modified as follows.
In addition to color, emotion information can also be expressed by a character pattern, the weight of the characters, the type of line used for the characters, or a shading pattern applied to the characters.
[0035]
The color indicating emotion information can be displayed as the background color of the character or the color of the marker in addition to the color of the character itself.
Emotion information can also be conveyed directly to the other party by having the speaker operate a setting switch or similar control, such as a button indicating his or her mood while speaking.
[0036]
A function may be provided for hiding the additional information that expresses emotion or volume.
The user's own part of the conversation can also be hidden. This may be set with a switch.
[0037]
The medium for receiving text data by e-mail is not limited to a PC, and an appropriate medium such as a PDA or another mobile phone can be designated and received.
[Brief description of the drawings]
FIG. 1 is a schematic diagram of the overall configuration of one embodiment of the present invention.
FIG. 2 is a chart showing the exchanges among the mobile phones and the server.
FIG. 3 is a schematic flowchart of the volume analysis processing and the emotion analysis processing.
FIG. 4 is a diagram showing examples of the display screen.
[Explanation of reference numerals]
1a and 1b are mobile phones (telephones), 2a and 2b are base stations, 3 is a telephone network, 4 is an exchange, 5 is a server (text conversion means, volume information adding means, emotion estimation means, emotion information adding means, speaker information adding means), 6 is the Internet (network), 7a and 7b are mail servers, 8a and 8b are PCs, and 9a and 9b are display units.

Claims

A call voice text conversion system comprising:
text conversion means for receiving the call voice of a speaker conversing over a telephone and converting it into text data by performing voice recognition processing;
volume information adding means for analyzing the volume of the call voice and adding, to the text data, character size information specifying a character size according to the volume;
display means for displaying the text data on the telephone during a call; and
text data transmission means for transmitting, in response to a request, the text data as an e-mail after the call ends.

The call voice text conversion system according to claim 1, further comprising:
emotion estimation means for estimating the speaker's emotion to the extent that it can be analyzed from the call voice; and
emotion information adding means for adding, to the text data, emotion information specifying a color or pattern according to the speaker's emotion estimated by the emotion estimation means.

The call voice text conversion system according to claim 1 or 2, further comprising:
speaker information adding means for adding, to the text data, speaker display information that causes the converted text data to be displayed separately for each speaker.