JPH0388592A

JPH0388592A - Video telephone system

Info

Publication number: JPH0388592A
Application number: JP22510089A
Authority: JP
Inventors: Tetsuya Murakami; 哲也村上
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1989-08-31
Filing date: 1989-08-31
Publication date: 1991-04-12

Abstract

PURPOSE: To allow the other party's communicative intent to be grasped easily and reliably by converting recognized speech into character information in a character code conversion unit, supplying the converted character information to a display unit, and inserting it into part of the picture. CONSTITUTION: A received audio signal is reproduced by a handset 2, converted into a digital signal by an A/D converter 23, and then stored in a voice memory 24. The audio signal read from the voice memory 24 is supplied to a voice recognition unit 30, where recognition processing is executed. The recognition result is supplied to a character code conversion memory 25, which converts the voice information into character information. This character information is combined with the self-image or the received image in an image synthesis unit 15 and supplied to a monitor 4. The received audio signal is thus displayed on the monitor 4 as character information.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a videophone device that transmits and receives images and audio, and in particular to a videophone device that can convert received voice information into character information and display it.

[Prior Art] Videophone devices that use public telephone lines have become widespread.

In such a videophone device, images are perceived via a monitor (a display device such as a CRT) and audio is perceived by ear.

[Problems to be Solved by the Invention] Because a videophone device is installed wherever the user wishes, it may be installed in a poor usage environment, such as beside a road with heavy traffic. In such a case, the received voice sometimes cannot be fully grasped, and the user has to ask the other party to repeat the same content.

Likewise, when reception is poor, the other party's voice can be hard to hear.

The present invention addresses these points and proposes a videophone device that allows received voice to be grasped reliably even where the installation environment or reception is poor.

[Means for Solving the Problems] To solve the above problems, the present invention provides a videophone device configured to transmit and receive audio and image information over a telephone line, in which a voice recognition unit and a character code conversion unit are provided. The voice recognition unit has means for recording the voice of the recipient and means for recognizing the recorded voice by comparing and referring to it on the basis of voice data input in advance. The recognized voice is converted into character information in the character code conversion unit, the converted character information is supplied to a display unit, and the character information is inserted into part of the image.

[Operation] Received voice information is recognized by the voice recognition unit 30 and, on the basis of the recognition result, converted into character information (character codes) by the character code conversion unit 25.

This character information is supplied to the monitor 4 together with the image information, and the character information is inserted into part of the image, for example by superimposition.

In this way, even when the voice cannot be made out accurately, for example because reception is poor or the ambient noise is severe, the voice is converted into character information and can be viewed as part of the image, so the content of the voice can be grasped accurately.

[Embodiment] Next, an example of the videophone device according to the present invention will be described in detail with reference to FIGS. 1 and 2.

FIG. 1 is a basic configuration diagram of the present invention.

In FIG. 1, 1 is a public line and 2 is a handset for voice input and output.

3 is a video camera built into the main body of the device to capture the image to be transmitted.

4 is a monitor (a display device such as a CRT) for checking the image to be transmitted and displaying the received image.

5 is an operation unit (keyboard); telephone numbers and the like are entered directly using its numeric keypad.

6 is a control unit with a built-in computer, consisting of a CPU 7, a program ROM 8, and a work RAM 9.

The program ROM 8 is assumed to store various control programs, such as those for voice recognition processing, transmission processing, and monitor display processing. The control unit 6 therefore performs various kinds of processing on the basis of the information input from the operation unit 5.

When a line is connected using the telephone number input from the operation unit 5, the changeover switches 14, 18, and 19 are each set to the positions shown by solid lines, so that only audio can be transmitted and received.

The image information captured by the video camera 3 (the self-image) is converted into a digital signal by the A/D converter 12 and stored in the image memory 13. The stored image signal passes through the changeover switch 14 and the image synthesis unit 15 to the D/A converter 16, where it is converted back into an analog signal, and is then supplied to the monitor 4, which displays the self-image. While the changeover switch 18 is in the solid-line position, image information cannot be transmitted to the other party.

When the device is switched to image transmission mode, the changeover switch 18 is set to the position shown by the broken line.

The image information read from the image memory 13 is then converted into an analog signal by the D/A converter 20 and sent out to the public line 1 in analog form. While image information is being transmitted, the control unit 6 ensures that the image signal from the video camera 3 is not written to the image memory 13.

When image transmission is complete, the sender is notified of this, and the changeover switch 18 is automatically returned to the position shown by the solid line, restoring the audio-only transmit/receive state. At this point the image memory 13 becomes write-enabled.
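The switching behaviour described above can be summarized as a small state model. This is only an illustrative sketch: the class name `SendPath`, its method names, and the string labels for the switch positions are hypothetical and do not appear in the patent.

```python
class SendPath:
    """Toy model of changeover switch 18 and the write enable of image memory 13."""

    def __init__(self):
        # State right after the line is connected: solid-line position,
        # audio only, camera frames may be written to image memory 13.
        self.switch18 = "solid"
        self.memory13_write_enabled = True

    @property
    def mode(self):
        return "audio_only" if self.switch18 == "solid" else "image_transmission"

    def start_image_transmission(self):
        # Broken-line position: the stored image is sent to the public line,
        # and new camera frames are kept out of image memory 13.
        self.switch18 = "broken"
        self.memory13_write_enabled = False

    def finish_image_transmission(self):
        # Automatic return to the solid-line position after sending.
        self.switch18 = "solid"
        self.memory13_write_enabled = True


path = SendPath()
path.start_image_transmission()
print(path.mode, path.memory13_write_enabled)
path.finish_image_transmission()
print(path.mode, path.memory13_write_enabled)
```

Blocking writes during transmission keeps the frame being sent from changing partway through, which appears to be the purpose of the control described in the text.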

Meanwhile, image information from the other party transmitted over the public line 1 passes through the changeover switch 19 to the A/D converter 21, where it is converted into a digital signal, and is then stored in the image memory 22. The image information read from the image memory 22 passes through the changeover switch 14 and the image synthesis unit 15 to the D/A converter 16, where it is converted back into an analog signal. It is then supplied to the monitor 4, which displays the other party's image. The position of the changeover switch 14 is selected from the operation unit 5.

The received audio signal is reproduced by the handset 2 and is also converted into a digital signal by the A/D converter 23 and then stored in the voice memory 24.

The audio signal read from the voice memory 24 is supplied to the voice recognition unit 30, where voice recognition processing is executed; the recognition result is then supplied to the character code conversion memory 25, where the voice information is converted into character information.

This character information is combined with the self-image or the received image in the image synthesis unit 15 and supplied to the monitor 4.
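As a rough illustration of what the image synthesis unit 15 does, the sketch below overwrites part of a grayscale frame with the "on" pixels of a character bitmap. The patent does not describe how the synthesis is implemented; the function name `superimpose`, the list-of-lists frame format, and the ready-made glyph mask are all assumptions made for this example.

```python
def superimpose(image, glyph_mask, top, left, text_level=255):
    """Return a copy of `image` with the set pixels of `glyph_mask` drawn at (top, left).

    image      : list of rows of gray levels (the self-image or received image)
    glyph_mask : list of rows of 0/1 values (rendered character information)
    """
    out = [row[:] for row in image]          # leave the source frame untouched
    for r, mask_row in enumerate(glyph_mask):
        for c, on in enumerate(mask_row):
            y, x = top + r, left + c
            # Draw only set pixels that fall inside the frame.
            if on and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = text_level
    return out


frame = [[0] * 4 for _ in range(3)]          # 3x4 black frame
glyph = [[1, 0],
         [1, 1]]                             # tiny hypothetical glyph
for row in superimpose(frame, glyph, top=1, left=2):
    print(row)
```

Only the pixels covered by the glyph change, so the rest of the picture stays visible behind the text.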

The received audio signal is thus displayed on the monitor 4 as character information.

The voice recognition unit 30 can be configured as shown in FIG. 2.

In FIG. 2, the audio signal data supplied to the input terminal 31 undergoes frequency analysis in the acoustic analysis unit 32, which examines characteristics such as changes in the speech spectrum. The analysis result is normalized along the time axis in the information compression unit 33, which absorbs variation caused by differences in speaking rate. These processes make voice recognition possible for unspecified speakers.
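The time-axis normalization performed by the information compression unit 33 can be illustrated with a minimal sketch. The patent does not state which method is used; linear resampling of per-frame feature vectors to a fixed number of frames is assumed here purely for illustration, and the function name `normalize_time_axis` is hypothetical.

```python
def normalize_time_axis(frames, target_len):
    """Linearly resample a sequence of feature vectors to `target_len` frames.

    Assumption: linear interpolation stands in for the unspecified
    normalization method of the information compression unit 33.
    """
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    if len(frames) == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    scale = (len(frames) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        w = pos - lo                         # interpolation weight toward `hi`
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# The same utterance spoken slowly (5 frames) and quickly (3 frames).
slow = [[0.0], [1.0], [2.0], [3.0], [4.0]]
fast = [[0.0], [2.0], [4.0]]
print(normalize_time_axis(slow, 3) == normalize_time_axis(fast, 3))
```

After normalization both versions have the same length and can be compared frame by frame, which is how differences in speaking rate are absorbed.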

Meanwhile, several samples of the sender's voice data are input to the phoneme (syllable) learning unit 35 in advance at the start of the call. The voice data (前影情報) sent from the information compression unit 33 then undergoes phoneme recognition processing, specifically extraction of phonological information, in the phoneme recognition unit 34 on the basis of the data from the phoneme learning unit 35.

With the phoneme learning unit 35 and the phoneme recognition unit 34, when the same input (前影情報) arrives a second or subsequent time, its phonemes can be extracted immediately.
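One way to picture the interplay of the phoneme learning unit 35 and the phoneme recognition unit 34 is template matching with a cache. The sketch below is an assumption-laden illustration, not the patent's method: nearest-neighbour matching by Euclidean distance, the class name `PhonemeRecognizer`, and the cache keyed on the exact input vector are all hypothetical choices.

```python
import math


class PhonemeRecognizer:
    def __init__(self):
        self.templates = {}   # phoneme label -> feature vector enrolled at call start
        self.cache = {}       # previously seen input vector -> phoneme label

    def learn(self, label, features):
        """Enroll one sample of the sender's voice (role of learning unit 35)."""
        self.templates[label] = tuple(features)
        self.cache.clear()    # cached answers may be stale after new enrollment

    def recognize(self, features):
        """Return the label of the closest enrolled template (role of unit 34)."""
        key = tuple(features)
        if key in self.cache:             # same input seen before: answer at once
            return self.cache[key]
        if not self.templates:
            raise ValueError("no enrolled voice data")
        label = min(self.templates,
                    key=lambda name: math.dist(self.templates[name], key))
        self.cache[key] = label
        return label


rec = PhonemeRecognizer()
rec.learn("a", [1.0, 0.0])
rec.learn("i", [0.0, 1.0])
print(rec.recognize([0.9, 0.1]))   # first time: compared against all templates
print(rec.recognize([0.9, 0.1]))   # second time: served from the cache
```

`math.dist` requires Python 3.8 or later and feature vectors of equal length.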

The extracted phonological information is matched against word information from the word dictionary unit 37 in the word (phrase) recognition unit 36, which performs word recognition, and the recognized word information is obtained at the output terminal 38. This word information is supplied as an address to the character code memory 25, which converts it into character information (character codes) representing the characters.
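Using the word information as an address into the character code memory 25 amounts to a table lookup. The following sketch illustrates the idea under stated assumptions: the table contents, the use of Unicode code points as the character codes, and the function names are all hypothetical, since the patent specifies neither the code set nor the memory layout.

```python
# Hypothetical contents of character code memory 25. The list index plays the
# role of the address (the recognized word information), and each entry holds
# the character codes for that word. Unicode code points are an assumption.
CHAR_CODE_MEMORY = [
    [ord(c) for c in "hello"],   # address 0
    [ord(c) for c in "yes"],     # address 1
    [ord(c) for c in "no"],      # address 2
]


def word_to_char_codes(word_address):
    """Read the character codes stored at the given address."""
    if not 0 <= word_address < len(CHAR_CODE_MEMORY):
        raise IndexError(f"no entry at address {word_address}")
    return CHAR_CODE_MEMORY[word_address]


def codes_to_text(codes):
    """Turn character codes back into displayable text."""
    return "".join(chr(code) for code in codes)


print(word_to_char_codes(2))
print(codes_to_text(word_to_char_codes(1)))
```

Because the conversion is a single memory read, it adds essentially no processing beyond the recognition step itself.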

This voice recognition processing is carried out under the voice processing control program stored in the control unit 6, and, on a command from the operation unit 5, the character information is combined with the self-image or the received image and displayed on the monitor 4.

The on-screen position of the characters, vertical or horizontal writing, the typeface, and similar settings can all be specified from the operation unit 5. Display of characters only or of the image only can of course also be specified from the operation unit 5.

[Effects of the Invention] As described above, the configuration of the present invention allows received voice to be displayed on the screen as characters as well.

Therefore, even where the reception environment is poor or where the recipient is [illegible], the other party's communicative intent can be grasped easily and reliably.

[Brief explanation of drawings]

FIG. 1 is a basic configuration diagram of the present invention, and FIG. 2 is a system diagram of the voice recognition unit.

1: public line
2: handset
3: video camera
4: monitor
5: operation unit
6: control unit
13, 22: image memory
14, 18, 19: changeover switch
24: voice memory
25: character code conversion unit
30: voice recognition unit
32: acoustic analysis unit
33: information compression unit
34: phoneme recognition unit
35: phoneme learning unit
36: word recognition unit
37: word dictionary unit

Claims

[Claims]

(1) A videophone device configured to transmit and receive audio and image information over a telephone line, characterized in that a voice recognition unit and a character code conversion unit are provided; the voice recognition unit has means for recording the voice of the recipient and means for recognizing the recorded voice by comparing and referring to it on the basis of voice data input in advance; and the recognized voice is converted into character information in the character code conversion unit, the converted character information is supplied to a display unit, and the character information is inserted into part of an image.