JP2015220684A - Portable terminal equipment and lip reading processing program

Info

Publication number: JP2015220684A
Application number: JP2014104624A
Authority: JP
Inventors: 正永中村; Masanaga Nakamura
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2014-05-20
Filing date: 2014-05-20
Publication date: 2015-12-07

Abstract

PROBLEM TO BE SOLVED: To provide a portable terminal device that can be used in a noisy environment and is not troublesome to wear. SOLUTION: A portable terminal device includes: a communication module that transmits and receives information to and from the outside; a microphone; a display unit that displays various types of information; an imaging device; a control unit that, when the volume of sound input to the microphone is equal to or greater than a predetermined threshold, causes the display unit to display a screen for selecting a lip reading call mode when a call is received, and that switches the call mode to the lip reading call mode when that mode is selected; and a lip reading processing unit that detects the shape of a speaker's lips from an image captured by the imaging device and converts it into at least one of voice data and text data of the spoken words. When the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and the text data converted by the lip reading processing unit to the outside.

Description

The present invention relates to a portable terminal device and a lip reading processing program.

Portable terminal devices that users carry and use in various places, such as mobile phones and smartphones, are known. When such a portable terminal device is used in a noisy environment, a bone conduction microphone, for example, is used to pick up the user's voice (see Patent Document 1).

Patent Document 1: JP 2007-243591 A

However, because a bone conduction microphone must be used in contact with the user's body, it is troublesome to wear.

(1) A portable terminal device according to the invention of claim 1 includes: a communication module that transmits and receives information to and from the outside; a microphone; a display unit that displays various types of information; an imaging device; a control unit that, when the volume of sound input to the microphone is equal to or greater than a predetermined threshold, causes the display unit to display a screen for selecting a lip reading call mode when a call is received, and that switches the call mode to the lip reading call mode when the lip reading call mode is selected; and a lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of the speaker's lips from an image captured by the imaging device and converts it into at least one of voice data and text data of the words. When the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and the text data converted by the lip reading processing unit to the outside.
(2) A portable terminal device according to the invention of claim 2 includes: a communication module that transmits and receives information to and from the outside; a microphone; an earphone jack to which an earphone is connected; an earphone insertion detection terminal that detects that an earphone has been connected to the earphone jack; a display unit that displays various types of information; an imaging device; a control unit that, when the earphone insertion detection terminal detects that an earphone is connected to the earphone jack and a call is received, causes the display unit to display a screen for selecting a lip reading call mode, and that switches the call mode to the lip reading call mode when the lip reading call mode is selected; and a lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of the speaker's lips from an image captured by the imaging device and converts it into at least one of voice data and text data of the words. When the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and the text data converted by the lip reading processing unit to the outside.
(3) A portable terminal device according to the invention of claim 3 includes: a communication module that transmits and receives information to and from the outside; a microphone; a display unit that displays various types of information; an imaging device; a control unit that, when the device is set so that output of a ringtone is prohibited and a call is received, causes the display unit to display a screen for selecting a lip reading call mode, and that switches the call mode to the lip reading call mode when the lip reading call mode is selected; and a lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of the speaker's lips from an image captured by the imaging device and converts it into at least one of voice data and text data of the words. When the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and the text data converted by the lip reading processing unit to the outside.
(4) A portable terminal device according to the invention of claim 4 includes: a communication module that transmits and receives information to and from the outside; a microphone; a display unit that displays various types of information; a first imaging device that images at least the area around the user's eyes; a second imaging device different from the first imaging device; a first line-of-sight detection unit that detects the user's line of sight based on a first image captured by the first imaging device; a second line-of-sight detection unit that detects the line of sight of a person in a second image based on the second image captured by the second imaging device; a line-of-sight determination unit that determines, based on the detection result of the first line-of-sight detection unit and the detection result of the second line-of-sight detection unit, whether the user and the person in the second image are making eye contact; a lip reading processing unit that performs lip reading processing in which the shape of the lips of the person in the second image is detected from the second image and converted into at least one of voice data and text data of the words; an application unit that applies for permission to start the lip reading processing by the lip reading processing unit when the line-of-sight determination unit determines that the user and the person in the second image are making eye contact; a permission determination unit that determines whether permission to start the lip reading processing has been obtained; and a control unit that causes the lip reading processing unit to start the lip reading processing when the permission determination unit determines that permission has been obtained.
(5) A lip reading processing program according to the invention of claim 6 causes a computer to execute: a display procedure of causing a display unit to display a screen for selecting a lip reading call mode when a call is received, in a case where the volume of sound input to a microphone is equal to or greater than a predetermined threshold; a call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected; a lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of the speaker's lips from an image captured by an imaging device and converting it into at least one of voice data and text data of the words; and a transmission procedure of, when the call mode is switched to the lip reading call mode, transmitting at least one of the converted voice data and text data to the outside.
(6) A lip reading processing program according to the invention of claim 7 causes a computer to execute: a display procedure of causing a display unit to display a screen for selecting a lip reading call mode when it is detected that an earphone is connected to an earphone jack and a call is received; a call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected; a lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of the speaker's lips from an image captured by an imaging device and converting it into at least one of voice data and text data of the words; and a transmission procedure of, when the call mode is switched to the lip reading call mode, transmitting at least one of the converted voice data and text data to the outside.
(7) A lip reading processing program according to the invention of claim 8 causes a computer to execute: a display procedure of causing a display unit to display a screen for selecting a lip reading call mode when output of a ringtone is set to be prohibited and a call is received; a call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected; a lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of the speaker's lips from an image captured by an imaging device and converting it into at least one of voice data and text data of the words; and a transmission procedure of, when the call mode is switched to the lip reading call mode, transmitting at least one of the converted voice data and text data to the outside.
(8) A lip reading processing program according to the invention of claim 9 causes a computer to execute: a first imaging procedure of capturing an image with a first imaging device that images at least the area around the user's eyes; a second imaging procedure of capturing an image with a second imaging device different from the first imaging device; a first line-of-sight detection procedure of detecting the user's line of sight based on a first image captured by the first imaging device; a second line-of-sight detection procedure of detecting the line of sight of a person in a second image based on the second image captured by the second imaging device; a line-of-sight determination procedure of determining, based on the detection result of the first line-of-sight detection procedure and the detection result of the second line-of-sight detection procedure, whether the user and the person in the second image are making eye contact; an application procedure of applying for permission to start lip reading processing when the line-of-sight determination procedure determines that the user and the person in the second image are making eye contact; a permission determination procedure of determining whether permission to start the lip reading processing has been obtained; and a lip reading processing procedure of, when the permission determination procedure determines that permission has been obtained, performing lip reading processing in which the shape of the lips of the person in the second image is detected from the second image and converted into at least one of voice data and text data of the words.

According to the present invention, the device is not troublesome to use.

A block diagram of the portable terminal device of the first embodiment.
A flowchart of the operation of the portable terminal device of the first embodiment when a call is received.
A flowchart showing the subroutine of step S200 in the flowchart shown in FIG. 2.
An external perspective view of the portable terminal device of the second embodiment.
An overall configuration diagram of the portable terminal device.
A flowchart of the lip reading processing operation of the second embodiment.
A flowchart of the processing for permitting or denying when an application signal is received.
A diagram showing a modification.
A diagram showing a modification.

--- First embodiment ---
A first embodiment of the portable terminal device and the lip reading processing program according to the present invention will be described with reference to FIGS. 1 to 3. FIG. 1 is a block diagram of the portable terminal device of the first embodiment. The portable terminal device 100 illustrated in FIG. 1 is a portable information terminal such as a mobile phone or a smartphone. The portable terminal device 100 includes a control unit 101, a storage unit 102, an operation unit 103, a transmission/reception unit 104, a calling unit 105, a display unit 106, a speaker 107, a microphone 108, an imaging unit 110, and a lip reading processing unit 120.

The portable terminal device 100 has, for example, a mail function, a web browser function, a voice call function, and a photographing function. The control unit 101 includes a CPU, a RAM, a ROM, and the like (not shown), and implements these functions by executing a program stored in the ROM to control the entire portable terminal device 100.

The storage unit 102 is composed of a flash memory or the like. It stores a telephone book in which the telephone numbers and names of call partners are recorded, an address book in which the mail addresses and names of mail recipients are recorded, character data for displaying characters on the display unit 106, and the like.

The operation unit 103 has a plurality of operation keys and is used for key operations that place calls, receive calls, send mail, receive mail, and execute various other functions.

The transmission/reception unit 104 demodulates a communication signal (call signal) received via an antenna (not shown), extracts the digital signal addressed to the portable terminal device 100 from the multiplexed digital signal obtained by the demodulation, converts it into an analog signal, and supplies it to the speaker 107. The transmission/reception unit 104 also converts the audio signal from the microphone 108 into a digital signal, multiplexes this digital signal, modulates a carrier wave with it, and transmits the result. When the communication signal is a mail signal, the transmission/reception unit 104 transmits and receives the mail signal by predetermined mail signal processing. The calling unit 105 notifies the user of an incoming call by sound, vibration, or light.

The display unit 106 is composed of, for example, a liquid crystal monitor, and displays various screens when the portable terminal device 100 performs its functions. For example, when the mail function is used, an outgoing mail editing screen, a received mail display screen, and the like are displayed on the display unit 106. When the web browser function is used, a web page browsing screen and the like are displayed on the display unit 106.

The speaker 107 outputs sound during a voice call. The microphone 108 is used during a voice call. The imaging unit 110 has an image sensor 111, an imaging optical system, and a signal processing unit, and outputs image data of still images and moving images of the subject image captured by the image sensor 111.

The lip reading processing unit 120 is a processing unit that analyzes the shape and movement of the lips of a person in the moving image data output from the imaging unit 110, recognizes the words uttered by that person, and outputs the recognition result. In the present embodiment, the lip reading processing unit 120 generates and outputs the recognition result of the words uttered by the person as voice data. The lip reading processing unit 120 is implemented by the CPU (not shown) of the control unit 101 executing a program stored in the storage unit 102.

The portable terminal device 100 configured in this way can make voice calls with other portable terminal devices, fixed-line telephones, and the like via a mobile phone communication network (not shown). In addition, during a call the portable terminal device 100 can perform lip reading processing based on the shape and movement of the user's lips in the images captured by the imaging unit 110, recognize the words uttered by the user, and output voice data of the recognized words to the mobile phone communication network (not shown).

The portable terminal device 100 of the present embodiment has a normal call mode and a lip reading call mode as setting modes for voice calls. The normal call mode is a setting mode in which the device operates in the same way as a voice call on a conventional mobile phone or smartphone. That is, in the normal call mode, the control unit 101 controls each unit so that the voice uttered by the user is picked up by the microphone 108, and the transmission/reception unit 104 converts the audio signal from the microphone 108 into a digital signal, multiplexes this digital signal, modulates a carrier wave with it, and transmits the result.

The lip reading call mode is a setting mode in which, instead of the voice picked up by the microphone 108, the voice data that the lip reading processing unit 120 outputs by performing lip reading processing on the image data from the imaging unit 110 is output to the mobile phone communication network (not shown). That is, in the lip reading call mode, the control unit 101 controls each unit so that the lip reading processing unit 120 performs lip reading processing based on the image data output from the imaging unit 110, and the transmission/reception unit 104 multiplexes the digital signal of the voice data output from the lip reading processing unit 120, modulates a carrier wave with it, and transmits the result.

Regardless of which call mode is set, the control unit 101 controls the transmission/reception unit 104 so that it demodulates the call signal received via the antenna (not shown), extracts the digital signal addressed to the portable terminal device 100 from the multiplexed digital signal obtained by the demodulation, converts it into an analog signal, and supplies it to the speaker 107.
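The difference between the two call modes lies only in the source of the transmitted audio; the receive path is the same. The following is a minimal sketch of that selection, not the implementation described in this publication. The names `CallMode`, `select_uplink_audio`, `read_microphone`, and `read_lip_reading_audio` are hypothetical and introduced only for illustration.

```python
from enum import Enum
from typing import Callable


class CallMode(Enum):
    """The two setting modes for voice calls (names are illustrative)."""
    NORMAL = "normal"
    LIP_READING = "lip_reading"


def select_uplink_audio(
    mode: CallMode,
    read_microphone: Callable[[], bytes],
    read_lip_reading_audio: Callable[[], bytes],
) -> bytes:
    """Return the audio data handed to the transmission/reception unit.

    Both callables are hypothetical stand-ins: one for the microphone 108,
    the other for the voice data output by the lip reading processing unit 120.
    """
    if mode is CallMode.LIP_READING:
        # Lip reading call mode: send voice data generated from recognized words.
        return read_lip_reading_audio()
    # Normal call mode: send the voice picked up by the microphone.
    return read_microphone()
```

Passing the two sources as callables keeps the sketch independent of any particular audio or camera API.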

--- Flowchart ---
FIG. 2 is a flowchart of the operation of the portable terminal device 100 of the present embodiment when a call is received. In the portable terminal device 100 of the present embodiment, when a call is received, a program that performs the processing shown in FIG. 2 is started and executed by the control unit 101. In step S101, a signal from the microphone 108 is input, and the process proceeds to step S103. In step S103, based on the signal from the microphone 108 input in step S101, it is determined whether the level of noise around the portable terminal device 100 exceeds a predetermined threshold.

If an affirmative determination is made in step S103, the process proceeds to step S105, where the display unit 106 is caused to display an indication that there is an incoming call and an inquiry as to whether to start the call in the lip reading call mode, and the process proceeds to step S107. In step S107, the process waits until an operation input for starting the call is received.

If an operation to start the call in the lip reading call mode is performed in step S107, the process proceeds to the subroutine of step S200, where the call mode is set to the lip reading call mode and call processing in the lip reading call mode is started, and then proceeds to step S109. In step S109, it is determined whether a call end operation has been performed. If a negative determination is made in step S109, the process returns to step S200. If an affirmative determination is made in step S109, the process proceeds to step S111, where a known call end signal is output from the transmission/reception unit 104, and the program ends.

If a negative determination is made in step S103, the process proceeds to step S113, where the display unit 106 is caused to display an indication that there is an incoming call, and the process proceeds to step S115. In step S115, the process waits until an operation input for starting the call is received. If an operation to start the call is performed in step S115, the process proceeds to the subroutine of step S300, where the call mode is set to the normal call mode and call processing in the normal call mode is started, and then proceeds to step S117. In step S117, it is determined whether a call end operation has been performed. If a negative determination is made in step S117, the process returns to step S300. If an affirmative determination is made in step S117, the process proceeds to step S111.

If an operation to start the call in the normal call mode is performed in step S107, the process proceeds to the subroutine of step S300.
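The branching of FIG. 2 described above can be sketched as a small function that records which steps are visited. This is a simplified illustration under stated assumptions, not the program of the embodiment: `handle_incoming_call`, `user_choice`, and `call_ended` are hypothetical names, the 80 dB value is the example threshold given in the description, and the sketch assumes that a normal call chosen at step S107 continues to step S117 after step S300, which the text does not state explicitly.

```python
from typing import Callable, List

THRESHOLD_DB = 80.0  # example value from the description; not a fixed requirement


def handle_incoming_call(
    ambient_level_db: float,
    user_choice: Callable[[bool], str],
    call_ended: Callable[[], bool],
) -> List[str]:
    """Return the sequence of FIG. 2 step labels visited for one call.

    user_choice(offer_lip_reading) is a hypothetical callback returning
    "lip_reading" or "normal"; call_ended() reports the call end operation.
    """
    trace = ["S101", "S103"]           # read microphone, compare with threshold
    if ambient_level_db > THRESHOLD_DB:
        trace += ["S105", "S107"]      # show incoming call plus lip reading inquiry
        use_lip_reading = user_choice(True) == "lip_reading"
    else:
        trace += ["S113", "S115"]      # show incoming call only
        user_choice(False)             # wait for the call start operation
        use_lip_reading = False

    if use_lip_reading:
        call_step, end_check = "S200", "S109"
    else:
        call_step, end_check = "S300", "S117"

    while True:
        trace += [call_step, end_check]  # call processing, then end-of-call check
        if call_ended():
            break
    trace.append("S111")               # output the call end signal
    return trace
```

For example, a noisy environment in which the user selects the lip reading call mode visits S105 and S107 and then alternates between S200 and S109 until the call ends.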

FIG. 3 is a diagram showing the subroutine of step S200 in the flowchart shown in FIG. 2. In step S301, the imaging unit 110 is controlled so that the image sensor 111 starts capturing the subject image and acquires a moving image, and the process proceeds to step S303. In step S303, the lip reading processing unit 120 is controlled so as to start lip reading processing based on the moving image data from the imaging unit 110, and the process proceeds to step S305.

In step S305, each unit is controlled so that the voice data output from the lip reading processing unit 120, that is, the voice data representing the recognition result of the words uttered by the person being imaged, is multiplexed by the transmission/reception unit 104, used to modulate a carrier wave, and transmitted. The process then proceeds to step S109 of the main routine.
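The subroutine of step S200 is a three-stage pipeline: capture, lip reading, transmission. The sketch below shows only that data flow. All four callables (`capture_frames`, `recognize_words`, `synthesize_voice`, `transmit`) are hypothetical placeholders, and the publication does not specify how recognition or voice generation is performed.

```python
from typing import Callable, Iterable


def lip_reading_call_step(
    capture_frames: Callable[[], Iterable[bytes]],
    recognize_words: Callable[[Iterable[bytes]], str],
    synthesize_voice: Callable[[str], bytes],
    transmit: Callable[[bytes], None],
) -> str:
    """Run one pass of the S200 subroutine and return the recognized words."""
    frames = capture_frames()        # S301: acquire moving image data
    words = recognize_words(frames)  # S303: lip reading on the image data
    voice = synthesize_voice(words)  # S303: recognition result as voice data
    transmit(voice)                  # S305: hand voice data to the transceiver
    return words
```

Because the far end receives ordinary voice data, nothing in this flow depends on the call partner's equipment.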

In the portable terminal device 100 configured in this way, when a call is received, the control unit 101 determines whether the ambient noise is loud based on the volume of the ambient sound picked up by the microphone 108 (steps S101 and S103). If it is determined that the ambient noise level exceeds the predetermined threshold (affirmative determination in step S103), the control unit 101 causes the display unit 106 to display an indication that there is an incoming call and an inquiry as to whether to start the call in the lip reading call mode (step S105).

Here, the predetermined threshold in step S103 is, for example, a sound pressure level of 80 dB(A). The present invention is not limited to this value. The threshold also need not be a value that takes auditory correction (frequency weighting) into account.
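As a rough illustration of the comparison in step S103, the sketch below estimates a level in decibels from the RMS of normalized microphone samples and compares it with the threshold. It is an assumption-laden sketch: `CALIBRATION_OFFSET_DB` is a hypothetical constant mapping full-scale RMS to a sound pressure level and would have to be measured for a real microphone, and no A-weighting is applied, which the description permits since the threshold need not be a weighted value.

```python
import math
from typing import Sequence

THRESHOLD_DB = 80.0           # example threshold from the description
CALIBRATION_OFFSET_DB = 94.0  # hypothetical: level assigned to an RMS of 1.0


def level_db(samples: Sequence[float]) -> float:
    """Estimate an unweighted level in dB from normalized samples (-1.0 to 1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # silence: avoid log10(0)
    return 20.0 * math.log10(rms) + CALIBRATION_OFFSET_DB


def is_noisy(samples: Sequence[float], threshold_db: float = THRESHOLD_DB) -> bool:
    """Step S103 in the description: does the ambient level exceed the threshold?"""
    return level_db(samples) > threshold_db
```

The description compares with "exceeds", while the claims use "equal to or greater than"; changing `>` to `>=` in `is_noisy` would match the claim wording.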

When the user of the portable terminal device 100 operates the operation unit 103 to start the call in the lip reading call mode, the call mode is set to the lip reading call mode. When the call mode is set to the lip reading call mode, the control unit 101 controls the imaging unit 110 so that the image sensor 111 starts capturing the subject image and acquires a moving image (step S301), and controls the lip reading processing unit 120 so that lip reading processing based on the moving image data from the imaging unit 110 is started (step S303).

At this time, the user holds the portable terminal device 100 at a position in front of and away from the face so that the user's own face is captured by the image sensor 111. If the user then moves the lips as if speaking, even without making a sound, the lip reading processing unit 120 performs lip reading processing and recognizes the words. The lip reading processing unit 120 generates a digital signal of voice data for the recognized words and outputs it to the transmission/reception unit 104. The transmission/reception unit 104 then multiplexes the digital signal of the voice data output from the lip reading processing unit 120, modulates a carrier wave with it, and transmits the result (step S305).

As a result, if the user of the portable terminal device 100 moves the lips as if speaking, even without making a sound, the words are recognized by the lip reading processing unit 120 and transmitted to the call partner as voice data. On the call partner's mobile phone or fixed-line telephone, the voice data generated by the lip reading processing unit 120 of the portable terminal device 100 is output from the speaker as sound.

The voice from the call partner's mobile phone or fixed-line telephone is output from the speaker 107 of the portable terminal device 100. However, because the ambient noise is loud as described above, and because the portable terminal device 100 is held in front of and away from the face so that the user's face is captured by the image sensor 111, it is desirable to use an earphone (not shown). In the portable terminal device 100, as in a conventional mobile phone or smartphone, when an earphone (not shown) is connected, the voice from the call partner's mobile phone or fixed-line telephone is output from the connected earphone rather than from the speaker 107.

第１の実施の形態の携帯端末装置１００では、次の作用効果を奏する。 The mobile terminal device 100 according to the first embodiment provides the following operational effects.
（１）着信があったときにマイク１０８で集音した周囲の騒音の大きさがあらかじめ定められた所定の閾値を超えると判断されると、読唇通話モードでの通話を開始するか否かを問い合わせる旨の表示を表示部１０６に表示させるように構成した。そして、読唇通話モードでの通話を開始するように操作されると通話モードを読唇通話モードに設定し、読唇通話モードでの通話処理を開始するように構成した。これにより、周囲の騒音の影響によってユーザの発する音声を明瞭に集音できないような場合に、容易に読唇通話モードでの通話を開始できるので、利便性が高い。また、骨伝導マイクのような集音装置をユーザの身体に装着する必要がないので煩わしさがない。 (1) When an incoming call is received and the level of ambient noise collected by the microphone 108 is determined to exceed a predetermined threshold, the display unit 106 displays a prompt asking whether to start the call in the lip reading call mode. When the user operates the device to start the call in the lip reading call mode, the call mode is set to the lip reading call mode and call processing in that mode is started. As a result, when the user's voice cannot be collected clearly because of ambient noise, a call in the lip reading call mode can be started easily, which is highly convenient. Moreover, since there is no need to attach a sound collecting device such as a bone conduction microphone to the user's body, there is no inconvenience.

（２）読唇処理部１２０で認識した言葉が音声データのデジタル信号として生成されて出力されるように構成した。これにより、読唇処理による認識結果を通話相手に音声で伝えることができる。したがって、たとえば携帯電話網や、通話相手の携帯電話機や固定電話機に特に変更を加える必要がないので、通話相手が限定されず、携帯端末装置１００の利便性が高い。 (2) The words recognized by the lip reading processing unit 120 are generated and output as digital signals of audio data. Thereby, the recognition result by the lip reading process can be conveyed to the other party by voice. Therefore, for example, since it is not necessary to make any particular changes to the mobile phone network, the mobile phone or the fixed phone of the call partner, the call partner is not limited, and the convenience of the mobile terminal device 100 is high.

（３）読唇処理を行うためのプログラムを記憶部１０２に記憶させて、制御部１０１で読唇処理を行うように構成した。これにより、たとえば、読唇処理機能を備えていない携帯端末装置に読唇処理を行うためのプログラムを記憶させることで、読唇処理を行えるので、携帯端末装置１００の利便性を向上できる。 (3) A program for performing lip reading processing is stored in the storage unit 102, and the control unit 101 performs lip reading processing. Thereby, for example, the lip reading process can be performed by storing a program for performing the lip reading process in a portable terminal device that does not have the lip reading processing function, and thus the convenience of the portable terminal device 100 can be improved.

−−−第２の実施の形態−−− --- Second Embodiment ---
図４〜７を参照して、本発明による携帯端末装置および読唇処理プログラムの第２の実施の形態を説明する。以下の説明では、第１の実施の形態と同じ構成要素には同じ符号を付して相違点を主に説明する。特に説明しない点については、第１の実施の形態と同じである。本実施の形態では、主に、目の前にいる人物が発した言葉を読唇処理によって認識するように構成した点で、第１の実施の形態と異なる。 A second embodiment of the mobile terminal device and the lip reading processing program according to the present invention will be described with reference to FIGS. 4 to 7. In the following description, components that are the same as in the first embodiment are given the same reference numerals, and the description focuses on the differences. Points not specifically described are the same as in the first embodiment. This embodiment differs from the first embodiment mainly in that it is configured to recognize, by the lip reading process, words uttered by a person directly in front of the user.

図４は、本実施の形態の携帯端末装置２００の外観斜視図である。携帯端末装置２００は眼鏡型のウェアラブル端末装置であり、第１撮像部２１０、第２撮像部２２０、投影部２３０、回路部２４０、操作部１０３、マイク１０８、イヤホン２０４を備えている。第１撮像部２１０は、携帯端末装置２００の前方、すなわち、携帯端末装置２００を装着したユーザの前方の被写体を撮像するための撮影装置であり、後述する撮像素子１１１ａと、撮像光学系２１２とを備えている。 FIG. 4 is an external perspective view of the mobile terminal device 200 of the present embodiment. The mobile terminal device 200 is a glasses-type wearable terminal device and includes a first imaging unit 210, a second imaging unit 220, a projection unit 230, a circuit unit 240, an operation unit 103, a microphone 108, and an earphone 204. The first imaging unit 210 is an imaging device for imaging a subject in front of the mobile terminal device 200, that is, in front of the user wearing the mobile terminal device 200, and includes an image sensor 111a, described later, and an imaging optical system 212.

第１撮像部２１０は、携帯端末装置２００の前方の被写体からの被写体光が撮像光学系２１２に入射するように、たとえば、メガネ型フレーム２０１のテンプル部分（ツルの部分）の側面に設けられている。 The first imaging unit 210 is provided, for example, on the side surface of the temple portion (the arm of the glasses) of the glasses-type frame 201 so that subject light from a subject in front of the mobile terminal device 200 enters the imaging optical system 212.

第２撮像部２２０は、携帯端末装置２００のメガネ型フレーム２０１から延在するアーム２０２の先端に取り付けられていて、携帯端末装置２００を装着したユーザの口元および目元を撮影するための撮影装置であり、後述する撮像素子１１１ｂと、不図示の撮像光学系とを備えている。また、第２撮像部２２０には、マイク１０８も設けられている。 The second imaging unit 220 is attached to the tip of an arm 202 extending from the glasses-type frame 201 of the mobile terminal device 200. It is an imaging device for capturing the mouth and eye area of the user wearing the mobile terminal device 200, and includes an image sensor 111b, described later, and an imaging optical system (not shown). The microphone 108 is also provided on the second imaging unit 220.

投影部２３０は、眼鏡レンズ２０３に設けられたハーフミラー層２０３ａに映像を投影する装置であり、各種情報をハーフミラー層２０３ａに表示する。ユーザは、表示された情報と共に眼鏡レンズ２０３越しに前方を見ることができる。回路部２４０には、制御部１０１が設けられている。 The projection unit 230 is a device that projects an image on the half mirror layer 203a provided in the spectacle lens 203, and displays various information on the half mirror layer 203a. The user can view the front through the eyeglass lens 203 together with the displayed information. The circuit unit 240 is provided with a control unit 101.

メガネ型フレーム２０１のたとえばテンプル部分の側面には、ユーザが操作するための操作部１０３が設けられている。メガネ型フレーム２０１のテンプル部分には、イヤホン２０４が取り付けられている。 For example, on the side surface of the temple portion of the glasses-type frame 201, an operation unit 103 is provided for the user to operate. An earphone 204 is attached to the temple portion of the glasses-type frame 201.

たとえば、第１撮像部２１０が設けられたメガネ型フレーム２０１のテンプル部分側面の部位には、後述する赤外線通信部１３０の赤外線発光部１３１と赤外線受光部１３２とが設けられている。 For example, an infrared light emitting unit 131 and an infrared light receiving unit 132 of an infrared communication unit 130 to be described later are provided at a site on the side of the temple portion of the glasses-type frame 201 where the first imaging unit 210 is provided.

図５は、携帯端末装置２００の全体構成図である。携帯端末装置２００は、制御部１０１と、記憶部１０２と、操作部１０３と、送受信部１０４と、呼び出し部１０５と、投影部２３０と、イヤホン２０４と、マイク１０８と、撮像部１１０と、読唇処理部１２０と、赤外線通信部１３０と、視線検出部１４０とを備える。携帯端末装置２００では、撮像部１１０に上述した２つの撮像素子１１１ａ，１１１ｂが設けられている。赤外線通信部１３０は、周知の赤外線通信を行う通信部であり、上述した赤外線発光部１３１と赤外線受光部１３２とを有する。 FIG. 5 is an overall configuration diagram of the mobile terminal device 200. The mobile terminal device 200 includes a control unit 101, a storage unit 102, an operation unit 103, a transmission/reception unit 104, a calling unit 105, a projection unit 230, an earphone 204, a microphone 108, an imaging unit 110, a lip reading processing unit 120, an infrared communication unit 130, and a line-of-sight detection unit 140. In the mobile terminal device 200, the imaging unit 110 is provided with the two image sensors 111a and 111b described above. The infrared communication unit 130 is a communication unit that performs well-known infrared communication, and has the infrared light emitting unit 131 and the infrared light receiving unit 132 described above.

視線検出部１４０は、ユーザから視認し得る距離にいる相手の視線の方向や、ユーザの視線の方向を検出する検出部である。視線検出部１４０は、既知の視線検出手法により、第１撮像部２１０の撮像素子１１１ａで撮像して得られた画像から、第１撮像部２１０の撮像素子１１１ａで撮像した被写体、すなわち、ユーザから視認し得る距離にいる相手の視線の方向を検出する。また、視線検出部１４０は、既知の視線検出手法により、第２撮像部２２０の撮像素子１１１ｂで撮像して得られた画像から、ユーザの視線の方向を検出する。 The line-of-sight detection unit 140 detects the gaze direction of a person within visible distance of the user, as well as the gaze direction of the user. Using a known gaze detection technique, the line-of-sight detection unit 140 detects, from the image captured by the image sensor 111a of the first imaging unit 210, the gaze direction of the subject captured by that sensor, that is, of the person within visible distance of the user. Likewise, using a known gaze detection technique, the line-of-sight detection unit 140 detects the gaze direction of the user from the image captured by the image sensor 111b of the second imaging unit 220.

このように構成される携帯端末装置２００では、上述した第１の実施の形態の携帯端末装置１００と同様に、不図示の携帯電話通信網を介して、他の携帯端末装置や固定電話等との音声通話が可能である。携帯端末装置２００では、上述した第１の実施の形態の携帯端末装置１００と同様に、通話中に第２撮像部２２０の撮像素子１１１ｂで撮像して得られた画像におけるユーザの唇の形や動きに基づいて読唇処理を行うことでユーザの発した言葉を認識して、認識した言葉の音声データを不図示の携帯電話通信網へ出力できる。 Like the mobile terminal device 100 of the first embodiment described above, the mobile terminal device 200 configured in this way can make voice calls with other mobile terminal devices, fixed-line phones, and the like via a mobile phone communication network (not shown). Also like the mobile terminal device 100 of the first embodiment, the mobile terminal device 200 can recognize the words uttered by the user by performing the lip reading process based on the shape and movement of the user's lips in the image captured during a call by the image sensor 111b of the second imaging unit 220, and can output voice data of the recognized words to the mobile phone communication network (not shown).

また、携帯端末装置２００では、視認し得る距離に存在する相手の発した言葉を認識して、認識した言葉を文字に変換して、眼鏡レンズ２０３のハーフミラー層２０３ａに表示できる。すなわち、本実施の形態の携帯端末装置２００では、携帯端末装置２００をそれぞれ装着した２人のユーザが、騒音が大きい環境下で、互いの視線が合った状態が数秒間続くと、第１撮像部２１０の撮像素子１１１ａで撮像して得られた被写体像に基づく読唇処理が開始される。携帯端末装置２００で行われる処理について、以下に詳細を説明する。 In addition, the mobile terminal device 200 can recognize words uttered by a partner existing at a visually recognizable distance, convert the recognized words into characters, and display them on the half mirror layer 203 a of the eyeglass lens 203. That is, in the mobile terminal device 200 according to the present embodiment, when two users respectively wearing the mobile terminal device 200 are in a noisy environment and in a state where their eyes are aligned with each other for several seconds, the first imaging is performed. Lip reading processing based on the subject image obtained by imaging with the imaging element 111a of the unit 210 is started. Details of processing performed in the mobile terminal device 200 will be described below.

以下の説明では、２人のユーザがそれぞれ携帯端末装置２００を装着し、お互いに視認し得る位置で向かい合っているものとする。説明の便宜上、２人のユーザをそれぞれユーザＡおよびユーザＢとする。ユーザＡが装着する携帯端末装置２００を携帯端末装置２００Ａとし、ユーザＢが装着する携帯端末装置２００を携帯端末装置２００Ｂとする。また、携帯端末装置２００の各部の説明について、携帯端末装置２００Ａと携帯端末装置２００Ｂとで区別をする必要がある場合、携帯端末装置２００Ａについては符号の末尾にＡを付し、携帯端末装置２００Ｂについては符号の末尾にＢを付して説明する。 In the following description, it is assumed that two users wear the mobile terminal device 200 and face each other at positions where they can be visually recognized. For convenience of explanation, it is assumed that two users are user A and user B, respectively. The mobile terminal device 200 worn by the user A is referred to as a mobile terminal device 200A, and the mobile terminal device 200 worn by the user B is referred to as a mobile terminal device 200B. In the description of each part of the mobile terminal device 200, when it is necessary to distinguish between the mobile terminal device 200A and the mobile terminal device 200B, the mobile terminal device 200A is given an A at the end of the reference numeral, and the mobile terminal device 200B. Will be described with a suffix B.

各携帯端末装置２００Ａ，２００Ｂでは、それぞれの制御部１０１は、マイク１０８で集音した周囲の音の大きさに基づいて、周囲の騒音が大きいか否かを制御部１０１が判断する。そして、周囲の騒音の大きさがあらかじめ定められた所定の閾値を超えると判断されると、それぞれの制御部１０１は、第１撮像部２１０の撮像素子１１１ａで携帯端末装置２００の前方の被写体を撮像するように各部を制御する。また、それぞれの制御部１０１は、第２撮像部２２０の撮像素子１１１ｂでユーザの目元および口元を撮像するように各部を制御する。 In each of the mobile terminal devices 200A and 200B, each control unit 101 determines whether or not the surrounding noise is large based on the volume of the surrounding sound collected by the microphone 108. When it is determined that the ambient noise level exceeds a predetermined threshold value, each control unit 101 uses the imaging element 111a of the first imaging unit 210 to select a subject in front of the mobile terminal device 200. Each part is controlled so as to capture an image. In addition, each control unit 101 controls each unit so that the imaging element 111b of the second imaging unit 220 images the user's eyes and mouth.
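
The noise determination described above can be sketched as follows. This is only a minimal illustration, assuming the loudness is measured as an RMS level in dBFS over normalized microphone samples; the function names and the threshold value are hypothetical, since the specification says only that the level is compared with a predetermined threshold.

```python
import math

# Hypothetical threshold; the specification only says "a predetermined threshold".
NOISE_THRESHOLD_DB = -20.0


def rms_dbfs(samples):
    """Return the RMS level of normalized samples (-1.0..1.0) in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_noisy(samples, threshold_db=NOISE_THRESHOLD_DB):
    """True when the ambient level exceeds the threshold (strictly greater)."""
    return rms_dbfs(samples) > threshold_db
```

When `is_noisy` returns True, the control unit would go on to start imaging with both image sensors; otherwise nothing further happens in this cycle.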

携帯端末装置２００Ａの制御部１０１Ａは、撮像素子１１１ａで撮像して得られた画像に基づいて、視線検出部１４０で視線検出処理を行わせる。そして、制御部１０１Ａは、視線検出部１４０での視線検出処理の結果に基づいて、携帯端末装置２００Ａの前方の被写体の中に、ユーザＡ、すなわち、携帯端末装置２００Ａを注視する人物が存在するか否かを判断する。 The control unit 101A of the mobile terminal device 200A causes the line-of-sight detection unit 140 to perform line-of-sight detection processing based on an image obtained by imaging with the imaging element 111a. Then, based on the result of the line-of-sight detection process in the line-of-sight detection unit 140, the control unit 101A includes a person who is gazing at the user A, that is, the mobile terminal device 200A, among subjects in front of the mobile terminal device 200A. Determine whether or not.

また、制御部１０１Ａは、撮像素子１１１ｂで撮像して得られた画像に基づいて、視線検出部１４０でユーザＡについての視線検出処理を行わせる。そして、制御部１０１Ａは、視線検出部１４０での視線検出処理の結果、および、撮像素子１１１ａで撮像して得られた画像に基づいて、ユーザＡが携帯端末装置２００の前方の被写体のどの部分を注視しているかを特定する。 The control unit 101A also causes the line-of-sight detection unit 140 to perform line-of-sight detection processing for the user A based on the image captured by the image sensor 111b. Then, based on the result of that line-of-sight detection processing and on the image captured by the image sensor 111a, the control unit 101A identifies which part of the subject in front of the mobile terminal device 200 the user A is gazing at.

そして、制御部１０１Ａは、携帯端末装置２００Ａを注視する人物が存在し、かつ、ユーザＡが当該人物の顔を注視しているか否かを判断する。携帯端末装置２００Ａを注視する人物が存在し、かつ、ユーザＡが当該人物の顔を注視している状態が、たとえば数秒間継続していると判断されると、制御部１０１Ａは、前方の相手に対して読唇処理を開始するかどうかを尋ねる表示、すなわち読唇処理開始問合せ表示画面を眼鏡レンズ２０３のハーフミラー層２０３ａに表示するよう各部を制御する。 The control unit 101A then determines whether there is a person gazing at the mobile terminal device 200A and whether the user A is gazing at that person's face. When it is determined that the state in which a person is gazing at the mobile terminal device 200A and the user A is gazing at that person's face has continued for, for example, several seconds, the control unit 101A controls each unit so that a display asking whether to start the lip reading process for the person in front, that is, a lip reading process start inquiry screen, is shown on the half mirror layer 203a of the eyeglass lens 203.
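
The mutual-gaze condition above amounts to a dwell timer that resets whenever either gaze is broken. The sketch below is illustrative only: the class name, the 3-second default, and the boolean inputs (which stand in for the results of the line-of-sight detection unit 140) are assumptions, as the specification says only "for example, several seconds".

```python
class MutualGazeTimer:
    """Tracks how long both gaze conditions have held continuously."""

    def __init__(self, hold_seconds=3.0):  # hypothetical "several seconds"
        self.hold_seconds = hold_seconds
        self._since = None  # time at which mutual gaze began, or None

    def update(self, now, partner_looks_at_device, user_looks_at_partner_face):
        """Feed one detection result; return True once the hold time is met."""
        if partner_looks_at_device and user_looks_at_partner_face:
            if self._since is None:
                self._since = now
            return now - self._since >= self.hold_seconds
        # Either gaze was broken: restart the measurement.
        self._since = None
        return False
```

A True result corresponds to the point at which the lip reading process start inquiry screen would be displayed.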

たとえば、ユーザＡの操作部１０３の操作によって読唇処理の開始が指示されると、制御部１０１Ａは、赤外線通信を開始して、読唇処理開始の許可を申請する申請信号を送信するように赤外線通信部１３０Ａを制御する。 For example, when the user A instructs the start of the lip reading process by operating the operation unit 103, the control unit 101A controls the infrared communication unit 130A to start infrared communication and transmit a request signal asking for permission to start the lip reading process.

たとえば、ユーザＢが装着する携帯端末装置２００Ｂでは、携帯端末装置２００Ａからの申請信号を赤外線通信部１３０Ｂで受信すると、制御部１０１Ｂは、携帯端末装置２００Ａからの読唇処理開始の申請を許可するか否かを選択するための選択画面を眼鏡レンズ２０３Ｂのハーフミラー層２０３ａＢに表示するよう各部を制御する。 For example, in the mobile terminal device 200B worn by the user B, when the infrared communication unit 130B receives the request signal from the mobile terminal device 200A, the control unit 101B controls each unit so that a selection screen for choosing whether to permit the request from the mobile terminal device 200A to start the lip reading process is displayed on the half mirror layer 203aB of the eyeglass lens 203B.

たとえば、ユーザＢの操作部１０３の操作によって携帯端末装置２００Ａからの読唇処理開始の申請が許可されると、制御部１０１Ｂは、読唇処理開始の申請を許可する許可信号を送信するように赤外線通信部１３０Ｂを制御する。また、ユーザＢの操作部１０３の操作によって携帯端末装置２００Ａからの読唇処理開始の申請が拒否されると、制御部１０１Ｂは、読唇処理開始の申請を許可しない不許可信号を送信するように赤外線通信部１３０Ｂを制御する。 For example, when the user B permits the request from the mobile terminal device 200A to start the lip reading process by operating the operation unit 103, the control unit 101B controls the infrared communication unit 130B to transmit a permission signal granting the request. When the user B rejects the request by operating the operation unit 103, the control unit 101B controls the infrared communication unit 130B to transmit a non-permission signal refusing the request.

携帯端末装置２００Ａでは、申請信号を送信後の所定の待機時間以内に、たとえば、申請信号を送信後２０秒以内に携帯端末装置２００Ｂからの許可信号を受信すると、制御部１０１Ａは、読唇処理を開始するよう各部を制御する。すなわち、制御部１０１Ａは、第１撮像部２１０の撮像素子１１１ａで撮像して得られたユーザＢの動画像データに基づく読唇処理を開始するよう読唇処理部１２０を制御する。そして、制御部１０１Ａは、読唇処理によって認識した言葉を文字に変換して、眼鏡レンズ２０３のハーフミラー層２０３ａに表示するよう各部を制御する。 In the mobile terminal device 200A, when the permission signal from the mobile terminal device 200B is received within a predetermined waiting time after the request signal is transmitted, for example within 20 seconds, the control unit 101A controls each unit to start the lip reading process. That is, the control unit 101A controls the lip reading processing unit 120 to start the lip reading process based on the moving image data of the user B captured by the image sensor 111a of the first imaging unit 210. The control unit 101A then controls each unit so that the words recognized by the lip reading process are converted into text and displayed on the half mirror layer 203a of the eyeglass lens 203.

なお、申請信号を送信後、所定の待機時間以内に携帯端末装置２００Ｂからの不許可信号を受信すると、制御部１０１Ａは、読唇処理の開始を中止する。また、申請信号を送信後、所定の待機時間を超えても携帯端末装置２００Ｂからの許可信号または不許可信号を受信できなかった場合、制御部１０１Ａは、読唇処理の開始を中止する。 Note that if the non-permission signal from the mobile terminal device 200B is received within the predetermined waiting time after the request signal is transmitted, the control unit 101A cancels the start of the lip reading process. Likewise, if neither the permission signal nor the non-permission signal has been received from the mobile terminal device 200B by the time the predetermined waiting time has elapsed, the control unit 101A cancels the start of the lip reading process.
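
The request and reply exchange above reduces to a single decision: lip reading starts only if a permission signal arrives within the waiting time. The following sketch illustrates that rule under stated assumptions; the `Reply` enum and the `resolve_request` function are hypothetical names, replies are modeled as `(elapsed_seconds, reply)` pairs rather than real infrared traffic, and the 20-second default is the example value given in the text.

```python
from enum import Enum


class Reply(Enum):
    PERMIT = "permit"  # permission signal
    DENY = "deny"      # non-permission signal


def resolve_request(replies, wait_seconds=20.0):
    """Return True if lip reading may start, False if the start is cancelled.

    replies: iterable of (elapsed_seconds, Reply) pairs measured from the
    moment the request signal was sent. The earliest reply decides.
    """
    for elapsed, reply in sorted(replies, key=lambda r: r[0]):
        if elapsed > wait_seconds:
            break  # anything after the waiting time is ignored
        return reply is Reply.PERMIT
    return False  # no reply in time: the start is cancelled
```

For example, `resolve_request([(5.0, Reply.PERMIT)])` allows the start, while an empty reply list or a reply arriving after the waiting time cancels it.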

読唇処理の開始後、たとえば、ユーザＡが顔を大きく動かすなどして撮像素子１１１ａでの撮像範囲からユーザＢの口元が外れてしまった場合、制御部１０１Ａは、撮像素子１１１ａで撮像して得られた画像に基づいて、ユーザＢの口元を再認識する処理を行うよう各部を制御する。そして、たとえば１０秒程度の所定時間内にユーザＢの口元の再認識に成功した場合には、制御部１０１Ａは、読唇処理を継続するよう各部を制御する。 After the lip reading process has started, if the mouth of the user B leaves the imaging range of the image sensor 111a, for example because the user A moves the head sharply, the control unit 101A controls each unit to perform processing for re-recognizing the mouth of the user B based on the image captured by the image sensor 111a. If the mouth of the user B is successfully re-recognized within a predetermined time, for example about 10 seconds, the control unit 101A controls each unit to continue the lip reading process.

また、所定時間を超えてもユーザＢの口元を再認識できなかった場合には、制御部１０１Ａは、読唇処理を終了するよう各部を制御する。操作部１０３の操作によって読唇処理の開始の中止が指示された場合にも、制御部１０１Ａは、読唇処理を終了するよう各部を制御する。 If the mouth of the user B cannot be re-recognized even after the predetermined time has elapsed, the control unit 101A controls each unit to end the lip reading process. The control unit 101A also controls each unit to end the lip reading process when cancellation of the lip reading process is instructed by operating the operation unit 103.
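
The re-recognition rule described above can be sketched as a scan over per-frame detection results. This is an illustration only: `track_mouth` is a hypothetical helper, frames are modeled as `(timestamp, mouth_visible)` pairs standing in for the output of the image analysis, and the 10-second default is the example value from the text.

```python
def track_mouth(frames, reacquire_seconds=10.0):
    """Return "continue" or "end" for a sequence of detection results.

    frames: iterable of (timestamp_seconds, mouth_visible) pairs in time
    order. Lip reading ends once the mouth has stayed unrecognized for
    longer than reacquire_seconds.
    """
    lost_since = None  # time at which the mouth was first lost, or None
    for timestamp, visible in frames:
        if visible:
            lost_since = None  # re-recognized: keep reading
        elif lost_since is None:
            lost_since = timestamp
        elif timestamp - lost_since > reacquire_seconds:
            return "end"
    return "continue"
```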

なお、上述の説明では、ユーザＡが装着する携帯端末装置２００Ａについての動作を主に説明したが、ユーザＢが装着する携帯端末装置２００Ｂについても携帯端末装置２００Ａと同じである。 In the above description, the operation of the mobile terminal device 200A worn by the user A is mainly described, but the mobile terminal device 200B worn by the user B is the same as the mobile terminal device 200A.

−−−フローチャート−−− --- Flowchart ---
図６は、携帯端末装置２００における上述した読唇処理の動作についてのフローチャートである。本実施の形態の携帯端末装置２００では、不図示の電源スイッチがオンされると図６に示す処理を行うプログラムが起動されて、制御部１０１で定期的に実行される。ステップＳ１０１およびステップＳ１０３の動作については、第１の実施の形態における図２のフローチャートのステップＳ１０１およびステップＳ１０３と同じである。 FIG. 6 is a flowchart of the lip reading operation described above in the mobile terminal device 200. In the mobile terminal device 200 of the present embodiment, when a power switch (not shown) is turned on, a program that performs the processing shown in FIG. 6 is started and is executed periodically by the control unit 101. The operations in steps S101 and S103 are the same as those in steps S101 and S103 of the flowchart of FIG. 2 in the first embodiment.

ステップＳ１０３が肯定判断されるとステップＳ１５１へ進み、撮像素子１１１ａ，１１１ｂによる撮像を開始させてステップＳ１５３へ進む。ステップＳ１５３において、撮像素子１１１ａで撮像して得られた画像に基づいて、視線検出部１４０で視線検出処理を行わせてステップＳ１５５へ進む。ステップＳ１５５において、ステップＳ１５３での視線検出処理の結果、携帯端末装置２００を注視する人物が存在するか否かを判断する。 If an affirmative determination is made in step S103, the process proceeds to step S151 to start imaging by the image sensors 111a and 111b, and the process proceeds to step S153. In step S153, the line-of-sight detection unit 140 performs line-of-sight detection processing based on the image obtained by imaging with the image sensor 111a, and the process proceeds to step S155. In step S155, it is determined whether or not there is a person watching the mobile terminal device 200 as a result of the line-of-sight detection process in step S153.

ステップＳ１５５が肯定判断されるとステップＳ１５７へ進み、撮像素子１１１ｂで撮像して得られた画像に基づいて、視線検出部１４０でユーザについての視線検出処理を行わせてステップＳ１５９へ進む。ステップＳ１５９において、ステップＳ１５７での視線検出処理の結果、および、撮像素子１１１ａで撮像した携帯端末装置２００の前方の被写体の画像に基づいて、ユーザが携帯端末装置２００の前方の被写体のどの部分を注視しているかを特定してステップＳ１６１へ進む。 If an affirmative determination is made in step S155, the process proceeds to step S157, and the line-of-sight detection process for the user is performed by the line-of-sight detection unit 140 based on the image obtained by imaging with the image sensor 111b, and the process proceeds to step S159. In step S159, based on the result of the line-of-sight detection process in step S157 and the image of the subject in front of the mobile terminal device 200 imaged by the image sensor 111a, which part of the subject in front of the mobile terminal device 200 is selected by the user. It is determined whether the user is gazing, and the process proceeds to step S161.

ステップＳ１６１において、ステップＳ１５９での処理結果に基づいて、ステップＳ１５３で存在すると判断した携帯端末装置２００を注視する人物の顔を、ユーザが注視しているか否かを判断する。ステップＳ１６１が肯定判断されるとステップＳ１６３へ進み、携帯端末装置２００を注視する人物が携帯端末装置２００を継続して注視し、かつ、当該人物の顔をユーザが注視する状態が所定時間継続したか否かを判断する。 In step S161, based on the processing result in step S159, it is determined whether or not the user is gazing at the face of the person gazing at the mobile terminal device 200 determined to be present in step S153. If an affirmative determination is made in step S161, the process proceeds to step S163, and a state in which a person watching the mobile terminal device 200 continuously watches the mobile terminal device 200 and the user watches the face of the person continues for a predetermined time. Determine whether or not.

ステップＳ１６３が肯定判断されるとステップＳ１６５へ進み、前方の相手に対して読唇処理を開始するかどうかを尋ねる読唇処理開始問合せ表示画面をハーフミラー層２０３ａに表示させてステップＳ１６７へ進む。 If an affirmative determination is made in step S163, the process proceeds to step S165, and a lip reading process start inquiry display screen asking whether or not to start the lip reading process for the front partner is displayed on the half mirror layer 203a, and the process proceeds to step S167.

ステップＳ１６７において、操作部１０３への操作入力があるまで待機する。ステップＳ１６７で、操作部１０３の操作によって読唇処理の開始が指示されたと判断されると、ステップＳ１６９へ進み、読唇処理開始の許可を申請する申請信号を送信するように赤外線通信部１３０を制御してステップＳ１７１へ進む。 In step S167, the process waits until there is an operation input to the operation unit 103. If it is determined in step S167 that start of the lip reading process is instructed by the operation of the operation unit 103, the process proceeds to step S169, and the infrared communication unit 130 is controlled to transmit an application signal for applying for permission to start the lip reading process. Then, the process proceeds to step S171.

ステップＳ１７１において、所定の待機時間内に読唇処理開始の申請を許可する許可信号を受信したか否かを判断する。ステップＳ１７１が肯定判断されるとステップＳ１７３へ進み、撮像素子１１１ａで撮像した携帯端末装置２００の前方の被写体の画像に基づいて、読唇処理を開始するよう読唇処理部１２０を制御してステップＳ１７５へ進む。ステップＳ１７５において、読唇処理部１２０での読唇処理によって認識した言葉を文字に変換して、ハーフミラー層２０３ａに表示するよう各部を制御してステップＳ１７７へ進む。 In step S171, it is determined whether a permission signal granting the request to start the lip reading process has been received within the predetermined waiting time. If an affirmative determination is made in step S171, the process proceeds to step S173, where the lip reading processing unit 120 is controlled to start the lip reading process based on the image of the subject in front of the mobile terminal device 200 captured by the image sensor 111a, and the process proceeds to step S175. In step S175, each unit is controlled so that the words recognized by the lip reading process in the lip reading processing unit 120 are converted into text and displayed on the half mirror layer 203a, and the process proceeds to step S177.

ステップＳ１７７において、撮像素子１１１ａで撮像した被写体像の画像に基づいて、読唇処理の対象となる人物を引き続き認識できているか否かを判断する。ステップＳ１７７が肯定判断されるとステップＳ１７９へ進み、読唇処理を終了するように操作部１０３への操作入力がなされたか否かを判断する。ステップＳ１７９が否定判断されるとステップＳ１７３へ戻る。 In step S177, based on the image of the subject image picked up by the image sensor 111a, it is determined whether or not the person who is the target of the lip reading process can be continuously recognized. If an affirmative determination is made in step S177, the process proceeds to step S179, and it is determined whether or not an operation input to the operation unit 103 has been made so as to end the lip reading process. If a negative determination is made in step S179, the process returns to step S173.

ステップＳ１７９が肯定判断されると、本プログラムを終了する。 If an affirmative determination is made in step S179, the program ends.
ステップＳ１７７が否定判断されるとステップＳ１８１へ進み、撮像素子１１１ａで撮像した被写体像の画像に基づいて、読唇処理の対象となる人物の口元を認識できない状態が所定時間を超えて継続したか否かを判断する。 If a negative determination is made in step S177, the process proceeds to step S181, where it is determined, based on the subject image captured by the image sensor 111a, whether the state in which the mouth of the person subject to the lip reading process cannot be recognized has continued beyond a predetermined time.

ステップＳ１８１が否定判断されると、ステップＳ１８３へ進み、撮像素子１１１ａで撮像した被写体像の画像に基づいて、読唇処理の対象となる人物の口元を再認識できたか否かを判断する。ステップＳ１８３が肯定判断されるとステップＳ１７９へ進む。 If a negative determination is made in step S181, the process proceeds to step S183, where it is determined, based on the subject image captured by the image sensor 111a, whether the mouth of the person subject to the lip reading process has been re-recognized. If an affirmative determination is made in step S183, the process proceeds to step S179.
ステップＳ１８３が否定判断されるとステップＳ１８１へ戻る。 If a negative determination is made in step S183, the process returns to step S181.
ステップＳ１８１が肯定判断されると本プログラムを終了する。 If an affirmative determination is made in step S181, the program ends.

ステップＳ１７１が否定判断されると、すなわち、許可信号を受信しないまま所定の待機時間が経過したか、または、所定の待機時間内に不許可信号を受信すると、本プログラムを終了する。 If a negative determination is made in step S171, that is, if the predetermined waiting time elapses without the permission signal being received, or if the non-permission signal is received within the predetermined waiting time, the program ends.
ステップＳ１６７において、操作部１０３の操作によって読唇処理の開始の中止が指示されたと判断されると、本プログラムを終了する。 If it is determined in step S167 that cancellation of the start of the lip reading process has been instructed by operating the operation unit 103, the program ends.
ステップＳ１６３で所定時間が経過していない場合はステップＳ１６１へ戻る。 If the predetermined time has not yet elapsed in step S163, the process returns to step S161.

ステップＳ１６３で所定時間が経過する前に、携帯端末装置２００を注視していた人物が携帯端末装置２００を注視しなくなったか、または、当該人物の顔をユーザが注視しなくなった場合にはステップＳ１０１へ戻る。 If, before the predetermined time elapses in step S163, the person who was gazing at the mobile terminal device 200 stops gazing at it, or the user stops gazing at that person's face, the process returns to step S101.
ステップＳ１６１が否定判断されると、ステップＳ１０１へ戻る。 If a negative determination is made in step S161, the process returns to step S101.
ステップＳ１５５が否定判断されると、ステップＳ１０１へ戻る。 If a negative determination is made in step S155, the process returns to step S101.
ステップＳ１０３が否定判断されると本プログラムを終了する。 If a negative determination is made in step S103, the program ends.
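
For reference, the branching of FIG. 6 as described in the text can be collected into a transition table. This is a reading aid, not the program of the embodiment: the step labels come from the description, while the outcome labels ("held", "waiting", "broken", "start", "cancel"), the `END` marker, and the assumption that step S101 leads directly to step S103 are introduced here for illustration.

```python
END = "END"

# Steps that always proceed to a single successor.
SEQUENTIAL = {
    "S101": "S103",  # assumed: S101 leads into the noise decision S103
    "S151": "S153", "S153": "S155", "S157": "S159", "S159": "S161",
    "S165": "S167", "S169": "S171", "S173": "S175", "S175": "S177",
}

# Decision steps: (step, outcome) -> next step.
BRANCHES = {
    ("S103", True): "S151", ("S103", False): END,
    ("S155", True): "S157", ("S155", False): "S101",
    ("S161", True): "S163", ("S161", False): "S101",
    ("S163", "held"): "S165",     # mutual gaze lasted the full time
    ("S163", "waiting"): "S161",  # time not yet elapsed
    ("S163", "broken"): "S101",   # gaze lost before the time elapsed
    ("S167", "start"): "S169", ("S167", "cancel"): END,
    ("S171", True): "S173", ("S171", False): END,
    ("S177", True): "S179", ("S177", False): "S181",
    ("S179", True): END, ("S179", False): "S173",
    ("S181", True): END, ("S181", False): "S183",
    ("S183", True): "S179", ("S183", False): "S181",
}


def next_step(step, outcome=None):
    """Return the step that follows `step` given the decision outcome."""
    if step in SEQUENTIAL:
        return SEQUENTIAL[step]
    return BRANCHES[(step, outcome)]
```

Walking the table with affirmative outcomes from S103 onward reaches the reading loop S173, S175, S177, S179, which repeats until the user ends it or the mouth stays unrecognized for too long.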

図７は、他の携帯端末装置２００からの読唇処理開始の許可を申請する申請信号を受信した際に、許可または拒否する処理の動作についてのフローチャートである。本実施の形態の携帯端末装置２００では、不図示の電源スイッチがオンされると図７に示す処理を行うプログラムが起動されて、制御部１０１で実行される。ステップＳ４０１において、他の携帯端末装置２００からの申請信号を受信するまで待機する。 FIG. 7 is a flowchart of the operation of the process of permitting or rejecting when receiving an application signal for applying for permission to start the lip reading process from another mobile terminal device 200. In portable terminal device 200 of the present embodiment, when a power switch (not shown) is turned on, a program for performing the processing shown in FIG. 7 is started and executed by control unit 101. In step S401, the process waits until an application signal from another mobile terminal device 200 is received.

ステップＳ４０１において、他の携帯端末装置２００からの申請信号を受信するとステップＳ４０３へ進み、他の携帯端末装置２００のユーザからの読唇処理開始の申請を許可するか否かの選択画面をハーフミラー層２０３ａに表示させてステップＳ４０５へ進む。ステップＳ４０５において、操作部１０３への操作入力があるまで待機する。ステップＳ４０５で、操作部１０３の操作によって読唇処理の開始の申請が許可されたと判断されると、ステップＳ４０７へ進み、読唇処理開始の申請を許可する許可信号を送信するように赤外線通信部１３０を制御して本プログラムを終了する。 When a request signal from another mobile terminal device 200 is received in step S401, the process proceeds to step S403, where a selection screen for choosing whether to permit the request from the user of the other mobile terminal device 200 to start the lip reading process is displayed on the half mirror layer 203a, and the process proceeds to step S405. In step S405, the process waits until there is an operation input to the operation unit 103. If it is determined in step S405 that the request to start the lip reading process has been permitted by operating the operation unit 103, the process proceeds to step S407, where the infrared communication unit 130 is controlled to transmit a permission signal granting the request, and the program ends.

ステップＳ４０５で、操作部１０３の操作によって読唇処理の開始の申請が拒否されたと判断されると、ステップＳ４０９へ進み、読唇処理開始の申請を許可しない不許可信号を送信するように赤外線通信部１３０を制御して本プログラムを終了する。 If it is determined in step S405 that the request to start the lip reading process has been rejected by operating the operation unit 103, the process proceeds to step S409, where the infrared communication unit 130 is controlled to transmit a non-permission signal refusing the request, and the program ends.

第２の実施の形態の携帯端末装置２００では、第１の実施の形態の作用効果に加えて、次の作用効果を奏する。 The mobile terminal device 200 according to the second embodiment provides the following operational effects in addition to those of the first embodiment.
（１）騒音が大きい環境下で、携帯端末装置２００をそれぞれ装着した２人のユーザ同士の視線が合った状態が数秒間続くと、読唇処理開始の許可を申請する申請信号を出力するように構成した。そして、相手側からの許可信号を受信すると、読唇処理を開始するように構成した。これにより、お互いに視認し得る位置で向かい合っているが周囲の騒音の影響によって相手の発する音声を明瞭に聞き取れないような場合に、容易に読唇処理を開始できるので、利便性が高い。また、相手側の携帯端末装置２００からの許可信号の受信をもって読唇処理を開始するように構成しているので、他人のプライバシーを保護できる。 (1) In a noisy environment, when two users each wearing a mobile terminal device 200 keep their eyes on each other for several seconds, a request signal asking for permission to start the lip reading process is output. The lip reading process is then started when the permission signal from the other party is received. As a result, when two people face each other within sight but cannot clearly hear each other's voice because of ambient noise, the lip reading process can be started easily, which is highly convenient. In addition, because the lip reading process starts only upon receipt of the permission signal from the other party's mobile terminal device 200, the privacy of others can be protected.

（２）読唇処理開始の許可を申請する申請信号を受信すると、読唇処理開始の申請を許可するか否かを選択するための選択画面を眼鏡レンズ２０３のハーフミラー層２０３ａに表示するように構成した。これにより、読唇されることに対してユーザの意志を反映できるので、ユーザのプライバシーを保護できる。 (2) Upon receipt of an application signal for applying for permission to start lip reading processing, a selection screen for selecting whether to permit application for starting lip reading processing is displayed on the half mirror layer 203a of the eyeglass lens 203. did. Thereby, since a user's will can be reflected with respect to being lip read, a user's privacy can be protected.

(3) By performing line-of-sight detection on the captured images, the device displays the lip reading process start inquiry screen on the half mirror layer 203a of the spectacle lens 203 when a state in which a person is gazing at the mobile terminal device 200A and user A is gazing at that person's face continues for a predetermined time. The intention of a user who wants to start the lip reading process can thus be detected with a simple device configuration, which suppresses any increase in cost.
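
The flow behind effects (1) to (3) can be sketched as a small state machine: mutual gaze must persist for a dwell time, an application signal is then sent, and lip reading starts only if a permission signal arrives within a waiting time. This is a minimal illustration, not the implementation described in the embodiment. The class name, method names, and the default values of `dwell_s` and `wait_s` are assumptions; the source says only "several seconds" and "a predetermined waiting time". The noise check and the inquiry screen shown to the user are omitted.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()         # no mutual gaze in progress
    GAZING = auto()       # mutual gaze detected, dwell timer running
    WAITING = auto()      # application signal sent, awaiting a reply
    LIP_READING = auto()  # permission received, lip reading active


@dataclass
class LipReadingHandshake:
    dwell_s: float = 3.0  # assumed value for "several seconds" of mutual gaze
    wait_s: float = 10.0  # assumed value for the waiting time after applying
    state: State = State.IDLE
    _since: float = 0.0   # time at which the current state was entered

    def on_gaze(self, mutual: bool, now: float) -> bool:
        """Feed one gaze sample; return True when an application signal should be sent."""
        if self.state in (State.WAITING, State.LIP_READING):
            return False
        if not mutual:
            # Gaze was broken, so the dwell timer starts over.
            self.state = State.IDLE
            return False
        if self.state is State.IDLE:
            self.state, self._since = State.GAZING, now
            return False
        if now - self._since >= self.dwell_s:
            self.state, self._since = State.WAITING, now
            return True
        return False

    def on_reply(self, permitted: bool, now: float) -> None:
        """Handle a permission or non-permission signal from the other device."""
        if self.state is not State.WAITING:
            return
        timed_out = now - self._since > self.wait_s
        if permitted and not timed_out:
            self.state = State.LIP_READING
        else:
            self.state = State.IDLE
```

Keeping the permission step as a separate state makes the privacy property explicit: there is no transition into `LIP_READING` that bypasses a permission signal from the other party.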

--- Modifications ---
(1) In the first embodiment described above, the words recognized by the lip reading processing unit 120 are generated and output as voice data, but the present invention is not limited to this. For example, the words recognized by the lip reading processing unit 120 may be generated and output as text data instead of, or together with, the voice data; this provides the same operational effects as described above. In this case, it is desirable that the call partner's terminal device be configured so that the text data from the mobile terminal device 100 is displayed on its display unit during the call. Alternatively, it is desirable that the call partner's terminal device be configured to read the text data from the mobile terminal device 100 aloud, so that its user is notified by voice.

Alternatively, the text data from the mobile terminal device 100 may be converted into voice data on the mobile phone communication network side, and the converted voice data may be transmitted to the call partner's terminal device.

(2) In the first embodiment described above, a call in the lip reading call mode can be selected when the level of ambient noise collected by the microphone 108 at the time of an incoming call exceeds a predetermined threshold, but the present invention is not limited to this. For example, a call in the lip reading call mode may be made selectable when a call arrives while the mobile terminal device 100 is set to the well-known manner mode, in which ringtone output is prohibited. That is, when a call arrives while the manner mode is set, the same incoming call operation as in the high-noise case may be performed so that the lip reading call mode can be selected. With this configuration, a call can be conducted without speaking aloud even in public places where making loud sounds or talking is frowned upon, which is also preferable from the standpoint of manners.

Also, for example, when a call arrives while it is detected that an earphone (not shown) is plugged into the earphone jack 151 of the mobile terminal device 100 shown in FIG. 8, the same incoming call operation as in the high-noise case may be performed so that the lip reading call mode can be selected. The control unit 101 determines whether an earphone (not shown) is plugged into the earphone jack 151 based on the output of the earphone insertion detection terminal 151a provided in the earphone jack 151.

In a very noisy environment, speech is likely to be hard to hear without earphones. With the configuration described above, therefore, the inquiry asking whether to start a call in the lip reading call mode can be displayed appropriately in situations where such a call is desirable. This improves the convenience of the mobile terminal device 100.

Also, for example, whether the user is traveling by public transportation, such as moving along a railway line, may be determined based on current position information from the GPS receiver 152 of the mobile terminal device 100 shown in FIG. 9. When a call arrives while it is determined that the user is traveling by public transportation, the same incoming call operation as in the high-noise case may be performed so that the lip reading call mode can be selected. With this configuration, even if a call arrives while the user is on public transportation, the user can tell the caller by voice, without speaking aloud, that they are on public transportation and will call back later. Because the user can respond appropriately to the situation at the time of the call rather than with a uniform reply such as a preset message, convenience is high.
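
The trigger conditions collected in modification (2) can be summarized as a single predicate evaluated when a call arrives. This is a rough sketch under stated assumptions: `DeviceStatus`, its field names, and the value of `NOISE_THRESHOLD_DB` are hypothetical, and the public-transport flag is assumed to be derived elsewhere from GPS position history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceStatus:
    noise_level_db: float      # ambient noise level measured via the microphone
    manner_mode: bool          # True when ringtone output is prohibited
    earphone_inserted: bool    # output of the earphone insertion detection terminal
    on_public_transport: bool  # assumed to be inferred elsewhere from GPS data


# Hypothetical value; the source only speaks of "a predetermined threshold".
NOISE_THRESHOLD_DB = 80.0


def should_offer_lip_reading_mode(status: DeviceStatus) -> bool:
    """Return True if the lip reading call mode selection screen should be shown."""
    return (
        # The claims use "equal to or greater than" for the noise comparison.
        status.noise_level_db >= NOISE_THRESHOLD_DB
        or status.manner_mode
        or status.earphone_inserted
        or status.on_public_transport
    )
```

Any one condition is sufficient here, which mirrors how the text presents each condition as an alternative trigger for the same incoming call operation.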

(3) In the first embodiment described above, the lip reading processing unit 120 is implemented by a CPU (not shown) of the control unit 101 executing a program stored in the storage unit 102, but the present invention is not limited to this. For example, a circuit that performs the lip reading process described above may be provided.

(4) In the second embodiment described above, the user's operation input is accepted by the operation unit 103, but the present invention is not limited to this. For example, the control unit 101 may determine the content of the operation input from a voice command of the user collected by the microphone 108, and control each unit according to the determination result.

(5) In the second embodiment described above, when a permission signal from another mobile terminal device 200 is received within a predetermined waiting time after the application signal is transmitted, the control unit 101 controls each unit so as to start the lip reading process, but the present invention is not limited to this. For example, when a permission signal from another mobile terminal device 200 is received within the predetermined waiting time after the application signal is transmitted, the control unit 101 may control each unit so that an indication that the person in front has permitted the start of the lip reading process is displayed on the half mirror layer 203a of the spectacle lens 203.

(6) In the second embodiment described above, the words recognized by the lip reading process are converted into characters and displayed on the half mirror layer 203a of the spectacle lens 203, but the present invention is not limited to this. For example, the words recognized by the lip reading process may be converted into voice data and output as sound from the earphone 204. That is, the control unit 101 may convert the words recognized by the lip reading process in the lip reading processing unit 120 into speech to generate an audio signal, and control each unit so that the generated audio signal is output to the earphone 204.

(7) In the mobile terminal device 200 of the second embodiment described above, to improve recognition accuracy when the lip reading process is performed on the user, learning may be performed during, for example, a voice call, based on the images captured by the imaging element 111b and the audio signal from the microphone 108. That is, the control unit 101 may learn from the shape and movement of the user's lips, detected by image analysis of the images captured by the imaging element 111b, together with the result of speech recognition on the audio signal from the microphone 108.
(8) The embodiments and modifications described above may be combined with one another.
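
As one way to picture the learning described in modification (7), the sketch below pairs lip-shape feature vectors with the words returned by speech recognition during a voice call, then classifies new lip features by the nearest class centroid. The source does not specify a learning method, so the nearest-centroid classifier is a stand-in chosen for brevity. `LipWordLearner`, `observe`, and `predict` are hypothetical names, and both feature extraction and speech recognition are assumed to happen elsewhere.

```python
from typing import Dict, List, Sequence


class LipWordLearner:
    """Accumulates lip-shape feature vectors labelled by speech recognition output."""

    def __init__(self) -> None:
        self._sums: Dict[str, List[float]] = {}  # per-word sum of feature vectors
        self._counts: Dict[str, int] = {}        # per-word number of samples

    def observe(self, lip_features: Sequence[float], recognized_word: str) -> None:
        """Record one (lip features, recognized word) pair captured during a call."""
        sums = self._sums.setdefault(recognized_word, [0.0] * len(lip_features))
        for i, value in enumerate(lip_features):
            sums[i] += value
        self._counts[recognized_word] = self._counts.get(recognized_word, 0) + 1

    def predict(self, lip_features: Sequence[float]) -> str:
        """Return the word whose mean feature vector is closest (squared distance).

        Raises ValueError if no samples have been observed yet.
        """
        def distance(word: str) -> float:
            n = self._counts[word]
            return sum((s / n - x) ** 2
                       for s, x in zip(self._sums[word], lip_features))

        return min(self._sums, key=distance)


learner = LipWordLearner()
learner.observe([1.0, 0.0], "hello")
learner.observe([0.8, 0.2], "hello")
learner.observe([0.0, 1.0], "bye")
print(learner.predict([0.9, 0.1]))  # prints "hello"
```

The point of the sketch is the data flow: audio-based recognition supplies the labels, so the user never has to annotate lip images by hand.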

The present invention is in no way limited to the configurations of the embodiments described above, as long as its characteristic functions are not impaired.

100, 200 mobile terminal device; 101 control unit; 104 transmission/reception unit; 106 display unit; 108 microphone; 110 imaging unit; 111, 111a, 111b imaging element; 120 lip reading processing unit; 130 infrared communication unit; 131 infrared light emitting unit; 132 infrared light receiving unit; 140 line-of-sight detection unit; 151 earphone jack; 151a earphone insertion detection terminal; 203 spectacle lens; 203a half mirror layer; 210 first imaging unit; 220 second imaging unit; 230 projection unit

Claims

A communication module for sending and receiving information to and from the outside;
A microphone,
A display unit for displaying various types of information;
An imaging device;
A control unit that, when an incoming call is received while the volume of sound input to the microphone is equal to or greater than a predetermined threshold value, displays on the display unit a screen for selecting a lip reading call mode, and switches the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of a speaker's lips from an image obtained by imaging with the imaging device and converts it into at least one of voice data and text data of words;
A portable terminal device in which, when the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and text data converted by the lip reading processing unit to the outside.

A communication module for sending and receiving information to and from the outside;
A microphone,
An earphone jack to connect the earphone,
An earphone insertion detection terminal for detecting that an earphone is connected to the earphone jack;
A display unit for displaying various types of information;
An imaging device;
A control unit that, when an incoming call is received while the earphone insertion detection terminal detects that an earphone is connected to the earphone jack, displays on the display unit a screen for selecting a lip reading call mode, and switches the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of a speaker's lips from an image obtained by imaging with the imaging device and converts it into at least one of voice data and text data of words;
A portable terminal device in which, when the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and text data converted by the lip reading processing unit to the outside.

A communication module for sending and receiving information to and from the outside;
A microphone,
A display unit for displaying various types of information;
An imaging device;
A control unit that, when an incoming call is received while ringtone output is set to be prohibited, displays on the display unit a screen for selecting a lip reading call mode, and switches the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing unit that, when the call mode is switched to the lip reading call mode, detects the shape of a speaker's lips from an image obtained by imaging with the imaging device and converts it into at least one of voice data and text data of words;
A portable terminal device in which, when the call mode is switched to the lip reading call mode, the communication module transmits at least one of the voice data and text data converted by the lip reading processing unit to the outside.

A communication module for sending and receiving information to and from the outside;
A microphone,
A display unit for displaying various types of information;
A first imaging device that images at least a user's eyes;
A second imaging device different from the first imaging device;
A first line-of-sight detection unit that detects the line of sight of the user based on a first image obtained by imaging with the first imaging device;
A second line-of-sight detection unit that detects the line of sight of a person in the second image based on a second image obtained by imaging with the second imaging device;
A line-of-sight determination unit that determines, based on the detection result of the first line-of-sight detection unit and the detection result of the second line-of-sight detection unit, whether the lines of sight of the user and the person in the second image meet;
A lip reading processing unit that performs a lip reading process of detecting the shape of the lips of the person in the second image from the second image and converting it into at least one of voice data and text data of words;
An application unit that applies for permission to start the lip reading process by the lip reading processing unit when the line-of-sight determination unit determines that the lines of sight of the user and the person in the second image meet;
A permission determining unit that determines whether permission to start the lip reading process is obtained;
A portable terminal device comprising: a control unit that causes the lip reading processing unit to start the lip reading process when the permission determination unit determines that the permission has been obtained.

The portable terminal device according to claim 4, wherein
The application unit has an application signal output unit that outputs, to the outside of the portable terminal device, an application signal applying for permission to start the lip reading process,
The portable terminal device further comprises a permission signal receiving unit that receives, from the outside of the portable terminal device, a permission signal permitting the application to start the lip reading process, and
The permission determination unit determines that permission to start the lip reading process has been obtained when the permission signal receiving unit receives the permission signal.

A lip reading processing program for causing a computer to execute:
A display procedure of displaying on a display unit a screen for selecting a lip reading call mode when an incoming call is received while the volume of sound input to a microphone is equal to or greater than a predetermined threshold;
A call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of a speaker's lips from an image obtained by imaging with an imaging device and converting it into at least one of voice data and text data of words; and
A transmission procedure of transmitting at least one of the converted voice data and text data to the outside when the call mode is switched to the lip reading call mode.

A lip reading processing program for causing a computer to execute:
A display procedure of displaying on a display unit a screen for selecting a lip reading call mode when an incoming call is received while it is detected that an earphone is connected to an earphone jack;
A call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of a speaker's lips from an image obtained by imaging with an imaging device and converting it into at least one of voice data and text data of words; and
A transmission procedure of transmitting at least one of the converted voice data and text data to the outside when the call mode is switched to the lip reading call mode.

A lip reading processing program for causing a computer to execute:
A display procedure of displaying on a display unit a screen for selecting a lip reading call mode when an incoming call is received while ringtone output is set to be prohibited;
A call mode switching procedure of switching the call mode to the lip reading call mode when the lip reading call mode is selected;
A lip reading processing procedure of, when the call mode is switched to the lip reading call mode, detecting the shape of a speaker's lips from an image obtained by imaging with an imaging device and converting it into at least one of voice data and text data of words; and
A transmission procedure of transmitting at least one of the converted voice data and text data to the outside when the call mode is switched to the lip reading call mode.

A lip reading processing program for causing a computer to execute:
A first imaging procedure of imaging with a first imaging device that images at least a user's eyes;
A second imaging procedure of imaging with a second imaging device different from the first imaging device;
A first line-of-sight detection procedure of detecting the line of sight of the user based on a first image obtained by imaging with the first imaging device;
A second line-of-sight detection procedure of detecting the line of sight of a person in a second image based on the second image obtained by imaging with the second imaging device;
A line-of-sight determination procedure of determining, based on the detection result of the first line-of-sight detection procedure and the detection result of the second line-of-sight detection procedure, whether the lines of sight of the user and the person in the second image meet;
An application procedure of applying for permission to start a lip reading process when it is determined in the line-of-sight determination procedure that the lines of sight of the user and the person in the second image meet;
A permission determination procedure of determining whether permission to start the lip reading process has been obtained; and
A lip reading processing procedure of, when it is determined in the permission determination procedure that the permission has been obtained, performing the lip reading process of detecting the shape of the lips of the person in the second image from the second image and converting it into at least one of voice data and text data of words.