JP2015115926A

JP2015115926A - Portable terminal device, lip-reading communication method, and program

Info

Publication number: JP2015115926A
Application number: JP2013259166A
Authority: JP
Inventors: 弘文仲地; Hirofumi Nakachi
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2013-12-16
Filing date: 2013-12-16
Publication date: 2015-06-22

Abstract

PROBLEM TO BE SOLVED: To provide a technique suitable for conveying the content of speech extracted from lip movement to a communication partner. SOLUTION: A portable terminal device includes: an image acquisition unit for acquiring images of lip movement; a character extraction unit for extracting characters corresponding to the lip movement on the basis of the images acquired by the image acquisition unit; a voice generation unit for generating voice using the characters extracted by the character extraction unit; and a communication unit for transmitting the voice generated by the voice generation unit to a first communication destination.

Description

The present invention relates to a portable terminal device, a lip-reading communication method, and a program.

In recent years, research has progressed on techniques for reading the content of an utterance from the movement of the mouth.

Patent Document 1 discloses a word-spotting lip-reading device that "comprises: an imaging means that acquires a face image including at least the lip region; a region extraction means that extracts the lip region from the acquired image; a feature measurement means that measures shape features of the extracted lip region; a keyword DB that registers the features of keyword utterance scenes measured in a registration mode; a judgment means that, in a recognition mode, performs recognition processing that recognizes the utterance content of the lips by comparing the registered keyword features with features measured from a sentence utterance scene, thereby performing word-spotting lip reading that recognizes keywords within the sentence; and a display means that displays the recognition result produced by the judgment means."

Patent Document 1: JP 2012-59017 A

When making a telephone call, the speaker may find it difficult to speak, depending on the situation. Examples include cases where the speaker's physical condition makes speaking difficult, and cases where the speaker is somewhere that speaking aloud would disturb others (for example, on a train). In addition, when a conversation takes place somewhere very noisy, the speaker's voice may be drowned out by the noise and fail to reach the other party.

Such situations are especially likely when calling from a mobile phone, because the user's surroundings keep changing.

The technique disclosed in Patent Document 1 does not contemplate conveying utterance content extracted from mouth movements to a communication partner.

The present invention was made in view of the above, and its object is to provide a technique suitable for conveying utterance content extracted from mouth movements to a communication partner.

The present application includes several means for solving the above problem; examples are as follows.

To solve the above problem, a portable terminal device according to the present invention comprises: an image acquisition unit that acquires images of mouth movement; a character extraction unit that extracts characters corresponding to the mouth movement on the basis of the images acquired by the image acquisition unit; a voice generation unit that generates voice using the characters extracted by the character extraction unit; and a communication unit that transmits the voice generated by the voice generation unit to a first communication destination.
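The four claimed units form a simple pipeline. The sketch below shows that flow in Python under stated assumptions: the class `LipReadingTerminal` and its three callable fields are hypothetical stand-ins, and real character extraction, speech synthesis, and call transport would be supplied by separate components not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = bytes  # hypothetical: one camera frame of the speaker's mouth


@dataclass
class LipReadingTerminal:
    """Hypothetical sketch of the claimed units wired as pluggable callables."""
    extract_characters: Callable[[Sequence[Frame]], str]  # character extraction unit
    synthesize_voice: Callable[[str], bytes]              # voice generation unit
    send_voice: Callable[[str, bytes], None]              # communication unit

    def handle_utterance(self, frames: Sequence[Frame], destination: str) -> str:
        # frames: images already acquired by the image acquisition unit
        text = self.extract_characters(frames)
        if text:  # only synthesize and transmit when something was read
            audio = self.synthesize_voice(text)
            self.send_voice(destination, audio)
        return text
```

Keeping each unit as an injected function makes it easy to swap in a different lip-reading model or synthesizer without touching the call flow.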

The portable terminal device may further comprise a display unit that displays the images acquired by the image acquisition unit and the characters extracted by the character extraction unit.

The portable terminal device may further comprise an input unit that accepts corrections to the characters displayed by the display unit, in which case the voice generation unit generates the voice using the corrected characters accepted by the input unit.

The portable terminal device may further comprise a storage unit that stores communication destination information associating a message with a second communication destination. When the characters extracted by the character extraction unit contain the message, the communication unit places a call to the second communication destination associated with that message in the communication destination information.

The portable terminal device may further comprise an SOS call instruction reception unit that accepts an instruction for an SOS call, that is, a call based on the communication destination information. If, during communication with the first communication destination, the SOS call instruction reception unit accepts the SOS call instruction and the character extraction unit also extracts characters containing the message, the communication unit disconnects from the first communication destination and places a call to the second communication destination.
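The two-condition SOS behaviour (instruction received, then a registered message recognized) can be sketched as a small state machine. This is an illustrative Python sketch only: `SosCallController`, its `lookup` parameter, and the event strings are hypothetical names, and `lookup` stands in for matching against the stored communication destination information.

```python
from typing import Callable, List, Optional


class SosCallController:
    """Hypothetical sketch: arm on an SOS instruction, switch calls on a message match."""

    def __init__(self, lookup: Callable[[str], Optional[str]]):
        # lookup: hypothetical function mapping lip-read text to a second
        # communication destination, or None when no registered message matches
        self.lookup = lookup
        self.sos_armed = False
        self.current_call: Optional[str] = None
        self.events: List[str] = []

    def connect(self, destination: str) -> None:
        self.current_call = destination
        self.events.append(f"connect {destination}")

    def accept_sos_instruction(self) -> None:
        # corresponds to the SOS call instruction reception unit
        self.sos_armed = True

    def on_characters_extracted(self, text: str) -> Optional[str]:
        # both conditions must hold: SOS instruction received AND message found
        if not self.sos_armed:
            return None
        destination = self.lookup(text)
        if destination is None:
            return None
        if self.current_call is not None:
            # drop the first communication destination before the SOS call
            self.events.append(f"disconnect {self.current_call}")
        self.connect(destination)
        self.sos_armed = False
        return destination
```

Requiring the explicit instruction first means an ordinary conversation that happens to contain the message does not trigger an emergency call.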

The communication unit may also place the call to the second communication destination when, further, the image acquisition unit has acquired images of the mouth in a predetermined shape for a predetermined length of time.
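One way to test the "predetermined shape for a predetermined time" condition is to track how long consecutive frames keep the same classified mouth shape. The Python sketch below assumes, hypothetically, that each frame has already been labelled with a shape name and a timestamp in seconds; the function name `shape_held` and the labels are illustrative, not taken from the source.

```python
from typing import Iterable, Optional, Tuple


def shape_held(samples: Iterable[Tuple[float, str]], target: str, hold_seconds: float) -> bool:
    """Return True once `target` has been held continuously for `hold_seconds`.

    samples: hypothetical (timestamp_seconds, shape_label) pairs in time order.
    """
    start: Optional[float] = None
    for timestamp, shape in samples:
        if shape == target:
            if start is None:
                start = timestamp  # a new run of the target shape begins
            if timestamp - start >= hold_seconds:
                return True
        else:
            start = None  # any other shape breaks the run
    return False
```

Resetting the run on any other shape keeps brief, accidental mouth positions from triggering the call.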

The display unit may further display a home screen for accepting a selection between a continuous call mode and a single call mode. When the continuous call mode is selected, the communication unit transmits to the first communication destination the voice generated from images acquired by the image acquisition unit after communication with that destination has been established. When the single call mode is selected, the communication unit transmits the voice to the first communication destination with which communication is established after the image acquisition unit has acquired the images.
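The difference between the two modes is essentially the ordering of connection and lip reading. A minimal Python sketch, assuming hypothetical callables `capture_utterances`, `connect`, and `send` in place of the real image acquisition, character extraction, voice generation, and call handling:

```python
from enum import Enum
from typing import Callable, Iterable


class CallMode(Enum):
    CONTINUOUS = "continuous"
    SINGLE = "single"


def run_call(
    mode: CallMode,
    destination: str,
    capture_utterances: Callable[[], Iterable[bytes]],  # hypothetical: yields generated voice
    connect: Callable[[str], None],
    send: Callable[[str, bytes], None],
) -> None:
    if mode is CallMode.CONTINUOUS:
        # connect first, then lip-read and send each utterance as it is produced
        connect(destination)
        for audio in capture_utterances():
            send(destination, audio)
    else:
        # single call: lip-read the whole message before dialling, then send it once
        message = b"".join(capture_utterances())
        connect(destination)
        send(destination, message)
```

In the single call mode the message is fully prepared before the other party answers, which suits one-way notifications where no reply is expected.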

The portable terminal device may further comprise a voice acquisition unit that acquires ambient sound around the portable terminal device, in which case the communication unit transmits the ambient sound acquired by the voice acquisition unit to the first communication destination in addition to the voice generated by the voice generation unit.

The communication unit may transmit the ambient sound when the ambient sound acquired by the voice acquisition unit is at or above a predetermined volume.

A lip-reading communication method according to the present invention comprises: an image acquisition step of acquiring images of mouth movement; a character extraction step of extracting characters corresponding to the mouth movement on the basis of the images acquired in the image acquisition step; a voice generation step of generating voice using the characters extracted in the character extraction step; and a communication step of transmitting the voice generated in the voice generation step to a first communication destination.

A program according to the present invention causes a computer to function as a portable terminal device by causing the computer to execute: an image acquisition step of acquiring images of mouth movement; a character extraction step of extracting characters corresponding to the mouth movement on the basis of the images acquired in the image acquisition step; a voice generation step of generating voice using the characters extracted in the character extraction step; and a communication step of transmitting the voice generated in the voice generation step to a first communication destination.

According to the present invention, a technique suitable for conveying utterance content extracted from mouth movements to a communication partner can be provided.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a functional block diagram of the portable terminal device.
FIG. 2 shows an example of SOS setting information.
FIG. 3 shows an example hardware configuration of a computer that implements the functions of the portable terminal device.
FIG. 4 is a flowchart of the continuous call process.
FIG. 5 is a flowchart of the single call process (part 1).
FIG. 6 is a flowchart of the single call process (part 2).
FIG. 7 is a flowchart of the SOS communication process in the continuous call mode.
FIG. 8 is a flowchart of the SOS communication process in the single call mode.
FIG. 9 shows an example of the home screen.
FIG. 10 shows an example of the outgoing call screen.
FIG. 11 shows an example of lip-reading screen A.
FIG. 12 shows an example of lip-reading screen B.
FIG. 13 shows an example of the lip-reading communication settings screen.
FIG. 14 shows an example of the SOS settings screen.
FIG. 15 shows an example of the lip-reading history display screen.

Examples of embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a functional block diagram of the portable terminal device 10. The portable terminal device 10 is a portable information processing device such as a PC, a PDA (Personal Digital Assistant), or a smartphone. It has a so-called lip-reading function that reads utterance content from the movement of the lips.

The portable terminal device 10 includes a control unit 110 and a storage unit 120. The control unit 110 controls communication based on the lip-reading function of the portable terminal device 10. The storage unit 120 stores the information needed for communication based on the lip-reading function.

The control unit 110 includes an image acquisition unit 111, a character extraction unit 112, a voice generation unit 113, a display unit 114, a communication unit 115, an input unit 116, an SOS call instruction reception unit 117, and a voice acquisition unit 118.

The image acquisition unit 111 acquires images using a camera unit described later. The images show the speaker's mouth as the speaker moves the lips to speak. They may be captured while the speaker is actually speaking aloud, or while the speaker only moves the mouth without making a sound. Hereinafter, "utterance" refers to moving the mouth to speak, whether or not any sound is produced.

The character extraction unit 112 extracts characters from images of the speaker's utterance. A known method is used to extract characters from lip movement, for example assigning coordinates to predetermined points on the lips and determining characters from the amount of their movement. One applicable technique is described in Takeshi Saito and Ryosuke Konishi, "Lip reading using trajectory features based on lip and intraoral region shapes," Proceedings of the Forum on Information Technology (FIT), General Lectures, 6(3), pp. 39-40, 2007-08-22. If needed, a basic model of the speaker's lip movement, stored in advance in an area (not shown) of the storage unit 120, may also be used.
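To make "coordinates on the lips and the amount of movement" concrete, the Python sketch below classifies an utterance by comparing its landmark displacement trajectory against stored templates using nearest-template matching. This is a deliberately simple stand-in, not the Saito and Konishi method cited above; the landmark layout, the functions `displacement_trajectory`, `resample`, and `classify`, and the template dictionary (playing the role of the optional per-speaker model) are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) lip landmark


def displacement_trajectory(frames: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Per frame, flatten each landmark's (dx, dy) relative to the first frame."""
    base = frames[0]
    trajectory = []
    for frame in frames:
        vector: List[float] = []
        for (x, y), (bx, by) in zip(frame, base):
            vector.extend((x - bx, y - by))
        trajectory.append(vector)
    return trajectory


def resample(trajectory: List[List[float]], n: int) -> List[List[float]]:
    """Nearest-index resampling to n frames (n >= 2) so lengths can be compared."""
    if len(trajectory) == 1:
        return [trajectory[0]] * n
    last = len(trajectory) - 1
    return [trajectory[round(i * last / (n - 1))] for i in range(n)]


def distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Mean Euclidean distance between two equal-length trajectories."""
    return sum(math.dist(u, v) for u, v in zip(a, b)) / len(a)


def classify(frames: Sequence[Sequence[Point]],
             templates: Dict[str, List[List[float]]], n: int = 10) -> str:
    """Return the template key whose trajectory is closest to the observed one."""
    query = resample(displacement_trajectory(frames), n)
    return min(templates, key=lambda ch: distance(query, resample(templates[ch], n)))
```

A production system would use richer features and a trained model; the point here is only that movement relative to a reference frame, rather than absolute position, is what gets compared.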

The voice generation unit 113 generates voice data from characters. A known technique is used for this, so it is not described in detail here. The display unit 114 causes the display device to show display screens such as the home screen 140 and the lip-reading screens described later. These screens include the images acquired by the image acquisition unit 111 and text showing the characters extracted by the character extraction unit 112.

The communication unit 115 controls communication and calls between the portable terminal device 10 and other telephones. Specifically, the communication unit 115 places calls to other telephones in response to input operations and receives incoming calls from other telephones. As described in detail later, the communication unit 115 also calls the SOS communication destination stored in the SOS setting information in predetermined cases.

The input unit 116 controls input processing using an input device such as a touch panel. For example, the input unit 116 accepts input correcting the characters extracted by the character extraction unit 112.

The SOS call instruction reception unit 117 accepts an instruction to place an SOS call, for example because of something that has happened to the speaker during lip-reading communication. The SOS call instruction may be accepted either during a call with a communication destination or when no call is in progress.

The voice acquisition unit 118 acquires ambient sound, including the speaker's voice, input through a microphone (not shown) of the portable terminal device 10. When the sound acquired by the voice acquisition unit 118 is at or above a predetermined volume, the communication unit 115 transmits the ambient sound to the communication destination.
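The volume condition can be sketched as an RMS level check on each block of microphone samples. The Python below is illustrative only: it assumes 16-bit PCM samples as plain integers, measures level in dBFS, uses a hypothetical threshold of -30 dBFS for the unspecified "predetermined volume", and chooses simple additive mixing with clipping as one possible way of sending the ambient sound "in addition to" the generated voice.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


def outgoing_frame(generated: Sequence[int], ambient: Sequence[int],
                   threshold_dbfs: float = -30.0) -> List[int]:
    """Always send the lip-read voice; mix in ambient sound only when it is loud enough.

    threshold_dbfs is a hypothetical stand-in for the 'predetermined volume'.
    """
    if rms_dbfs(ambient) >= threshold_dbfs:
        n = max(len(generated), len(ambient))
        mixed = []
        for i in range(n):
            g = generated[i] if i < len(generated) else 0
            a = ambient[i] if i < len(ambient) else 0
            mixed.append(max(-32768, min(32767, g + a)))  # clip to the 16-bit range
        return mixed
    return list(generated)
```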

The storage unit 120 stores SOS setting information 121. The SOS setting information 121 registers the SOS communication destination to be called when the characters extracted by the character extraction unit 112 contain a predetermined message.

In this embodiment, the image acquisition unit 111 acquires images of the holder's mouth, the character extraction unit 112 extracts characters from those images, and the voice generation unit 113 generates voice from the characters. The communication unit 115 transmits the generated voice to the communication destination. As a result, the characters the speaker forms with the mouth can be conveyed to the communication destination even if the speaker makes no sound. Furthermore, even when noise or the like prevents the speaker's voice from reaching the other party, transmitting the voice generated by the voice generation unit 113 while controlling the transmission of ambient sound conveys what the speaker said to the communication destination more clearly.

FIG. 2 shows an example of the SOS setting information 121. The SOS setting information 121 associates lip-reading patterns with SOS communication destinations.

A lip-reading pattern is a pattern made up of a combination of characters. An SOS communication destination is the destination to call when the speaker utters the corresponding lip-reading pattern. Any destination can be registered in advance as an SOS communication destination, for example an emergency number for the police or fire department, or the telephone number of a hospital the speaker attends.
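The SOS setting information is essentially a small lookup table. A minimal Python sketch, assuming a hypothetical `SosEntry` record and simple substring matching in registration order; the table contents in the usage below are illustrative only (110 and 119 are Japan's police and fire/ambulance numbers), not the contents of FIG. 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SosEntry:
    lip_pattern: str   # character combination that triggers the SOS call
    destination: str   # SOS communication destination


def find_sos_destination(text: str, table: List[SosEntry]) -> Optional[str]:
    """Return the destination of the first registered pattern found in the text."""
    for entry in table:
        if entry.lip_pattern in text:
            return entry.destination
    return None
```

For example, with `[SosEntry("help", "110"), SosEntry("fire", "119")]`, the lip-read text "help me" maps to "110" and "hello" maps to no destination.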

Next, an example hardware configuration of the portable terminal device 10 is described.

FIG. 3 shows an example hardware configuration of a computer that implements the functions of the portable terminal device 10. The portable terminal device 10 includes a CPU (Central Processing Unit) 130, an auxiliary storage device 131, a network I/F (Interface) 132, a memory 133, an input I/F 134, and an output I/F 135. A camera unit 137 is connected to the input I/F 134, and a touch panel 138 is connected to the output I/F 135. The components are connected to one another by a bus.

The CPU 130 executes processing according to programs recorded in the memory 133 or the auxiliary storage device 131. Each unit of the control unit 110 implements its function through the CPU 130 executing a program.

The auxiliary storage device 131 is a writable and readable storage medium and its drive, such as an HDD (Hard Disk Drive), CD-R (Compact Disc-Recordable), or DVD-RAM (Digital Versatile Disk-Random Access Memory). The network I/F 132 is an interface for connecting the portable terminal device 10 to a network.

The memory 133 is a storage device such as RAM (Random Access Memory) or flash memory, and serves as a storage area into which programs and data are temporarily read. The input I/F 134 is an interface for connecting input devices to the portable terminal device 10. The output I/F 135 is an interface for connecting output devices, such as a display device, to the portable terminal device 10.

The camera unit 137 is an imaging device that captures video with an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor and records it as digital data in the memory 133 or the auxiliary storage device 131. The camera unit 137 can capture moving images. The touch panel 138 serves as both an input device and an output device, and is connected to both the input I/F 134 and the output I/F 135. The touch panel 138 consists of, for example, a capacitive sensor and a liquid crystal display that shows a GUI (Graphical User Interface) and the like.

The function of the storage unit 120 is implemented by the memory 133 or the auxiliary storage device 131. Alternatively, it may be implemented by a storage device (not shown) on the network.

The processing of each component of the portable terminal device 10 may be executed by a single piece of hardware or by multiple pieces of hardware, and may be implemented by a single program or by multiple programs.

FIG. 9 shows an example of the home screen 140. The portable terminal device 10 offers two ways of calling with the lip-reading function: a continuous call mode and a single call mode. As described in detail later, the continuous call mode is for holding a conversation with the other party, whereas the single call mode sends a message generated by lip reading to the other party without expecting a reply from the communication destination.

The home screen 140 accepts the start of a call in either the continuous call mode or the single call mode. It has a call mode selection area 141 and a settings button 142. The call mode selection area 141 lets the user choose between the continuous call mode and the single call mode. The settings button 142 displays the setting items for the two modes; these are described in detail later.

FIG. 4 is a flowchart of the continuous call process. The process in this flowchart starts when the continuous call mode is selected in the call mode selection area 141 of the home screen 140.

First, the display unit 114 displays the outgoing call screen 150 on the touch panel 138 (step S11).

FIG. 10 shows an example of the outgoing call screen 150. The outgoing call screen 150 includes a call mode display area 151, a number selection area 152, and a call button 153.

The call mode display area 151 shows information indicating the continuous call mode. The number selection area 152 displays selectable numbers so that the user can choose the telephone number of the communication destination. The call button 153 is selected to place a call to the selected telephone number.

Returning to FIG. 4, the input unit 116 next accepts input of a telephone number on the outgoing call screen 150 and selection of the call button 153 (step S12).

Next, the communication unit 115 places a call to the communication destination with the selected telephone number (step S13).

Next, the communication unit 115 starts a call with the communication destination (step S14). The call starts once the communication destination answers and communication is established. If the communication destination does not answer, the display unit 114 shows information on the touch panel 138 indicating that there is no answer, and the process ends.

Next, the image acquisition unit 111 starts acquiring images (step S15), and continues to do so until the call with the communication destination ends.

Next, the display unit 114 displays the lip-reading screen A 160 on the touch panel 138 (step S16). Steps S15 and S16 need not be performed in the order shown in this flowchart.

FIG. 11 shows an example of the lip-reading screen A 160. The lip-reading screen A 160 includes a call type display area 161, an image display area 162, a text display area 163, and an end button 164.

The call type display area 161 shows the current call mode; here it indicates the continuous call mode. In this embodiment, voice generated by lip reading is transmitted to the communication destination while transmission of ambient sound to the communication destination is restricted; when a predetermined condition is met, however, the generated voice and the ambient sound are transmitted together. The call type display area 161 may also indicate which of these applies: whether only the lip-read voice is transmitted with ambient sound restricted, or whether the ambient sound is transmitted along with the generated voice. The predetermined condition is described in detail later.

In FIG. 11, the call type display area 161 shows "Continuous call mode (lip reading only)". This means the device is in the continuous call mode, transmission of ambient sound is restricted, and only the voice generated by lip reading is transmitted to the communication destination.

The image display area 162 shows, in real time, the images that the image acquisition unit 111 began acquiring in step S15.

The text display area 163 shows the characters extracted by the character extraction unit 112. The text display area 163 is selectable, and selecting it starts the SOS communication process described later. The end button 164 is selected to end the call.

Returning to FIG. 4, the communication unit 115 next determines whether an instruction to disconnect has been received (step S17). When the end button 164 on the lip-reading screen A 160 is selected, the communication unit 115 determines that a disconnection instruction has been received.

通信部１１５が、接続の切断指示を受け付けたと判定しない場合（ステップＳ１７で「ＮＯ」の場合）、文字抽出部１１２は、画像取得部１１１により取得された画像を参照し、話者の唇の動きを検出したか否かを判定する（ステップＳ１８）。 If the communication unit 115 does not determine that a disconnection instruction has been received (“NO” in step S17), the character extraction unit 112 refers to the images acquired by the image acquisition unit 111 and determines whether movement of the speaker's lips has been detected (step S18).

文字抽出部１１２が、話者の唇の動きを検出した場合（ステップＳ１８で「ＹＥＳ」の場合）、文字抽出部１１２は、画像から文字を抽出し、テキストデータを生成する（ステップＳ１９）。 When the character extraction unit 112 detects movement of the speaker's lips (“YES” in step S18), it extracts characters from the images and generates text data (step S19).

次に、表示部１１４は、文字抽出部１１２が生成したテキストデータを読唇画面Ａ１６０に表示する（ステップＳ２０）。テキストデータは、読唇画面Ａ１６０のテキスト表示領域１６３に表示される。 Next, the display unit 114 displays the text data generated by the character extraction unit 112 on the lip reading screen A160 (step S20). The text data is displayed in the text display area 163 of the lip reading screen A160.

次に、文字抽出部１１２は、生成したテキストデータを保存する（ステップＳ２１）。テキストデータは、記憶部１２０の中の図示しないテキストデータ記憶領域に記憶される。 Next, the character extraction unit 112 stores the generated text data (step S21). The text data is stored in a text data storage area (not shown) in the storage unit 120.

次に、音声生成部１１３は、文字抽出部１１２が生成したテキストデータに基づいて音声を生成する（ステップＳ２２）。なお、ステップＳ２０〜ステップＳ２２の処理については、本フローチャートの順序に限定されない。 Next, the voice generation unit 113 generates a voice based on the text data generated by the character extraction unit 112 (step S22). Steps S20 to S22 need not be executed in the order shown in this flowchart.

次に、通信部１１５は、ステップＳ２２で生成された音声を通信先に送信する（ステップＳ２３）。なお、前述したように、通信部１１５は周辺音声の通信先への送信を制限している。 Next, the communication unit 115 transmits the voice generated in step S22 to the communication destination (step S23). As described above, the communication unit 115 restricts transmission of ambient sound to the communication destination.

次に、文字抽出部１１２は、所定時間以上唇の動きが停止したか否かを判定する（ステップＳ２４）。 Next, the character extraction unit 112 determines whether the lips have stopped moving for a predetermined time or longer (step S24).

文字抽出部１１２が、所定時間以上唇の動きが停止したと判定した場合（ステップＳ２４で「ＹＥＳ」の場合）、文字抽出部１１２は処理をステップＳ１７に戻す。文字抽出部１１２が、所定時間以上唇の動きが停止したと判定しない場合（ステップＳ２４で「ＮＯ」の場合）、文字抽出部１１２は処理をステップＳ１９に戻し、再度文字抽出処理を行う。 If the character extraction unit 112 determines that the movement of the lips has stopped for a predetermined time or more (in the case of “YES” in step S24), the character extraction unit 112 returns the process to step S17. If the character extraction unit 112 does not determine that the movement of the lips has stopped for a predetermined time or more (in the case of “NO” in step S24), the character extraction unit 112 returns the process to step S19 and performs the character extraction process again.

通信部１１５が、接続の切断指示を受け付けたと判定した場合（ステップＳ１７で「ＹＥＳ」の場合）、通信部１１５は、接続を切断する（ステップＳ２５）。通信部１１５は、その後本フローチャートの処理を終了する。 When the communication unit 115 determines that a disconnection instruction has been received (“YES” in step S17), it disconnects the connection (step S25) and then ends the processing of this flowchart.
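The continuous call loop of FIG. 4 (steps S17 to S25) can be sketched as below. This is a minimal, hypothetical Python sketch, not the patent's implementation: camera frames are reduced to a `lips_moving` flag, and the recognizer, speech synthesizer, and sender are injected callables (`extract`, `synthesize`, `send` are assumed names). As a simplification of the incremental S19 to S23 cycle, lip frames are buffered and processed once the lips have been still for `stop_after` seconds, which stands in for the predetermined time of step S24.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Frame:
    """One camera frame, reduced to what the loop needs (hypothetical model)."""
    t: float                   # capture time in seconds
    lips_moving: bool          # result of lip-movement detection (S18)
    end_pressed: bool = False  # end button 164 selected at this frame (S17)


def continuous_call(frames: Iterable[Frame],
                    extract: Callable[[List[Frame]], str],
                    synthesize: Callable[[str], bytes],
                    send: Callable[[bytes], None],
                    stop_after: float = 1.0) -> List[str]:
    """Run the S17-S25 loop and return the stored text data (S21)."""
    history: List[str] = []
    segment: List[Frame] = []
    last_motion: Optional[float] = None
    for f in frames:
        if f.end_pressed:                    # S17 "YES": disconnect (S25)
            break
        if f.lips_moving:                    # S18 "YES": keep collecting
            segment.append(f)
            last_motion = f.t
        elif segment and last_motion is not None and f.t - last_motion >= stop_after:
            text = extract(segment)          # S19: characters from lip images
            history.append(text)             # S20/S21: display and store
            send(synthesize(text))           # S22/S23: generated voice only
            segment = []                     # S24 "YES": back to S17
    return history
```

In this sketch a segment that has not yet been flushed is discarded when the end button is pressed; a real implementation might flush it before disconnecting.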

なお、通信先との通話中に、音声取得部１１８は話者の声を含む周辺音声を取得している。周辺音声の音量が予め定められた一定量以上になると、音声取得部１１８は通信部１１５に通知を行う。通信部１１５は、音声取得部１１８からの通知に基づいて、音声生成部１１３により生成された音声に加え、音声取得部１１８が取得した周辺音声を通信先に送信する。同様に、周辺音声が所定の音量以下になった場合、通信部１１５は周辺音声の送信を制限し、音声生成部１１３により生成された音声のみを通信先に送信する。これにより、話者が発声の有無等によって音声の送信又は非送信を切り替えることができ、利便性が向上する。 During a call with the communication destination, the voice acquisition unit 118 acquires ambient sound, including the speaker's voice. When the ambient sound volume reaches or exceeds a predetermined level, the voice acquisition unit 118 notifies the communication unit 115. In response to the notification, the communication unit 115 transmits the ambient sound acquired by the voice acquisition unit 118 to the communication destination in addition to the voice generated by the voice generation unit 113. Conversely, when the ambient sound falls to or below a predetermined volume, the communication unit 115 restricts transmission of the ambient sound and transmits only the voice generated by the voice generation unit 113. The speaker can thus switch ambient sound transmission on or off simply by speaking aloud or staying silent, which improves convenience.
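The volume-based switching described above amounts to a noise gate on the microphone signal. The sketch below is a hypothetical illustration, not the patent's method: it measures each block of float samples in [-1.0, 1.0] by its RMS level, and the use of two thresholds (hysteresis, so the gate does not flicker near a single level) is an assumption. The `on_level` and `off_level` values are arbitrary placeholders for the "predetermined level" and "predetermined volume".

```python
import math
from typing import List, Sequence


class AmbientGate:
    """Decides whether ambient sound is passed through (hypothetical sketch).

    Levels are RMS amplitudes of float samples in [-1.0, 1.0]. Two thresholds
    (hysteresis) are an assumption; set them equal for a single threshold.
    """

    def __init__(self, on_level: float = 0.05, off_level: float = 0.02) -> None:
        if off_level > on_level:
            raise ValueError("off_level must not exceed on_level")
        self.on_level = on_level
        self.off_level = off_level
        self.passing = False  # start with ambient sound restricted

    @staticmethod
    def rms(block: Sequence[float]) -> float:
        if not block:
            return 0.0
        return math.sqrt(sum(x * x for x in block) / len(block))

    def update(self, block: Sequence[float]) -> bool:
        """Feed one block of microphone samples; return the gate state."""
        level = self.rms(block)
        if not self.passing and level >= self.on_level:
            self.passing = True    # at or above the level: add ambient sound
        elif self.passing and level <= self.off_level:
            self.passing = False   # at or below the volume: generated voice only
        return self.passing


def mix(generated: Sequence[float], ambient: Sequence[float], passing: bool) -> List[float]:
    """Combine equal-length blocks, clipping the sum to [-1.0, 1.0]."""
    if not passing:
        return list(generated)
    return [max(-1.0, min(1.0, g + a)) for g, a in zip(generated, ambient)]
```

A level between the two thresholds leaves the current state unchanged, which is the point of the hysteresis.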

付言すれば、周辺音声の通信先への送信を制限するか否かは、話者の入力に依るものであってもよい。例えば、入力部１１６が、読唇画面Ａ１６０の通話種別表示領域１６１の選択を受け付けることで、周辺音声の送信又は非送信を切り替えてもよい。さらに、読唇により生成された音声を通信先に送信せず、通常の電話と同様に周辺音声のみを通信先に送信するよう、選択を受け付けるものであってもよい。 In addition, whether to restrict transmission of ambient sound to the communication destination may be determined by the speaker's input. For example, transmission of ambient sound may be toggled on and off when the input unit 116 receives a selection of the call type display area 161 on the lip reading screen A160. The speaker may also be allowed to select a mode in which the voice generated by lip reading is not transmitted and only ambient sound is transmitted, as in an ordinary telephone call.

上述の実施形態では、携帯端末装置１０から通信先に発信を行うことにより連続通話処理を開始しているが、通信先が携帯端末装置１０に発信を行い、携帯端末装置１０で発信を着信することにより通話が開始される場合についても、同様の処理が行われる。 In the embodiment above, the continuous call process starts when the mobile terminal device 10 calls the communication destination. The same processing is performed when the call starts because the communication destination calls the mobile terminal device 10 and the mobile terminal device 10 answers.

本実施形態では、口の動きから発話内容を抽出し、音声にして通信先に送信する。これにより、発声が困難な場合であっても会話を行うことができる。この際、自分の口の動きが撮影された画像を参照し、どのような文字が抽出されたかを確認することができる。これにより、話者は口の動かし方によりどのように文字が抽出されるかを認識することができ、読唇による発話内容の抽出の精度が向上する。 In the present embodiment, the utterance content is extracted from mouth movements, converted into voice, and transmitted to the communication destination, so a conversation is possible even when speaking aloud is difficult. Meanwhile, the speaker can view the captured images of his or her own mouth movements alongside the extracted characters. The speaker thus learns how particular mouth movements map to extracted characters, which improves the accuracy of lip-reading extraction.

図５は、単発通話処理の流れを示すフローチャート（その１）である。ホーム画面１４０の通話モード選択領域１４１において、単発通話モードの選択を受け付けると、本フローチャートによる処理が開始される。 FIG. 5 is a flowchart (part 1) showing the flow of single call processing. When the selection of the single call mode is accepted in the call mode selection area 141 of the home screen 140, the processing according to this flowchart is started.

まず、表示部１１４は、発信画面１５０を表示する（ステップＳ３１）。 First, the display unit 114 displays the outgoing call screen 150 (step S31).

次に、通信部１１５は、電話番号の入力と発信ボタン１５３の選択を受け付ける（ステップＳ３２）。ステップＳ３１及びステップＳ３２で行われる処理は、ステップＳ１１及びステップＳ１２で行われる処理と同様であるため、説明を省略する。 Next, the communication unit 115 accepts input of a telephone number and selection of the call button 153 (step S32). Since the processing performed in step S31 and step S32 is the same as the processing performed in step S11 and step S12, description thereof is omitted.

次に、画像取得部１１１は、カメラユニット１３７を用いた画像の取得を開始する（ステップＳ３３）。画像取得部１１１は、後に説明するテキスト表示領域の選択を受け付けるまで、画像の取得を継続する。 Next, the image acquisition unit 111 starts acquiring an image using the camera unit 137 (step S33). The image acquisition unit 111 continues to acquire an image until selection of a text display area described later is received.

次に、表示部１１４は、読唇画面Ｂ１７０を表示する（ステップＳ３４）。 Next, the display unit 114 displays the lip reading screen B170 (step S34).

図１２は、読唇画面Ｂ１７０の一例を示す図である。読唇画面Ｂ１７０は、通話種別表示領域１７１と、画像表示領域１７２と、テキスト表示領域１７３と、入力終了ボタン１７４と、キャンセルボタン１７５とを含む。通話種別表示領域１７１と、画像表示領域１７２とに表示される対象は、読唇画面Ａ１６０と同様であるため、説明を省略する。 FIG. 12 shows an example of the lip reading screen B170. Lip reading screen B170 includes a call type display area 171, an image display area 172, a text display area 173, an input end button 174, and a cancel button 175. Since the objects displayed in the call type display area 171 and the image display area 172 are the same as those in the lip reading screen A160, description thereof is omitted.

テキスト表示領域１７３には、文字抽出部１１２が画像から抽出した文字をテキスト化したものが表示される。テキスト表示領域１７３は、選択可能に表示される。テキスト表示領域１７３は、その選択方法によって処理が異なる。例えば、読唇によるテキストデータの生成中にテキスト表示領域が選択された場合には、テキストデータの生成を終了する。テキストデータの生成終了後に、テキスト表示領域１７３がタップなど短い継続時間で選択された場合には、図示しないテキスト訂正画面が表示され、テキストの修正が可能となる。選択が長押しなど所定の長さを超える継続時間で行われた場合には、後述するＳＯＳ通信処理が開始される。 The text display area 173 shows, as text, the characters that the character extraction unit 112 extracted from the images. The text display area 173 is selectable, and the resulting processing depends on how it is selected. If the text display area 173 is selected while text data is being generated by lip reading, text data generation ends. If, after text data generation has ended, the text display area 173 is selected for a short duration, such as a tap, a text correction screen (not shown) is displayed so that the text can be corrected. If the selection lasts longer than a predetermined duration, such as a long press, the SOS communication process described later is started.
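The three ways of selecting the text display area 173 can be summarized as a small dispatch function. This is an illustrative sketch only: the names `Action` and `on_text_area_selected` are hypothetical, and the 1.0 s default for `long_press_seconds` is an arbitrary placeholder for the unspecified "predetermined length".

```python
from enum import Enum


class Action(Enum):
    END_GENERATION = "end text data generation"
    OPEN_CORRECTION = "show text correction screen"
    START_SOS = "start SOS communication process"


def on_text_area_selected(generating: bool, press_seconds: float,
                          long_press_seconds: float = 1.0) -> Action:
    """Map a selection of text display area 173 to the processing it triggers.

    long_press_seconds is a placeholder for the "predetermined length".
    """
    if generating:
        # Any selection while lip reading is still producing text ends generation.
        return Action.END_GENERATION
    if press_seconds > long_press_seconds:
        # Held longer than the predetermined length (long press).
        return Action.START_SOS
    # Short selection such as a tap.
    return Action.OPEN_CORRECTION
```

Checking the generation state first mirrors the order in the text, where a selection made during generation always ends generation regardless of how long it lasts.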

なお、テキストデータの生成の終了や、テキスト訂正画面への画面遷移、又はＳＯＳ通信処理の開始は、上述の実施形態に限定されるものではない。例えば、読唇画面Ｂ１７０に、テキスト訂正専用のボタンや、ＳＯＳ通信処理開始専用のボタンを設けてもよい。 The ways of ending text data generation, switching to the text correction screen, and starting the SOS communication process are not limited to those described above. For example, the lip reading screen B170 may have a dedicated button for text correction or for starting the SOS communication process.

入力終了ボタン１７４は、読唇による文字の入力の終了を受け付けるためのボタンである。キャンセルボタン１７５は、入力をキャンセルする場合に選択を受け付けるボタンである。キャンセルボタン１７５が選択されると、表示画面は読唇画面Ｂ１７０からホーム画面１４０に遷移する。 The input end button 174 is a button for accepting the end of character input by lip reading. The cancel button 175 is a button for accepting selection when canceling input. When the cancel button 175 is selected, the display screen changes from the lip reading screen B170 to the home screen 140.

説明を図５に戻す。次に、文字抽出部１１２は、画像取得部１１１により取得された画像に基づいて、唇の動きを検出したか否かを判定する（ステップＳ３５）。文字抽出部１１２が、唇の動きを検出しない場合（ステップＳ３５で「ＮＯ」の場合）、文字抽出部１１２は、処理をステップＳ３５に戻し、唇の動きを検出するまで画像を監視する。 Returning to FIG. 5, the character extraction unit 112 next determines, based on the images acquired by the image acquisition unit 111, whether lip movement has been detected (step S35). If no lip movement is detected (“NO” in step S35), the character extraction unit 112 repeats step S35, monitoring the images until lip movement is detected.

文字抽出部１１２が、唇の動きを検出した場合（ステップＳ３５で「ＹＥＳ」の場合）、文字抽出部１１２は、取得画像からテキストデータを生成する（ステップＳ３６）。ステップＳ３６及びステップＳ３７で行われる処理については、ステップＳ１９及びステップＳ２０の処理と同様であるため、説明を省略する。 When the character extraction unit 112 detects the movement of the lips (if “YES” in step S35), the character extraction unit 112 generates text data from the acquired image (step S36). The processing performed in step S36 and step S37 is the same as the processing in step S19 and step S20, and thus description thereof is omitted.

次に、文字抽出部１１２は、表示している読唇画面Ｂ１７０に含まれるテキスト表示領域１７３の選択を受け付けたか否かを判定する（ステップＳ３８）。テキスト表示領域１７３には、話者の唇の動きから抽出された文字が表示されているが、読唇による文字の抽出を終了させる場合に、テキスト表示領域１７３の選択を受け付ける。 Next, the character extraction unit 112 determines whether a selection of the text display area 173 on the displayed lip reading screen B170 has been received (step S38). The text display area 173 shows the characters extracted from the speaker's lip movements, and the speaker selects it to end character extraction by lip reading.

文字抽出部１１２が、テキスト表示領域１７３の選択を受け付けたと判定した場合（ステップＳ３８で「ＹＥＳ」の場合）、文字抽出部１１２は、テキストデータの生成を終了する（ステップＳ３９）。 If the character extraction unit 112 determines that the selection of the text display area 173 has been accepted (“YES” in step S38), the character extraction unit 112 ends the generation of text data (step S39).

文字抽出部１１２が、テキスト表示領域１７３の選択を受け付けたと判定しない場合（ステップＳ３８で「ＮＯ」の場合）、文字抽出部１１２は、処理をステップＳ３５に戻す。 If the character extraction unit 112 does not determine that the selection of the text display area 173 has been received (“NO” in step S38), the character extraction unit 112 returns the process to step S35.

次に、文字抽出部１１２は、テキスト表示領域１７３の選択をさらに受け付けたか否かを判定する（ステップＳ４０）。 Next, the character extraction unit 112 determines whether or not further selection of the text display area 173 has been received (step S40).

文字抽出部１１２が、テキスト表示領域１７３の選択を受け付けたと判定した場合（ステップＳ４０で「ＹＥＳ」の場合）、文字抽出部１１２は、テキストデータの訂正を受け付ける（ステップＳ４１）。具体的には、文字抽出部１１２は、テキスト表示領域１７３の選択を受け付けると、図示しないテキスト訂正画面を表示する。テキスト訂正画面は、表示された文字の訂正を受け付けるための画面であって、例えばキーボード等の入力画面である。文字抽出部１１２は、テキスト訂正画面への入力を受け付ける。 If the character extraction unit 112 determines that a further selection of the text display area 173 has been received (“YES” in step S40), it accepts corrections to the text data (step S41). Specifically, upon receiving the selection, the character extraction unit 112 displays a text correction screen (not shown). The text correction screen is an input screen, such as a keyboard, for correcting the displayed characters. The character extraction unit 112 accepts input on the text correction screen.

文字抽出部１１２が、テキスト表示領域１７３の選択を受け付けたと判定しない場合（ステップＳ４０で「ＮＯ」の場合）、文字抽出部１１２は、処理をステップＳ４２に進める。 If the character extraction unit 112 does not determine that a further selection has been received (“NO” in step S40), it advances the process to step S42.

次に、文字抽出部１１２は、読唇画面Ｂ１７０に表示された入力終了ボタン１７４の選択を受け付ける（ステップＳ４２）。 Next, the character extraction unit 112 accepts selection of the input end button 174 displayed on the lip reading screen B170 (step S42).

次に、文字抽出部１１２は、テキストデータを記憶部１２０のテキストデータ記憶領域に保存する（ステップＳ４３）。 Next, the character extraction unit 112 stores the text data in the text data storage area of the storage unit 120 (step S43).

図６は、単発通話処理の流れを示すフローチャート（その２）である。 FIG. 6 is a flowchart (part 2) showing the flow of the single call processing.

次に、通信部１１５は、通信先に対して発信する（ステップＳ４４）。通信部１１５は、ステップＳ３２で入力を受け付けた電話番号に対して発信する。 Next, the communication unit 115 calls the communication destination (step S44), dialing the telephone number entered in step S32.

次に、通信部１１５は、通信先から応答があるか否かを判定する（ステップＳ４５）。 Next, the communication unit 115 determines whether or not there is a response from the communication destination (step S45).

通信部１１５が、通信先から応答があると判定した場合（ステップＳ４５で「ＹＥＳ」の場合）、通信部１１５は、通信先との通信を確立し、通話を開始する（ステップＳ４６）。 If the communication unit 115 determines that there is a response from the communication destination (“YES” in step S45), the communication unit 115 establishes communication with the communication destination and starts a call (step S46).

次に、音声生成部１１３は、ステップＳ４３でテキストデータ記憶領域に保存されたテキストデータから音声データを生成する（ステップＳ４７）。なお、ステップＳ４７の処理は、通信先との通話開始前に行われてもよい。ステップＳ４３のテキストデータ保存後に行われるものであればよい。 Next, the voice generation unit 113 generates voice data from the text data saved in the text data storage area in step S43 (step S47). Step S47 may instead be performed before the call with the communication destination starts; it need only follow the saving of the text data in step S43.

次に、通信部１１５は、ステップＳ４７で生成した音声データを通信先に送信する（ステップＳ４８）。 Next, the communication unit 115 transmits the voice data generated in step S47 to the communication destination (step S48).

次に、通信部１１５は、接続を切断する（ステップＳ４９）。その後、通信部１１５は本フローチャートの処理を終了する。 Next, the communication unit 115 disconnects the connection (step S49). Thereafter, the communication unit 115 ends the process of this flowchart.

通信部１１５が、通信先から応答があると判定しない場合（ステップＳ４５で「ＮＯ」の場合）、通信部１１５は、発信回数が所定数に達したか否かを判定する（ステップＳ５０）。 If the communication unit 115 does not determine that there is a response from the communication destination (“NO” in step S45), the communication unit 115 determines whether or not the number of outgoing calls has reached a predetermined number (step S50).

通信部１１５が、発信回数が所定数に達したと判定した場合（ステップＳ５０で「ＹＥＳ」の場合）、通信部１１５は本フローチャートの処理を終了する。 When the communication unit 115 determines that the number of outgoing calls has reached the predetermined number (in the case of “YES” in step S50), the communication unit 115 ends the process of this flowchart.

通信部１１５が、発信回数が所定数に達したと判定しない場合（ステップＳ５０で「ＮＯ」の場合）、通信部１１５は、再度通信先に対して発信する（ステップＳ５１）。その後、通信部１１５は処理をステップＳ４５に戻す。 If the communication unit 115 does not determine that the number of outgoing calls has reached the predetermined number (“NO” in step S50), it calls the communication destination again (step S51) and then returns the process to step S45.
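Steps S44 to S51 form a bounded redial loop. The sketch below is a hypothetical illustration, with the telephony layer abstracted into callables: `dial()` places one call and reports whether it was answered. Counting the first call toward `max_calls` is an assumption, since the text only speaks of "a predetermined number".

```python
from typing import Callable, Tuple


def call_with_retries(dial: Callable[[], bool],
                      deliver: Callable[[], None],
                      hang_up: Callable[[], None],
                      max_calls: int = 5) -> Tuple[bool, int]:
    """Sketch of steps S44-S51; returns (answered, number_of_calls).

    Treating max_calls as the total number of calls (first call included)
    is an assumption.
    """
    calls = 0
    while calls < max_calls:
        calls += 1                  # S44 on the first pass, S51 afterwards
        if dial():                  # S45: answered?
            deliver()               # S46-S48: start call, send generated voice
            hang_up()               # S49: disconnect
            return True, calls
    return False, calls             # S50 "YES": give up
```

Returning the number of calls alongside the outcome gives a caller what it needs to record a retry count such as “3/5” in the lip reading history.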

本実施形態では、単発通話モードの際には読唇に基づいて生成された音声を通信先に送信し、通話を終了する。その際に、読唇に基づいて生成された文字を訂正することができる。これにより、例えば発声できない状況にある話者が応答を必要としないメッセージを送信したい場合等、話者の都合に応じた読唇による通信を実現することができる。 In the present embodiment, in the single call mode, the voice generated from lip reading is transmitted to the communication destination and the call is then ended; before that, the characters generated by lip reading can be corrected. This enables lip-reading communication suited to the speaker's circumstances, for example when a speaker who cannot speak aloud wants to send a message that requires no reply.

図７は、連続通話モード中のＳＯＳ通信処理の流れを示すフローチャートである。ＳＯＳ通信とは、読唇により抽出した文字が、ＳＯＳ設定情報１２１に予め設定されたメッセージと対応する場合に、所定の連絡先に通知することを示す。本フローチャートによる処理は、後述する読唇通信設定画面においてＳＯＳ通信がＯＮに設定されており、かつ連続通話モードにおいて通信先との通話が行われている状態で実行される。なお、連続通話モードにおいて通信先との通話が行われていない場合は、後述する単発通話モード中のＳＯＳ通信処理と同様の処理が行われる。タッチパネル１３８には、読唇画面Ａ１６０が表示されている。 FIG. 7 is a flowchart showing the flow of the SOS communication process in the continuous call mode. SOS communication means notifying a predetermined contact when characters extracted by lip reading correspond to a message preset in the SOS setting information 121. The processing of this flowchart is executed when SOS communication is set to ON on the lip reading communication setting screen described later and a call with the communication destination is in progress in the continuous call mode. If no call with the communication destination is in progress in the continuous call mode, processing similar to the SOS communication process in the single call mode, described later, is performed. The lip reading screen A160 is displayed on the touch panel 138.

まず、ＳＯＳ発信指示受付部１１７が、読唇画面Ａ１６０のテキスト表示領域１６３の選択を受け付ける（ステップＳ６１）。 First, the SOS transmission instruction receiving unit 117 receives the selection of the text display area 163 of the lip reading screen A160 (step S61).

次に、文字抽出部１１２は、画像取得部１１１が取得した画像に基づいて、予め定められた一定時間以上話者が口を閉じた状態であるか否かを判定する（ステップＳ６２）。この判定は、仮にテキスト表示領域１６３の選択が話者の意図しないものであった場合に、予測しないＳＯＳ通信が発信されることを防ぐために行われる。なお、判定に用いられる口の形は閉じた状態に限られず、予め定められた所定の形であればよい。 Next, based on the images acquired by the image acquisition unit 111, the character extraction unit 112 determines whether the speaker has kept his or her mouth closed for at least a predetermined time (step S62). This check prevents an unintended SOS call from being placed if the text display area 163 was selected by accident. The mouth shape used for this check is not limited to a closed mouth; any predetermined shape may be used.

文字抽出部１１２が、一定時間以上話者が口を閉じた状態であると判定しない場合（ステップＳ６２で「ＮＯ」の場合）、文字抽出部１１２は、本フローチャートの処理を終了する。具体的には、文字抽出部１１２が、テキスト表示領域１６３選択後所定時間内に、一定時間口を閉じた状態であることを検出しない場合に、本フローチャートの処理を終了する。テキスト表示領域１６３の選択は、ＳＯＳ通信が意図されて行われたものではなく、例えば頬が当たるなどして偶然選択されたものと考えられるので、携帯端末装置１０では連続通話モードによる通話が引き続き行われる。 If the character extraction unit 112 does not determine that the speaker has kept his or her mouth closed for that time (“NO” in step S62), it ends the processing of this flowchart. Specifically, the processing ends when the character extraction unit 112 does not detect the mouth being held closed for that time within a predetermined period after the text display area 163 is selected. In that case the selection is presumed to be accidental rather than an intended SOS request, for example caused by the speaker's cheek touching the screen, so the mobile terminal device 10 continues the call in the continuous call mode.
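The confirmation in step S62 combines two durations: the mouth must stay closed for a "certain time", and that must happen within a "predetermined time" after the selection. The sketch below is a hypothetical illustration that assumes per-frame `(time, mouth_closed)` samples from some mouth-shape classifier; the `hold` and `window` defaults are arbitrary placeholders.

```python
from typing import Iterable, Optional, Tuple


def confirmed_by_closed_mouth(samples: Iterable[Tuple[float, bool]],
                              selected_at: float,
                              hold: float = 1.0,
                              window: float = 3.0) -> bool:
    """Return True if the mouth stays closed for `hold` seconds, with that
    run completing within `window` seconds after the selection (S62).

    samples: (time_in_seconds, mouth_closed) per frame, in time order.
    hold and window are placeholder values.
    """
    run_start: Optional[float] = None
    for t, closed in samples:
        if t < selected_at:
            continue                    # frames before the selection are ignored
        if t > selected_at + window:
            break                       # too late: treat the tap as accidental
        if closed:
            if run_start is None:
                run_start = t           # a closed-mouth run begins
            if t - run_start >= hold:
                return True
        else:
            run_start = None            # mouth opened: restart the run
    return False
```

Swapping the `closed` flag for another mouth-shape label would cover the variant in which a different predetermined shape is used.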

文字抽出部１１２が、一定時間以上話者が口を閉じた状態であると判定した場合（ステップＳ６２で「ＹＥＳ」の場合）、文字抽出部１１２は、画像取得部１１１が取得した画像に基づいて、唇の動きを検出したか否かを判定する（ステップＳ６３）。文字抽出部１１２が、唇の動きを検出したと判定しない場合（ステップＳ６３で「ＮＯ」の場合）、文字抽出部１１２は処理を再度ステップＳ６３の処理を行い、唇の動きを検出するまで画像を監視する。 If the character extraction unit 112 determines that the speaker has kept his or her mouth closed for that time (“YES” in step S62), it determines, based on the images acquired by the image acquisition unit 111, whether lip movement has been detected (step S63). If no lip movement is detected (“NO” in step S63), the character extraction unit 112 repeats step S63, monitoring the images until lip movement is detected.

文字抽出部１１２が、唇の動きを検出したと判定した場合（ステップＳ６３で「ＹＥＳ」の場合）、文字抽出部１１２は、取得画像からテキストデータを作成する（ステップＳ６４）。 If the character extraction unit 112 determines that the movement of the lips has been detected (“YES” in step S63), the character extraction unit 112 creates text data from the acquired image (step S64).

次に、文字抽出部１１２は、ＳＯＳ設定情報１２１の読唇パターンを検出したか否かを判定する（ステップＳ６５）。具体的には、文字抽出部１１２は、ステップＳ６４で作成したテキストデータを用いてＳＯＳ設定情報１２１を参照し、ＳＯＳ設定情報１２１の読唇パターンのいずれかがテキストデータに含まれるか否かを判定する。例えば、作成したテキストデータに、「たすけて」という文字が含まれる場合、文字抽出部１１２は、図２に示すＳＯＳ設定情報１２１の読唇パターンの一つがテキストデータに含まれると判定する。 Next, the character extraction unit 112 determines whether a lip reading pattern in the SOS setting information 121 has been detected (step S65). Specifically, the character extraction unit 112 checks the text data created in step S64 against the SOS setting information 121 and determines whether the text data contains any of its lip reading patterns. For example, if the created text data contains “たすけて” (tasukete, “help me”), the character extraction unit 112 determines that the text data contains one of the lip reading patterns in the SOS setting information 121 shown in FIG. 2.
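The check in step S65, together with the destination lookup used in step S67, is essentially substring matching against a pattern-to-destination table. The sketch below is a hypothetical illustration: the table is modeled as a Python dict holding the single example entry from the text (the destination kept masked exactly as shown), and choosing the pattern that occurs earliest when several match is an assumption.

```python
from typing import Dict, Optional, Tuple

# Model of SOS setting information 121: lip reading pattern -> SOS destination.
# Only the example from the text; the number is kept masked as shown there.
SOS_SETTINGS: Dict[str, str] = {"たすけて": "000-0000-****"}


def find_sos_destination(text: str,
                         settings: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Step S65 sketch: return (pattern, destination) for the pattern that occurs
    earliest in `text`, or None if no registered pattern is contained in it."""
    hits = [(text.find(p), p, d) for p, d in settings.items() if p in text]
    if not hits:
        return None
    _, pattern, destination = min(hits)   # earliest occurrence wins (assumed rule)
    return pattern, destination
```

A `None` result corresponds to “NO” in step S65; otherwise the returned destination is the number to call in step S67.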

文字抽出部１１２が、ＳＯＳ設定情報１２１の読唇パターンを検出したと判定しない場合（ステップＳ６５で「ＮＯ」の場合）、文字抽出部１１２は本フローチャートの処理を終了する。 When the character extraction unit 112 does not determine that the lip reading pattern of the SOS setting information 121 has been detected (“NO” in step S65), the character extraction unit 112 ends the process of this flowchart.

文字抽出部１１２が、ＳＯＳ設定情報１２１の読唇パターンを検出したと判定した場合（ステップＳ６５で「ＹＥＳ」の場合）、通信部１１５は、継続している接続を切断する（ステップＳ６６）。 If the character extraction unit 112 determines that a lip reading pattern in the SOS setting information 121 has been detected (“YES” in step S65), the communication unit 115 disconnects the ongoing call (step S66).

次に、通信部１１５は、ＳＯＳ設定情報１２１のＳＯＳ通信先に発信する（ステップＳ６７）。具体的には、通信部１１５は、ステップＳ６４で作成したテキストデータに含まれる読唇パターンとＳＯＳ設定情報１２１において関連付けられたＳＯＳ通信先に対して発信する。上述の例で説明すると、テキストデータに「たすけて」という文字が含まれるため、通信部１１５は図２のＳＯＳ設定情報１２１において読唇データ「たすけて」と関連付けられたＳＯＳ通信先である「000-0000-****」に対して発信を行う。 Next, the communication unit 115 calls an SOS destination registered in the SOS setting information 121 (step S67). Specifically, the communication unit 115 calls the SOS destination that the SOS setting information 121 associates with the lip reading pattern contained in the text data created in step S64. In the example above, because the text data contains “たすけて”, the communication unit 115 calls “000-0000-****”, the SOS destination associated with the lip reading pattern “たすけて” in the SOS setting information 121 of FIG. 2.

次に、通信部１１５は、発信番号との通話を開始する（ステップＳ６８）。その後、通信部１１５は本フローチャートの処理を終了する。なお、通話は通信先の応答があった場合に開始され、通信先の応答がない場合は、通信部１１５は通話を行わずに本フローチャートの処理を終了する。ＳＯＳ通信先への発信は一度に限定されるものではなく、予め定められた一定数、ＳＯＳ通信先への発信を行うものであってもよい。 Next, the communication unit 115 starts a call with the dialed number (step S68) and then ends the processing of this flowchart. The call starts only if the destination answers; if there is no answer, the communication unit 115 ends the processing of this flowchart without a call. The SOS destination need not be called only once; it may be called a predetermined number of times.

本実施形態により、通信先との会話中に緊急な事情が発生した場合に、直ちに接続を切断してＳＯＳの通信先に発信を行う。この際、話者は発声する必要がないため、周囲に気づかれずにＳＯＳ通信先に発信することができる。 According to the present embodiment, if an emergency arises during a conversation with the communication destination, the connection is immediately disconnected and the SOS destination is called. Because the speaker does not need to speak aloud, the SOS destination can be called without people nearby noticing.

図８は、単発通話モード中のＳＯＳ通信処理の流れを示すフローチャートである。本フローチャートによる処理は、後述する読唇通信設定画面１８０においてＳＯＳ通信がＯＮに設定されており、かつ単発通話モードであって、通信先との通話が行われていない状態で実行される。タッチパネル１３８には、読唇画面Ｂ１７０が表示されている。なお、単発通話モードであって通信先との間で通話が行われている場合は、図７に示す連続通話モード中のＳＯＳ通信処理と同様の処理が行われる。 FIG. 8 is a flowchart showing the flow of the SOS communication process in the single call mode. The processing of this flowchart is executed when SOS communication is set to ON on the lip reading communication setting screen 180, described later, and the device is in the single call mode with no call to the communication destination in progress. The lip reading screen B170 is displayed on the touch panel 138. If a call with the communication destination is in progress in the single call mode, processing similar to the SOS communication process in the continuous call mode shown in FIG. 7 is performed.

まず、ＳＯＳ発信指示受付部１１７は、読唇画面Ｂ１７０のテキスト表示領域１７３（長押し）の選択を受け付ける（ステップＳ７１）。ＳＯＳ発信指示受付部１１７は、予め定められた一定時間以上継続してテキスト表示領域１７３が選択された場合に、ＳＯＳ発信指示を受け付ける。 First, the SOS transmission instruction receiving unit 117 receives a long-press selection of the text display area 173 on the lip reading screen B170 (step S71). The SOS transmission instruction receiving unit 117 accepts an SOS call instruction when the text display area 173 is held for at least a predetermined time.

次に、文字抽出部１１２は、画像取得部１１１が取得した画像に基づいて、一定時間以上口を閉じた状態であるか否かを判定する（ステップＳ７２）。ステップＳ７２からステップＳ７５までの間で行われる処理は、ステップＳ６２からステップＳ６５までの間で行われる処理と同様であるため、説明を省略する。 Next, based on the image acquired by the image acquisition unit 111, the character extraction unit 112 determines whether or not the mouth has been closed for a certain period of time (step S72). Since the process performed between step S72 and step S75 is the same as the process performed between step S62 and step S65, description thereof is omitted.

次に、通信部１１５は、ＳＯＳ設定情報１２１のＳＯＳ通信先に発信する（ステップＳ７６）。ステップＳ７６及びステップＳ７７で行われる処理については、ステップＳ６７及びステップＳ６８で行われる処理と同様であるため、説明を省略する。 Next, the communication unit 115 calls an SOS destination registered in the SOS setting information 121 (step S76). Steps S76 and S77 are the same as steps S67 and S68, so their description is omitted.

本実施形態では、単発通話モードであっても、所定のメッセージが読唇されることによって登録された通信先に対して発信が行われる。例えば単発通話モードで読唇を行っている最中に緊急の事情が発生した場合等に、至急予め登録された通信先に発信することができ、利便性が向上する。 In the present embodiment, even in the single call mode, a registered destination is called when a predetermined message is lip-read. For example, if an emergency arises while lip reading is in progress in the single call mode, a pre-registered destination can be called immediately, which improves convenience.

図１３は、読唇通信設定画面１８０の一例を示す図である。図９に示すホーム画面１４０において、設定ボタン１４２が選択された場合に、本図の読唇通信設定画面１８０に表示画面が遷移する。 FIG. 13 is a diagram illustrating an example of the lip reading communication setting screen 180. When the setting button 142 is selected on the home screen 140 shown in FIG. 9, the display screen transitions to the lip reading communication setting screen 180 in this figure.

読唇通信設定画面１８０は、設定項目として、リトライ設定ボタン１８１と、ＳＯＳ設定ボタン１８２と、読唇履歴表示ボタン１８３とを含む。 Lip reading communication setting screen 180 includes a retry setting button 181, an SOS setting button 182, and a lip reading history display button 183 as setting items.

リトライ設定ボタン１８１は、単発通話モードにおいて通信先に発信が行われたにも関わらず、通信先からの応答がない場合に、再発信を行う回数の上限を設定するためのボタンである。リトライ設定ボタン１８１を選択することにより、図示しないリトライ回数入力画面が表示され、リトライ回数の上限の設定が可能となる。 The retry setting button 181 sets the maximum number of redial attempts made when, in the single call mode, the communication destination does not answer a call. Selecting the retry setting button 181 displays a retry count input screen (not shown) on which this maximum can be set.

ＳＯＳ設定ボタン１８２は、ＳＯＳ設定情報１２１を生成する場合に選択を受け付けるボタンである。ＳＯＳ設定ボタン１８２が選択されると、後述するＳＯＳ設定画面１９０へと表示画面が遷移する。 The SOS setting button 182 is a button for accepting selection when the SOS setting information 121 is generated. When the SOS setting button 182 is selected, the display screen changes to an SOS setting screen 190 described later.

読唇履歴表示ボタン１８３は、文字抽出部１１２により抽出された文字の履歴を表示するためのボタンである。読唇履歴表示ボタン１８３が選択されると、後述する読唇履歴表示画面へと表示画面が遷移する。 The lip reading history display button 183 is a button for displaying a history of characters extracted by the character extraction unit 112. When the lip reading history display button 183 is selected, the display screen transitions to a lip reading history display screen described later.

図１４は、ＳＯＳ設定画面１９０の一例を示す図である。ＳＯＳ設定画面１９０は、ＯＮ／ＯＦＦボタン１９１と、ＳＯＳ設定表示領域１９２とを含む。 FIG. 14 is a diagram illustrating an example of the SOS setting screen 190. The SOS setting screen 190 includes an ON / OFF button 191 and an SOS setting display area 192.

ＯＮ／ＯＦＦボタン１９１は、ＳＯＳ通信処理を行うか否かの選択を受け付けるボタンである。ＯＮ／ＯＦＦボタン１９１がＯＦＦに設定されている場合は、図７及び図８に示すＳＯＳ通信処理は開始されない。 The ON / OFF button 191 is a button for accepting selection as to whether or not to perform SOS communication processing. When the ON / OFF button 191 is set to OFF, the SOS communication process shown in FIGS. 7 and 8 is not started.

ＳＯＳ設定表示領域１９２は、ＳＯＳ設定情報１２１を表示する領域である。ＳＯＳ設定画面１９０では、ＳＯＳ設定情報１２１の生成及び編集が可能である。例えば、ＳＯＳ設定画面１９０において新規登録ボタンを選択することで、新たな読唇パターンとＳＯＳ通信先との組み合わせが入力可能となる。変更ボタンを選択することで、現在表示されているＳＯＳ設定情報１２１の編集が可能となる。入力部１１６は、新規登録又は編集された情報を用いてＳＯＳ設定情報１２１を生成する。 The SOS setting display area 192 is an area for displaying the SOS setting information 121. On the SOS setting screen 190, the SOS setting information 121 can be generated and edited. For example, by selecting a new registration button on the SOS setting screen 190, a combination of a new lip reading pattern and an SOS communication destination can be input. By selecting the change button, the currently displayed SOS setting information 121 can be edited. The input unit 116 generates the SOS setting information 121 using newly registered or edited information.

図１５は、読唇履歴表示画面２００の一例を示す図である。読唇履歴表示画面２００には、記憶部１２０のテキストデータ記憶領域に格納された文字の履歴情報が表示される。なお、テキストデータが訂正された場合は、訂正された文字が表示される。 FIG. 15 is a diagram illustrating an example of the lip reading history display screen 200. The lip reading history display screen 200 displays character history information stored in the text data storage area of the storage unit 120. If the text data is corrected, the corrected character is displayed.

読唇履歴表示画面２００は、項番と、モード表示領域（図１５において「Ｍ」と表示）、日付と、時刻と、リトライ回数と、テキストと、削除入力受付領域とを含む。 Lip reading history display screen 200 includes an item number, a mode display area (displayed as “M” in FIG. 15), a date, a time, the number of retries, a text, and a deletion input acceptance area.

項番は、履歴情報を識別するための識別情報である。モード表示領域は、読唇又はテキスト訂正画面により入力された文字が連続通話モードにおいて入力されたのか、又は単発通話モードで入力されたのかを示す情報が表示される。日付及び時刻は、文字が入力された日付及び時刻が表示される。リトライ回数は、入力された文字について通信先に発信された回数が表示される。例えば図１５の項番「１」の履歴情報において、リトライ回数は「３／５」と格納されているが、これは上述の読唇通信設定画面１８０において設定されたリトライ回数の上限が「５」であるのに対し、「３」回発信された後に通信先に送信されたか、又は発信がキャンセルされたことを示す。 The item number identifies each history entry. The mode display area indicates whether the characters, entered by lip reading or on the text correction screen, were entered in the continuous call mode or in the single call mode. The date and time fields show when the characters were entered. The retry count shows how many calls were placed to the communication destination for the entered characters. For example, the entry with item number “1” in FIG. 15 stores a retry count of “3/5”: with the retry limit set to “5” on the lip reading communication setting screen 180 described above, either the text was delivered to the communication destination after 3 calls, or calling was canceled after 3 calls.

The text column displays the entered characters. The deletion input acceptance area accepts instructions to delete displayed history information; for example, it displays check boxes for selecting the text to be deleted. When text to be deleted is checked in the deletion input acceptance area and the execute button is selected, the checked history entries are deleted.
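A history entry bundles the columns listed above. The check-box deletion amounts to filtering out the ticked entries. The sketch below models this with a hypothetical `HistoryRecord` dataclass and sample data invented for illustration. It also assumes that remaining entries keep their item numbers, because the source does not say whether entries are renumbered after deletion.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HistoryRecord:
    """One row of the lip reading history (hypothetical layout)."""
    item_no: int
    mode: str            # "continuous" or "single" call mode
    timestamp: datetime  # date and time the characters were entered
    retries: str         # e.g. "3/5"
    text: str            # corrected text if the user edited it
    checked: bool = False  # state of the deletion check box


def delete_checked(history: list[HistoryRecord]) -> list[HistoryRecord]:
    """Drop every record whose deletion check box is ticked, as when the
    execute button is pressed. Item numbers are left unchanged (assumption)."""
    return [r for r in history if not r.checked]


history = [
    HistoryRecord(1, "continuous", datetime(2013, 12, 16, 9, 30), "3/5",
                  "I'm in a meeting"),
    HistoryRecord(2, "single", datetime(2013, 12, 16, 10, 5), "1/5",
                  "Call you later", checked=True),
]
history = delete_checked(history)
print([r.item_no for r in history])  # [1]
```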

As described above, in the present embodiment, characters are extracted from images of the user's speech. Voice generated from the extracted characters is then transmitted to the communication destination, so the intended content reaches the other party without the user speaking aloud. In addition, even if an emergency arises during a call, a pre-registered communication destination can be contacted immediately, which improves convenience.

Embodiments of the present invention have been described above. However, the present invention is not limited to the examples described and includes various modifications.

For example, the mobile terminal device 10 may generate an e-mail from the characters extracted by lip reading and send it to a communication destination.
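For the e-mail variant, the extracted characters become the message body. The following minimal sketch uses Python's standard `email.message.EmailMessage`. The function name `build_lipread_email`, the default subject line, and the addresses are hypothetical. Actual delivery, for example with `smtplib.SMTP(...).send_message(msg)`, depends on the carrier or mail server and is left out.

```python
from email.message import EmailMessage


def build_lipread_email(text: str, sender: str, recipient: str,
                        subject: str = "Message from lip reading") -> EmailMessage:
    """Wrap characters extracted by lip reading in an e-mail message.
    The subject line is a placeholder chosen for illustration."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)  # plain-text body; non-ASCII text is encoded as UTF-8
    return msg


msg = build_lipread_email("I am in a meeting", "me@example.com", "you@example.com")
print(msg["To"])  # you@example.com
```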

As another example, the character extraction unit 112 described above extracts characters from the images acquired by the image acquisition unit 111. It may additionally use the audio acquired by the voice acquisition unit 118 as an auxiliary input when extracting characters. This improves extraction accuracy compared with extracting characters from the speech images alone.
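The source does not say how the audio is combined with the images. One common option is weighted late fusion, which scores each candidate text with a blend of a visual score and an audio score. The sketch below assumes both recognizers return candidate-to-probability dictionaries. It gives the audio a smaller weight so that audio plays only a supporting role. It falls back to the visual result when no audio is available. The function name, weight, and scores are all hypothetical.

```python
def fuse_candidates(visual: dict[str, float], audio: dict[str, float],
                    audio_weight: float = 0.3) -> str:
    """Pick the candidate text with the highest weighted score.

    Scores are assumed to be probabilities in [0, 1]. A candidate missing
    from one source contributes 0 for that source. With no audio input
    (e.g. silent speech), the best visual candidate is returned."""
    if not audio:
        return max(visual, key=visual.get)
    candidates = set(visual) | set(audio)

    def score(candidate: str) -> float:
        return ((1 - audio_weight) * visual.get(candidate, 0.0)
                + audio_weight * audio.get(candidate, 0.0))

    return max(candidates, key=score)


# "pat" and "bat" look alike on the lips; a faint audio cue separates them.
print(fuse_candidates({"pat": 0.52, "bat": 0.48}, {"bat": 0.9, "pat": 0.1}))  # bat
print(fuse_candidates({"pat": 0.52, "bat": 0.48}, {}))                        # pat
```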

Also, for example, the embodiment described above has been explained in detail to make the present invention easy to understand, and the present invention is not necessarily limited to one that includes all of the configurations described. Part of the configuration of one example embodiment can be replaced with the configuration of another example. The configuration of another example can also be added to the configuration of one example embodiment. For part of the configuration of each example embodiment, other configurations can be added, deleted, or substituted. Each of the configurations, functions, processing units, processing means, and the like described above may be implemented partly or entirely in hardware, for example by designing them as an integrated circuit. The control lines and information lines in the drawings are those considered necessary for the explanation, and not all of them are necessarily shown. In practice, almost all components may be considered to be interconnected.

The functional configuration of the mobile terminal device 10 described above is classified according to the main processing performed, to make it easier to understand. The present invention is not limited by how the components are classified or named. The configuration of the mobile terminal device 10 can be divided into more components depending on the processing. Alternatively, it can be classified so that a single component performs more of the processing.

Reference signs: 10: mobile terminal device, 110: control unit, 111: image acquisition unit, 112: character extraction unit, 113: voice generation unit, 114: display unit, 115: communication unit, 116: input unit, 117: SOS transmission instruction receiving unit, 118: voice acquisition unit, 120: storage unit, 121: SOS setting information, 130: CPU, 131: auxiliary storage device, 132: network I/F, 133: memory, 134: input I/F, 135: output I/F, 136: touch panel, 137: camera unit, 138: display, 140: home screen, 141: call mode selection area, 142: settings button, 150: call screen, 151: call mode display area, 152: number selection area, 153: call button, 160: lip reading screen A, 161: call type display area, 162: image display area, 163: text display area, 164: end button, 170: lip reading screen B, 171: call type display area, 172: image display area, 173: text display area, 174: input end button, 175: cancel button, 180: lip reading communication setting screen, 181: retry setting button, 182: SOS setting button, 183: lip reading history display button, 190: SOS setting screen, 191: ON/OFF button, 192: SOS setting display area, 200: lip reading history display screen

Claims

A portable terminal device comprising:
an image acquisition unit that acquires images of mouth movements;
a character extraction unit that extracts characters corresponding to the mouth movements based on the images acquired by the image acquisition unit;
a voice generation unit that generates voice using the characters extracted by the character extraction unit; and
a communication unit that transmits the voice generated by the voice generation unit to a first communication destination.

The portable terminal device according to claim 1, further comprising:
a display unit that displays the images acquired by the image acquisition unit and the characters extracted by the character extraction unit.

The portable terminal device according to claim 2, further comprising:
an input unit that accepts correction of the characters displayed by the display unit,
wherein the voice generation unit generates the voice using the corrected characters accepted by the input unit.

The portable terminal device according to claim 2 or 3, further comprising:
a storage unit that stores communication destination information in which a message is associated with a second communication destination,
wherein, when the characters extracted by the character extraction unit include the message, the communication unit places a call to the second communication destination associated with the message in the communication destination information.

The portable terminal device according to claim 4, further comprising:
an SOS transmission instruction receiving unit that receives an instruction for SOS transmission, which is a call to the second communication destination,
wherein, when the SOS transmission instruction has been received during communication with the first communication destination and the character extraction unit extracts characters including the message, the communication unit disconnects the communication with the first communication destination and places a call to the second communication destination.

The portable terminal device according to claim 5,
wherein the communication unit further places a call to the second communication destination when the image acquisition unit acquires images of the mouth held in a predetermined shape for a predetermined time.
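The condition “a mouth held in a predetermined shape for a predetermined time” could be checked by counting consecutive video frames. The sketch below assumes an upstream classifier that labels each frame with a mouth-shape name. The labels, the function name `held_long_enough`, and the frame-rate handling are hypothetical. The source does not describe how the shape or its duration is measured.

```python
from typing import Iterable


def held_long_enough(shape_labels: Iterable[str], target_shape: str,
                     hold_seconds: float, fps: float) -> bool:
    """Return True once target_shape appears in consecutive frames that
    together span at least hold_seconds at the given frame rate.
    Per-frame labels are assumed to come from a separate shape classifier."""
    needed = max(1, round(hold_seconds * fps))
    run = 0
    for label in shape_labels:
        run = run + 1 if label == target_shape else 0  # reset on any other shape
        if run >= needed:
            return True
    return False


# At 10 frames/s, a 0.5 s hold needs 5 consecutive matching frames.
print(held_long_enough(["pursed"] * 3 + ["neutral"] + ["pursed"] * 5,
                       "pursed", 0.5, 10))  # True
print(held_long_enough(["pursed"] * 4 + ["neutral"], "pursed", 0.5, 10))  # False
```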

The portable terminal device according to any one of claims 2 to 6,
wherein the display unit further displays a home screen that accepts selection of a continuous call mode or a single call mode, and
wherein, when the continuous call mode is selected, the communication unit transmits the voice generated from the images acquired by the image acquisition unit to the first communication destination after communication with the first communication destination has been established, and, when the single call mode is selected, the communication unit transmits the voice to the first communication destination with which communication is established after the image acquisition unit has acquired the images.
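The two call modes differ mainly in the order of operations. The continuous call mode connects first and then captures and sends. The single call mode captures the message first and then connects and sends it. The sketch below shows only that ordering. It uses hypothetical callbacks `capture_and_synthesize`, `connect`, and `send`, and it simplifies the continuous mode to a single utterance even though a real call would keep sending voice as it is generated.

```python
from typing import Callable


def run_call(mode: str, capture_and_synthesize: Callable[[], bytes],
             connect: Callable[[], None], send: Callable[[bytes], None]) -> None:
    """Order capture, connection, and transmission according to the call mode.
    All three callbacks are hypothetical stand-ins for device functions."""
    if mode == "continuous":
        connect()                        # establish the call first
        send(capture_and_synthesize())   # then lip-read and speak into it
    elif mode == "single":
        voice = capture_and_synthesize()  # lip-read the whole message first
        connect()                         # then dial
        send(voice)
    else:
        raise ValueError(f"unknown call mode: {mode!r}")


def trace(mode: str) -> list[str]:
    """Record the order in which run_call invokes the callbacks."""
    events: list[str] = []

    def fake_capture() -> bytes:
        events.append("capture")
        return b"pcm"

    run_call(mode, fake_capture,
             lambda: events.append("connect"),
             lambda voice: events.append("send"))
    return events


print(trace("single"))      # ['capture', 'connect', 'send']
print(trace("continuous"))  # ['connect', 'capture', 'send']
```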

The portable terminal device according to any one of claims 2 to 7, further comprising:
a voice acquisition unit that acquires ambient sound around the portable terminal device,
wherein the communication unit transmits, to the first communication destination, the ambient sound acquired by the voice acquisition unit in addition to the voice generated by the voice generation unit.

The portable terminal device according to claim 8,
wherein the communication unit transmits the ambient sound when the ambient sound acquired by the voice acquisition unit is at or above a predetermined volume.
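The “predetermined volume” test could be implemented by measuring the level of each block of microphone samples and forwarding the block only above a threshold. The sketch below computes the RMS level of 16-bit PCM samples in dBFS. The threshold of -30 dBFS and the function names are assumptions chosen for illustration. The source gives no specific value.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Root-mean-square level of 16-bit PCM samples, in dB relative to
    full scale. Silence (or an empty block) is reported as -inf."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_sq) / full_scale)


def should_forward_ambient(samples: Sequence[int],
                           threshold_dbfs: float = -30.0) -> bool:
    """Forward ambient sound only at or above the (assumed) threshold."""
    return rms_dbfs(samples) >= threshold_dbfs


loud = [3277, -3277] * 100   # about -20 dBFS
quiet = [100, -100] * 100    # about -50 dBFS
print(should_forward_ambient(loud), should_forward_ambient(quiet))  # True False
```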

A lip reading communication method using a portable terminal device, the method comprising:
an image acquisition step of acquiring images of mouth movements;
a character extraction step of extracting characters corresponding to the mouth movements based on the images acquired in the image acquisition step;
a voice generation step of generating voice using the characters extracted in the character extraction step; and
a communication step of transmitting the voice generated in the voice generation step to a first communication destination.

A program for causing a computer to function as a portable terminal device, the program causing the computer to execute:
an image acquisition procedure of acquiring images of mouth movements;
a character extraction procedure of extracting characters corresponding to the mouth movements based on the images acquired in the image acquisition procedure;
a voice generation procedure of generating voice using the characters extracted in the character extraction procedure; and
a communication procedure of transmitting the voice generated in the voice generation procedure to a first communication destination.