JP2021135811A

JP2021135811A - Character input support control device, character input support system, and character input support program

Info

Publication number: JP2021135811A
Application number: JP2020032222A
Authority: JP
Inventors: 清文門馬; Kiyobumi Momma; 峰岩金; Hogan Kin
Original assignee: Tokyo Gas Co Ltd
Current assignee: Tokyo Gas Co Ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2021-09-13

Abstract

PROBLEM TO BE SOLVED: To extract a character image from a display image based on image data already recorded on a recording medium to generate text data, and to store the image data of the captured image itself and the generated text data in association with each other.

SOLUTION: A communication terminal device 12 reads recorded data already stored in a file server 14 or the like and displays an image. The displayed image data is transmitted to an OCR server 16 via the file server 14, and the characters included in the image are converted into text data and stored as a comment associated with the image data. Because the characters shown in the image are associated with the image data as first text data, they can be used as search keywords when a required image is sought among a large number of image files, which speeds up the search.

SELECTED DRAWING: Figure 4

Description

The present invention relates to a character input support control device, a character input support system, and a character input support program for registering file names and comments using speech uttered at an imaging site.

For example, a company that manages a vast plant, such as a manufacturer, needs to record on-site conditions as digital image files using a digital camera or a smartphone during daily inspection work and when responding to trouble. In recent years, wearable terminals typified by smart glasses are also sometimes used as imaging devices.

To associate the subject of a captured digital file with its file name, however, the file has had to be renamed on a personal computer or the like after returning to the office, and recalling the circumstances at the time of imaging from among similar image records in order to enter the name is laborious.

For reference, the techniques of Patent Documents 1 to 3 have been proposed as techniques for automatically organizing captured digital files.

Patent Document 1 describes separating transferred image data, or image data read by a reading device, into regions and changing the resolution used for storage depending on whether text is included.

Specifically, when text is included, a resolution that maintains reading accuracy for OCR (Optical Character Recognition) is used, and when text is not included, a resolution that reduces the storage space occupied is used.

Patent Document 2 describes extracting a frame image corresponding to the best shot from a moving image accurately, efficiently, and at high speed.

Patent Document 3 describes automatically sorting captured images according to image patterns for recognition that are registered in advance.

Patent Document 1: Japanese Unexamined Patent Publication No. 2007-312224
Patent Document 2: Japanese Unexamined Patent Publication No. 2016-225679
Patent Document 3: Japanese Unexamined Patent Publication No. 2010-56884

However, Patent Document 1 appears to be aimed at storing images and text at different, individually appropriate resolutions so that a later printout reproduces nearly the same state as the original. It contains no concept or suggestion of storing text data separately from the captured image data and, for example, displaying the image data and the text data side by side.

Image analysis technologies such as that of Patent Document 2, which automatically organize large volumes of video or extract only the best shot from similar videos, have long existed and are also offered as services, but they do not solve the above problem (the increased labor of recalling the circumstances at the time of imaging from similar image records when entering information).

Patent Document 3 automatically sorts captured images according to image patterns for recognition registered in advance, but this initial setup (registration of the recognition image patterns) is mandatory, and the technique cannot cope with the varied changes in circumstances at an imaging site.

An object of the present invention is to obtain a character input support control device, a character input support system, and a character input support program that can extract a character image from a display image based on image data already recorded on a recording medium to generate text data, and store the image data of the captured image itself in association with the generated text data.

The character input support control device according to the present invention has: a display control unit that reads out image data stored in an image storage area and displays it on a display unit; a character recognition unit that extracts a character image from the image displayed on the display unit and generates character-recognition text data; and a text data storage unit that stores the character-recognition text data generated by the character recognition unit in association with the image data.

According to the present invention, the display control unit reads out the image data stored in the image storage area and displays it on the display unit.

The character recognition unit extracts a character image from the image displayed on the display unit and generates character-recognition text data. The text data storage unit stores the character-recognition text data generated by the character recognition unit in association with the image data.

As a result, a character image can be extracted from a display image based on image data already recorded on a recording medium to generate text data, and the image data of the captured image itself can be stored in association with the generated text data.
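The three units described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class name, the in-memory dictionaries standing in for the image storage area and text data storage unit, and the pluggable `recognize` callable standing in for an OCR engine are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CharacterInputSupportDevice:
    """Hypothetical model of the display control, character recognition,
    and text data storage units."""
    image_store: Dict[str, bytes]            # stands in for the image storage area
    recognize: Callable[[bytes], str]        # assumed OCR backend (not specified)
    text_store: Dict[str, str] = field(default_factory=dict)
    displayed: Optional[str] = None          # id of the image currently shown

    def display(self, image_id: str) -> bytes:
        """Display control unit: read stored image data and mark it as displayed."""
        image = self.image_store[image_id]
        self.displayed = image_id
        return image

    def recognize_and_store(self) -> str:
        """Character recognition unit plus text data storage unit."""
        if self.displayed is None:
            raise RuntimeError("no image is being displayed")
        text = self.recognize(self.image_store[self.displayed])
        # Store the recognized text keyed by the image it came from.
        self.text_store[self.displayed] = text
        return text
```

Keying the text by the image identifier is one simple way to express "stored in association with the image data"; the specification does not prescribe a particular mechanism.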

The present invention is characterized in that a character image based on the character-recognition text data is displayed as a comment on the captured image.

Displaying the character-recognition text data as a comment together with the captured image makes the captured image easier to identify and distinguish.

The present invention is characterized in that the device further has a voice recognition unit that converts voice data uttered while the display unit is displaying an image into voice-recognition text data, the text data storage unit stores the voice-recognition text data recognized by the voice recognition unit in association with the image data, and the voice-recognition text data is displayed as at least one of a title and a comment of the image.

Character recognition recognizes only the characters in a captured image and therefore yields limited information. In contrast, if, for example, the user speaks whatever information comes to mind while the display unit is displaying the image, and the resulting voice-recognition text data is displayed as at least one of the title and the comment, the image becomes even easier to identify and distinguish.
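A minimal sketch of how recognized speech might be applied to the title, the comment, or both is shown here. The `ImageAnnotation` type, the function name, and the choice to append speech text to an existing comment (rather than overwrite it) are assumptions made for illustration; the specification only states that the text is displayed as at least one of the two.

```python
from dataclasses import dataclass


@dataclass
class ImageAnnotation:
    """Hypothetical title/comment pair associated with one image."""
    title: str = ""
    comment: str = ""


def apply_speech_text(annotation: ImageAnnotation, speech_text: str,
                      to_title: bool = False,
                      to_comment: bool = True) -> ImageAnnotation:
    """Attach voice-recognition text to at least one of title and comment."""
    if not (to_title or to_comment):
        raise ValueError("speech text must go to the title, the comment, or both")
    if to_title:
        annotation.title = speech_text
    if to_comment:
        # Assumption: keep any existing (e.g. OCR-derived) comment and append.
        annotation.comment = "\n".join(
            part for part in (annotation.comment, speech_text) if part
        )
    return annotation
```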

The character input support system according to the present invention has: a communication terminal device provided with a display control unit that reads out image data stored in an image storage area and displays it on a display unit; a character recognition server provided with a character recognition unit that extracts a character image from the image displayed on the display unit and generates character-recognition text data; and a data storage server provided with a text data storage unit that stores the character-recognition text data generated by the character recognition unit in association with the image data of the image displayed on the display unit.

According to the present invention, the communication terminal device, the character recognition server, and the data storage server are linked via a network to construct the character input support system.

The display control unit of the communication terminal device reads out the image data stored in the image storage area and displays it on the display unit.

The character recognition unit of the character recognition server extracts a character image from the image displayed on the display unit and generates character-recognition text data. In the data storage server, the text data storage unit stores the character-recognition text data generated by the character recognition unit in association with the image data that was subject to character recognition and is stored in the image data storage unit.

As a result, a character image can be extracted from a display image based on image data already recorded on a recording medium to generate text data, and the image data can be stored in association with the generated text data.

The present invention is characterized in that the system further has a voice recognition server that converts voice data uttered while the display unit is displaying an image into voice-recognition text data, the text data storage unit stores the voice-recognition text data recognized by the voice recognition server in association with the image data, and the voice-recognition text data is displayed as at least one of a title and a comment of the image.

Character recognition recognizes only the characters in a captured image and therefore yields limited information. In contrast, if, for example, the user speaks whatever information comes to mind while the display unit is displaying the image, and the resulting voice-recognition text data is displayed as at least one of the title and the comment, the captured image becomes even easier to identify and distinguish.

The present invention is characterized in that, after a captured image taken by the imaging unit of the communication terminal device is stored in the data storage server, viewing of it is restricted to specific communication terminal devices that have obtained permission in advance.

Even if the data storage server is accessible to an unspecified large number of users, restricting viewing by communication terminal device maintains confidentiality even when, for example, the data storage server is external storage whose capacity is secured under contract.

The present invention is characterized by providing a viewer function that treats a display area in which an image and text form a set as one unit (a frame) and lists the stored frames.

Displaying the image and the text as a set makes tasks such as sorting related images easier.

The character input support program according to the present invention is characterized by causing a computer to operate as each unit of the character input support control device according to any one of claims 1 to 3.

According to the present invention, a character image can be extracted from a display image based on image data already recorded on a recording medium to generate text data, and the image data of the captured image itself can be stored in association with the generated text data.

FIG. 1 is a network diagram showing the mutual connections among the communication terminal device, file server, and OCR server that constitute the character input support system according to the first embodiment.
FIG. 2 is a control block diagram showing the hardware configuration of the communication terminal device according to the first embodiment.
FIG. 3 shows examples of devices applicable as the communication terminal device of the first embodiment: (A) is a perspective view of a smartphone, (B) is a perspective view of a digital camera, (C) is a perspective view of a headset-type wearable camera, and (D) is a perspective view of a smart-glasses-type wearable camera.
FIG. 4 shows the communication protocol for executing character input support control in the character input support system according to the first embodiment.
FIG. 5 is a front view of the monitor of the communication terminal device according to the first embodiment: (A) is the image at the time of shooting, (B) is the image during OCR processing, (C) is the history list image, and (D) is the function setting image.
FIG. 6 is a network diagram showing the mutual connections among the communication terminal device, file server, and OCR server that constitute the character input support system according to the second embodiment.
FIG. 7 shows the communication protocol for executing character input support control in the character input support system according to the second embodiment.
FIG. 8 is a front view of the monitor of the communication terminal device according to the second embodiment: (A) is the image at the time of shooting, (B) is the image during OCR processing, (C) is the history list image, and (D) is the function setting image.

"First embodiment"

FIG. 1 shows a schematic configuration diagram of the file character input support system 10 according to the first embodiment.

The file character input support system 10 comprises a communication terminal device 12, carried by a user 11 of the system, and a file server 14, which can communicate with each other via a network 18. An OCR server 16 is connected to the file server 14 via a communication line.

(Communication terminal device 12)

As shown in FIG. 2, the communication terminal device 12 includes a microcomputer 20. The microcomputer 20 comprises a CPU 20A, a RAM 20B, a ROM 20C, an input/output port (I/O) 20D, and a bus 20E, such as a data bus and a control bus, that connects them.

An input/output device 22, a large-scale storage medium 24, an imaging device 26, and a communication I/F 28 for connecting to the network 18 are connected to the I/O 20D.

The input/output device 22 includes a touch panel unit 22A that functions as a monitor and an input pad, a speaker 22B, and a microphone 22C.

The communication I/F 28 transmits and receives data to and from the network 18 via a wireless relay device 30 connected to the network 18. More specifically, image data captured by the imaging device 26 may be stored in the large-scale storage device 24 of the communication terminal device 12 itself, or transferred (transmitted) via the network 18 to the file server 14 (see FIG. 1). It may also be transferred to external storage that is connected to the network 18 and separate from the file server 14. Imaging may produce either moving images or still images.

The difference is that a moving image includes sound data recorded by the microphone 22C (environmental sound data, described later), whereas a still image has no environmental sound data. Some imaging devices, however, can record audio data as-is in association with the image data even for still images.

Hereinafter, "imaging" by itself covers both moving images and still images. Image data and audio data are collectively referred to as "recorded data". That is, the file server 14 has a function of managing and controlling image data and audio data.

Hereinafter, "audio data" by itself includes both the ambient environmental sound data captured while the imaging device 26 records a moving image and the utterance voice data spoken by the administrator who manages (carries) the communication terminal device 12 in order to identify the subject being imaged. Environmental sound data and utterance voice data are distinguished where necessary. Utterance voice data can also be recorded for still images.

(Application examples of communication terminal device 12)

The communication terminal device 12 need only have at least the basic functions described above (the input/output device function, in particular the microphone; the imaging device function; and the communication I/F function). As one example, the smartphone 12SP shown in FIG. 3(A) is a typical device applicable as the communication terminal device 12. Because a smartphone 12SP has a call function, it is sometimes placed in a different category from a tablet terminal, which has a relatively large screen (about 7 to 10 inches) and no call function. Here, however, both have the equivalent imaging device function, so hereinafter "smartphone 12SP" includes tablet terminals.

That is, as shown in FIG. 3(A), the smartphone 12SP has a communication function (including the communication I/F 28 shown in FIG. 2), the front of its housing 32 serves as the touch panel unit 22A, and the speaker 22B and the microphone 22C are provided around the touch panel unit 22A. Camera lenses 26F and 26R, each forming part of the imaging device, are provided on the front and the back of the housing 32, respectively.

As another example of the communication terminal device 12, the digital camera 12DC shown in FIG. 3(B) is applicable. Even if the digital camera 12DC itself has no communication function, it can be used as the communication terminal device 12 by fitting an SD card that has a communication function (for example, a Wi-Fi function).

As a further example of the communication terminal device 12, the headset-type wearable camera unit 12HS shown in FIG. 3(C) is applicable. In the headset-type wearable camera unit 12HS, the imaging device 26, which the user mounts on a helmet or the like, is separate from the other devices held by the user 11 (the input/output device 22, the communication I/F 28, and so on), and the two are connected by a cable 34.

As yet another example, the communication terminal device 12 may be the smart-glasses-type wearable camera unit 12SG shown in FIG. 3(D), in which some or all of the functions, including the input/output device 22 and the imaging device 26, are built into an eyeglass-type wearable part.

(File server 14)

As shown in FIG. 1, the file server 14 functions as a microcomputer and, although not illustrated, has a CPU, a RAM, a ROM, an I/O, and a bus.

The file server 14 also includes, as a large-scale storage device, a database 14A that has an image data storage area and a text data storage area. Under the control of the file server 14, recorded data received from the communication terminal device 12 is stored in the database 14A as files.
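One way to picture a database with separate image data and text data storage areas is a pair of tables linked by file name. The following sketch uses Python's standard `sqlite3` module; the table names, column names, and the use of SQLite at all are assumptions for illustration only, since the specification does not describe how database 14A is implemented.

```python
import sqlite3
from typing import Optional, Tuple


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two hypothetical storage areas as tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS image_data (
            file_name TEXT PRIMARY KEY,
            content   BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS text_data (
            file_name TEXT PRIMARY KEY REFERENCES image_data(file_name),
            title     TEXT NOT NULL DEFAULT '',
            comment   TEXT NOT NULL DEFAULT ''
        );
    """)
    return conn


def store_image(conn: sqlite3.Connection, file_name: str, content: bytes) -> None:
    with conn:  # commits on success
        conn.execute("INSERT OR REPLACE INTO image_data VALUES (?, ?)",
                     (file_name, content))


def store_text(conn: sqlite3.Connection, file_name: str,
               title: str, comment: str) -> None:
    """Store (or update) the text associated with an already stored image."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO text_data VALUES (?, ?, ?)",
                     (file_name, title, comment))


def load_text(conn: sqlite3.Connection, file_name: str) -> Optional[Tuple[str, str]]:
    """Return (title, comment) for an image, or None if the image is unknown."""
    row = conn.execute(
        "SELECT COALESCE(t.title, ''), COALESCE(t.comment, '') "
        "FROM image_data AS i LEFT JOIN text_data AS t "
        "ON t.file_name = i.file_name WHERE i.file_name = ?",
        (file_name,),
    ).fetchone()
    return None if row is None else (row[0], row[1])
```

The foreign key makes the association explicit: text cannot be stored for an image that is not in the image data area.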

(OCR server 16)

In the first embodiment, an OCR processing application program (hereinafter, the "OCR app") is provided as an additional function of the imaging device of the communication terminal device 12. The OCR app reads out recorded data stored in the large-scale storage device 24 of the communication terminal device 12, in the database 14A of the file server 14, or in other external storage connected to the network 18 (not shown), and sends it to the OCR server 16 via the file server 14. The characters contained in the captured image are read and converted into character information, which is stored in the respective storage area as attached information associated with the recorded data. The attached information is stored in a comment area appended to the recorded data.

One usable comment area is, for example, the comment area provided as part of Exif (registered trademark; "Exchangeable Image File Format") information. The Exif (registered trademark) information area records information about imaging (shooting date and time, manufacturer name, model name, resolution, shutter speed, aperture, ISO, flash on/off, focal length, thumbnail image, GPS information, and so on), and a comment area is provided as one item of this information.
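The specification does not say which Exif field serves as the comment area. Assuming, purely for illustration, that it is the Exif `UserComment` tag (tag number 0x9286), the stored value begins with an 8-byte character code identifier followed by the comment itself. The sketch below builds and parses only that payload; it does not write it into an image file, and the UTF-16 big-endian choice for non-ASCII text is an assumption, because byte order handling differs between implementations.

```python
USER_COMMENT_TAG = 0x9286  # Exif UserComment tag number

ASCII_CODE = b"ASCII\x00\x00\x00"   # 8-byte character code identifier
UNICODE_CODE = b"UNICODE\x00"       # 8-byte character code identifier


def encode_user_comment(text: str) -> bytes:
    """Build a UserComment payload: 8-byte character code, then the text."""
    try:
        return ASCII_CODE + text.encode("ascii")
    except UnicodeEncodeError:
        # Assumption: big-endian UTF-16 for text that is not plain ASCII.
        return UNICODE_CODE + text.encode("utf-16-be")


def decode_user_comment(payload: bytes) -> str:
    """Inverse of encode_user_comment for the two encodings handled here."""
    code, body = payload[:8], payload[8:]
    if code == ASCII_CODE:
        return body.decode("ascii")
    if code == UNICODE_CODE:
        return body.decode("utf-16-be")
    raise ValueError(f"unsupported character code: {code!r}")
```

In practice an Exif library would be used to place this payload in the image file; that step is omitted here to keep the example self-contained.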

The character information read from the image by OCR processing is stored in the comment area, and when the image is viewed, the character information in the comment area is displayed with it. Compared with the image alone, this helps in sorting captured images.
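The abstract notes that the recognized text can serve as search keywords when looking for an image among many files. A minimal sketch of such a lookup follows; the function name and the simple case-insensitive substring match are assumptions for illustration, not a method described in the specification.

```python
from typing import Dict, List


def search_by_keyword(comments: Dict[str, str], keyword: str) -> List[str]:
    """Return the names of images whose stored comment contains the keyword.

    `comments` maps an image file name to its comment text (for example,
    text produced by OCR). Matching is a case-insensitive substring test.
    """
    needle = keyword.casefold()
    return sorted(
        name for name, comment in comments.items()
        if needle in comment.casefold()
    )
```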

The operation of the first embodiment is described below with reference to the communication protocol of FIG. 4.

FIG. 4 shows the communication protocol executed via the network 18 among the communication terminal device 12, the file server 14, and the OCR server 16 connected to the file server 14, all shown in FIG. 1.

When image readout is instructed through the OCR app function of the communication terminal device 12 (step 100), recorded data is read out from the database 14A of the file server 14 (step 102). The recorded data may instead be read from the large-scale storage device 24 of the communication terminal device 12 itself or from external storage connected to the network 18 (step 104).

The recorded data that has been read out is displayed on the touch panel unit 22A (monitor) of the input/output device 22 (step 106).

The image data of the displayed image is transmitted to the OCR server 16 via the file server 14 (step 108). The OCR server 16 performs OCR processing on the received image data and extracts the character image (step 110). The extracted character image is returned to the communication terminal device 12 as first text data (step 112). The "first" in "first text data" distinguishes it from the second text data generated by speech-to-text conversion, described later.

The monitor of the communication terminal device 12 that has received the first text data provides a title display field and a comment display field along with the image. The communication terminal device 12 displays the received first text data in the comment display field (step 118).

The user 11 then selects the title display field or the comment display field (step 120) and manually edits the text data in that field (by key input on the input pad) (step 122). Editing may not be needed in some cases.

When the user 11 issues a "confirm" instruction on the communication terminal device 12 (step 124), the text data in the title field and the comment field is associated with the original image data and stored in the storage area from which the recorded data was read.

For example, when the recorded data was read from the file server 14, the first text data is transmitted to the file server 14 (step 126) and the text data storage area is updated (step 128). Likewise, when the image data was read from the communication terminal device 12 or from other external storage, the first text data is transmitted there and the text data storage area for the corresponding image data is updated.
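The sequence of steps 100 to 128 can be traced with a small in-process simulation. Everything here is a simplified stand-in under stated assumptions: the class and function names are hypothetical, network transfer is replaced by direct method calls, the display steps are omitted, the OCR engine is a pluggable callable, and only the case where the file server is the read source is modeled.

```python
from typing import Callable, Dict, Optional, Tuple


class OcrServer:
    """Stand-in for OCR server 16; `engine` is an assumed OCR backend."""

    def __init__(self, engine: Callable[[bytes], str]) -> None:
        self.engine = engine

    def recognize(self, image: bytes) -> str:
        return self.engine(image)                       # step 110


class FileServer:
    """Stand-in for file server 14 with its image and text storage areas."""

    def __init__(self, ocr: OcrServer, images: Dict[str, bytes]) -> None:
        self.ocr = ocr
        self.images = dict(images)
        self.texts: Dict[str, Tuple[str, str]] = {}

    def read(self, name: str) -> bytes:
        return self.images[name]                        # step 102

    def relay_to_ocr(self, image: bytes) -> str:
        return self.ocr.recognize(image)                # steps 108 to 112

    def update_text(self, name: str, title: str, comment: str) -> None:
        self.texts[name] = (title, comment)             # step 128


Editor = Callable[[str, str], Tuple[str, str]]


def run_session(server: FileServer, name: str,
                edit: Optional[Editor] = None) -> Tuple[str, str]:
    """Terminal-side flow: read, OCR, optional manual edit, confirm."""
    image = server.read(name)                           # steps 100 to 102
    comment = server.relay_to_ocr(image)                # first text data
    title = ""                                          # step 118: comment field filled
    if edit is not None:
        title, comment = edit(title, comment)           # steps 120 to 122
    server.update_text(name, title, comment)            # steps 124 to 128
    return title, comment
```

Passing `edit=None` corresponds to the case where the user confirms the OCR result without editing.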

FIG. 5 shows the transitions of the display screen of the input/output device 22 (touch panel unit 22A) of the communication terminal device 12 as the communication protocol of FIG. 4 is executed.

FIG. 5(A) shows the state in step 102 of FIG. 4, in which the entire touch panel unit 22A serves as the imaging screen display field 40 and an image based on the recorded data that was read out is displayed.

FIG. 5(B) shows the first text data display screen for the displayed image. Here the touch panel unit 22A is divided into an imaging screen display field 42, a title display field 44, and a comment display field 46, and the first text data is displayed in the comment display field 46.

FIG. 5(C) shows an example of a history display of images to which the first text data has been added by OCR processing; the entire area of the touch panel unit 22A serves as the history display field 48.

In the history display field 48, the communication terminal device 12 accesses the read-source storage (its own mass storage device 24, the file server 14, or another external storage connected to the network 18), and the designated files (recorded data and text data) are displayed as a list. In FIG. 5(C), the title, image, and comment of each file are displayed within one frame per file, and up to three frames are shown at a time; frames for other files can be displayed by scrolling or switching.
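The list display, with one frame per file and up to three frames visible at a time, can be illustrated with a small windowing function. This is a sketch under assumptions: `Frame`, `visible_frames`, and the clamping of the scroll offset are hypothetical and are not taken from the specification.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """One unit frame of the history display: title, image reference, and comment."""
    title: str
    image_id: str
    comment: str


def visible_frames(files: list[Frame], offset: int, per_screen: int = 3) -> list[Frame]:
    """Return the frames shown for a given scroll offset (at most `per_screen`)."""
    # Clamp the offset so that scrolling past either end still shows a full screen.
    max_offset = max(0, len(files) - per_screen)
    offset = max(0, min(offset, max_offset))
    return files[offset:offset + per_screen]


# Hypothetical example data: five stored files.
history = [Frame(f"Title {i}", f"img-{i:03d}", f"Comment {i}") for i in range(1, 6)]
for frame in visible_frames(history, offset=1):
    print(frame.title, frame.image_id, frame.comment)
```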

When images are downloaded from the file server 14 or another external storage and displayed as in FIG. 5(C), it is preferable, from the viewpoint of security and confidentiality, to restrict viewing to communication terminal devices 12 specified in advance. For example, viewing may be enabled by entering a password, or whether viewing is permitted may be determined by a default setting for each communication terminal device 12.

FIG. 5(D) shows the settings screen display field 50 of the character input support application program installed on the communication terminal device 12, where the user can switch whether OCR processing is executed (that is, whether the OCR application is to be started).

As described above, in the first embodiment, when the OCR application is started on the communication terminal device 12 and an image already stored in some storage is read and displayed, the image data of the displayed image is sent to the OCR server 16 via the file server 14, the characters contained in the image are converted into character information (first text data), and the result is stored as a comment associated with the image data.

Because the characters shown in the image are associated with the image data as first text data, they can be used as search keywords when searching a large number of image files for a required image, which enables quick search processing.
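How the stored first text data can serve as a search key is illustrated below. The sketch assumes a simple mapping from a file identifier to its comment text and a case-insensitive substring match; the function `search_images` and the sample data are hypothetical, and the specification does not prescribe any particular search method.

```python
def search_images(text_index: dict[str, str], keyword: str) -> list[str]:
    """Return the IDs of images whose associated text data contains the keyword."""
    needle = keyword.casefold()
    return sorted(
        file_id for file_id, text in text_index.items() if needle in text.casefold()
    )


# Hypothetical index: file ID -> comment text produced by OCR.
comments = {
    "img-001": "PRESSURE GAUGE No. 3",
    "img-002": "Emergency shutoff valve",
    "img-003": "pressure relief line B",
}
print(search_images(comments, "pressure"))
```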

"Second embodiment"

The second embodiment of the present invention is described below.

The first embodiment is characterized in that character images in a captured image are converted into text data (first text data) by OCR processing and the first text data is stored in the comment display field.

In contrast, in the second embodiment, in addition to the OCR server 16 that executes OCR processing, a voice recognition server 36 having a speech-to-text conversion control function is connected to the network 18 and can communicate with the communication terminal device 12 and the file server 14.

That is, the second embodiment is characterized in that speech (utterance voice data) uttered about the image by the user 11 viewing it, at the imaging site immediately after imaging, is sent to the voice recognition server 36, converted into character information (second text data), and stored in the title display field or the comment display field.

File names are generally assigned and set automatically using a regular pattern of numbers, dates, and the like (for example, "DVC01012020-001").
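An automatically assigned name of the kind shown in the example can be generated as follows. The layout is an assumption: the specification gives only the single example "DVC01012020-001", so the reading used here (a device prefix, an eight-digit date in day-month-year order, and a three-digit sequence number) and the function `auto_file_name` are hypothetical.

```python
from datetime import date


def auto_file_name(day: date, sequence: int, prefix: str = "DVC") -> str:
    """Build a file name as prefix + DDMMYYYY + "-" + three-digit sequence (assumed layout)."""
    return f"{prefix}{day.strftime('%d%m%Y')}-{sequence:03d}"


print(auto_file_name(date(2020, 1, 1), 1))  # DVC01012020-001
```

Names of this kind say nothing about what the image shows, which is the motivation for replacing or supplementing them with spoken titles.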

In the second embodiment, at the imaging location where the communication terminal device 12 captures images, when a moving image is captured, speech (utterance voice data) uttered by the user 11 (see FIG. 1) during imaging (mainly for moving images) or during the preview display after imaging (mainly for still images) is converted into character information and used as the title of the recorded data.

When a still image is captured, speech (utterance voice data) uttered by the user 11 (see FIG. 1) while the still image is shown as a preview for a fixed time after capture is converted into character information and used as the title of the recorded data.

The second text data may also be stored in the comment display field, in the same way as the first text data produced by OCR processing, either in addition to or instead of being used as the title.

The operation of the second embodiment is described below with reference to the communication protocol of FIG. 7.

FIG. 7 shows the communication protocol executed via the network 18 by the communication terminal device 12, the file server 14, and the OCR server 16 connected to the file server 14, all shown in FIG. 6.

The image data of the displayed image is transmitted to the OCR server 16 via the file server 14 (step 108). The OCR server 16 performs OCR processing on the received image data and extracts character images (step 110A). The extracted character images are returned to the communication terminal device 12 as first text data (step 112). The "first" in "first text data" distinguishes it from the second text data generated by speech-to-text conversion, described later.

The above is the communication protocol for OCR processing, which is the same as in the first embodiment.
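The OCR round trip (steps 108 to 112) can be sketched with stand-in objects. This is an illustration under assumptions, not the described system: the classes `OcrServer` and `FileServer`, the function `request_first_text`, and the fake recognition engine are hypothetical, and the sketch assumes that the result travels back along the same path through the file server, which the text does not specify.

```python
from typing import Callable

# A recognition engine maps image bytes to recognized text (hypothetical interface).
OcrEngine = Callable[[bytes], str]


class OcrServer:
    """Stand-in for the OCR server 16."""

    def __init__(self, engine: OcrEngine) -> None:
        self.engine = engine

    def handle(self, image_data: bytes) -> str:
        # Step 110A: OCR processing of the received image data.
        return self.engine(image_data)


class FileServer:
    """Stand-in for the file server 14, which relays image data to the OCR server."""

    def __init__(self, ocr_server: OcrServer) -> None:
        self.ocr_server = ocr_server

    def relay_to_ocr(self, image_data: bytes) -> str:
        # Step 108: forward the image data; the return value models step 112.
        return self.ocr_server.handle(image_data)


def request_first_text(file_server: FileServer, image_data: bytes) -> str:
    """Terminal side: send the displayed image and receive the first text data."""
    return file_server.relay_to_ocr(image_data)


# Fake engine for demonstration only: pretends the bytes are already text.
fake_engine: OcrEngine = lambda data: data.decode("utf-8").upper()
server = FileServer(OcrServer(fake_engine))
print(request_first_text(server, b"pressure gauge no. 3"))
```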

Next, the communication protocol for voice recognition processing is described.

When the first text data is displayed in the comment field (step 118 described above), the user 11 selects either the title or the comment as the target of voice input (step 150). After making the selection, the user 11 speaks at the imaging site to perform voice input (step 152).

The uttered voice data is transmitted from the communication terminal device 12 to the voice recognition server 36 (step 154).

The voice recognition server 36 executes voice recognition processing, converts the voice data into character information, and generates the second text data (step 156).

The second text data is returned from the voice recognition server 36 to the communication terminal device 12 (step 158) and displayed in the title field or the comment field (step 160).
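The voice input sequence (steps 150 to 160) can be sketched as follows. The names `Screen`, `voice_input`, and the fake recognizer are hypothetical, and the behavior for the comment field (appending the second text data on a new line after any existing first text data) is an assumption; the specification states only that the second text data is displayed in the title field or the comment field.

```python
from dataclasses import dataclass
from typing import Callable

# A recognizer maps voice data to text; it stands in for the voice recognition server 36.
Recognizer = Callable[[bytes], str]


@dataclass
class Screen:
    """Title and comment fields as shown on the terminal."""
    title: str = ""
    comment: str = ""


def voice_input(screen: Screen, target: str, audio: bytes, recognize: Recognizer) -> Screen:
    """Run one voice input for the selected field ("title" or "comment")."""
    if target not in ("title", "comment"):  # step 150: the user selects the target field
        raise ValueError(f"unknown target field: {target!r}")
    second_text = recognize(audio)  # steps 154 to 158: send audio, receive second text data
    if target == "title":
        screen.title = second_text  # step 160: display in the title field
    elif screen.comment:
        # Assumption: keep the existing first text data and append on a new line.
        screen.comment = f"{screen.comment}\n{second_text}"
    else:
        screen.comment = second_text
    return screen


# Fake recognizer for demonstration only: pretends the audio bytes are already text.
fake_recognizer: Recognizer = lambda audio: audio.decode("utf-8")

screen = Screen(comment="PRESSURE 0.4 MPa")  # comment already holds first text data
voice_input(screen, "title", b"north valve inspection", fake_recognizer)
voice_input(screen, "comment", b"slight corrosion on flange", fake_recognizer)
print(screen)
```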

FIG. 8 is a transition diagram of the display screens shown on the input/output device 22 (touch panel unit 22A) of the communication terminal device 12 as the communication protocol of FIG. 7 is executed.

FIG. 8(A) shows the state in step 102 of FIG. 4, in which the entire area of the touch panel unit 22A serves as the imaging screen display field 40 and an image based on the read recorded data is displayed.

FIG. 8(B) shows the first text data display screen for the displayed image. At this time, the touch panel unit 22A is divided into an imaging screen display field 42, a title display field 44, and a comment display field 46; the first text data is displayed in the comment display field 46, and the second text data is displayed in the title display field 44 and/or the comment display field 46.

FIG. 8(C) shows an example of a history display of images to which the first text data has been added by OCR processing; the entire area of the touch panel unit 22A serves as the history display field 48.

When images are downloaded from the file server 14 or another external storage and displayed as in FIG. 8(C), it is preferable, from the viewpoint of security and confidentiality, to restrict viewing to communication terminal devices 12 specified in advance. For example, viewing may be enabled by entering a password, or whether viewing is permitted may be determined by a default setting for each communication terminal device 12.

FIG. 8(D) shows the settings screen display field 50 of the character input support application program installed on the communication terminal device 12, where the user can switch whether voice recognition processing and OCR processing are executed.
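The effect of the two switches on the settings screen can be sketched as a small configuration object. The names `Settings` and `planned_steps`, the default values, and the step labels are hypothetical and serve only to illustrate that each kind of processing runs only when its switch is on.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Switches corresponding to the settings screen (default values are assumed)."""
    ocr_enabled: bool = True
    voice_enabled: bool = True


def planned_steps(settings: Settings) -> list[str]:
    """List the processing that will run for a displayed image under the given settings."""
    steps = ["display image"]
    if settings.ocr_enabled:
        steps.append("OCR processing (first text data)")
    if settings.voice_enabled:
        steps.append("voice recognition (second text data)")
    return steps


print(planned_steps(Settings(ocr_enabled=True, voice_enabled=False)))
```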

As described above, in the second embodiment, in addition to the extraction of character information from the image by OCR processing, the utterance voice data is sent to the voice recognition server 36, converted into character information (second text data), and stored as a title or comment associated with the image data.

Because the characters shown in the image are associated with the image data as second text data, they can be used as search keywords when searching a large number of image files for a required image, which enables quick search processing.

In the second embodiment, because OCR processing is performed on recorded data that has already been stored, voice recognition is performed together with the OCR processing; however, voice recognition of an utterance may instead be performed separately from OCR processing, immediately after imaging.

10 Character input support system for files
11 User
12 Communication terminal device
12SP Smartphone
12DC Digital camera
12HS Headset-type wearable camera unit
12SG Smart-glasses-type wearable camera unit
14 File server
14A Database
16 OCR server
18 Network
20 Microcomputer
20A CPU
20B RAM
20C ROM
20D Input/output port
20E Bus
22 Input/output device
22A Touch panel unit
22B Speaker
22C Microphone
24 Hard disk
26 Imaging device
28 Communication I/F
30 Wireless relay device
32 Housing
34 Cable
36 Voice recognition server
40 Imaging screen display field
42 Imaging screen display field
44 Title display field
46 Comment display field
48 History display field

Claims

1. A character input support control device comprising:
a display control unit that reads image data stored in an image storage area and displays it on a display unit;
a character recognition unit that extracts a character image from the image displayed on the display unit and generates character recognition text data; and
a text data storage unit that stores the character recognition text data generated by the character recognition unit in association with the image data.

2. The character input support control device according to claim 1, wherein a character image based on the character recognition text data is displayed as a comment on the captured image.

3. The character input support control device according to claim 1 or 2, further comprising a voice recognition unit that converts voice data uttered while the display unit is displaying the image into voice recognition text data, wherein
the text data storage unit stores the voice recognition text data recognized by the voice recognition unit in association with the image data, and
the voice recognition text data is displayed as at least one of a title and a comment of the image.

4. A character input support system comprising:
a communication terminal device having a display control unit that reads image data stored in an image storage area and displays it on a display unit;
a character recognition server having a character recognition unit that extracts a character image from the image displayed on the display unit and generates character recognition text data; and
a data storage server having a text data storage unit that stores the character recognition text data generated by the character recognition unit in association with the image data of the image displayed on the display unit.

5. The character input support system according to claim 4, wherein a character image based on the character recognition text data is displayed as a comment on the captured image.

6. The character input support system according to claim 5, further comprising a voice recognition server that converts voice data uttered while the display unit is displaying the image into voice recognition text data, wherein
the text data storage unit stores the voice recognition text data recognized by the voice recognition server in association with the image data, and
the voice recognition text data is displayed as at least one of a title and a comment of the image.

7. The character input support system according to any one of claims 4 to 6, wherein a captured image captured by an imaging unit of the communication terminal device can, after being stored in the data storage server, be viewed only by specific communication terminal devices for which permission has been obtained in advance.

8. The character input support system according to any one of claims 4 to 7, further comprising a viewer function that treats a display area in which an image and text are set as one unit frame and displays the stored frames as a list.

9. A character input support program that causes a computer to operate as each unit of the character input support control device according to any one of claims 1 to 3.