JP2019121991A - Moving image manual preparing system

Info

Publication number: JP2019121991A
Application number: JP2018002062A
Authority: JP
Inventors: Takuya Ogura
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2019-07-22

Abstract

To create, by a simple method, a moving image manual that accurately reflects the photographer's intent by using sight information of the photographer. SOLUTION: A moving image manual preparing system 1 comprises an imaging part 2, a sight detection part 3, a determination part 14, a storage target determination part 16 and a storage part 12. The imaging part 2 acquires a visual field image by photographing the visual field of a photographer. The sight detection part 3 detects sight information of the photographer. The determination part 14 determines, on the basis of the sight information, whether the photographer is gazing. The storage target determination part 16 determines at least one part of the visual field image as a storage target on the basis of the determination result of the determination part 14. The storage part 12 stores the storage target determined by the storage target determination part 16 as data of a moving image manual. SELECTED DRAWING: Figure 2

Description

The present invention relates to a moving image manual creation system for creating a moving image manual.

Document manuals have long been used for sharing work, improving efficiency, training new employees, and the like. In recent years, moving image manuals, which present a manual as a moving image, have attracted attention, and a technique for creating them is disclosed in, for example, Patent Document 1. In Patent Document 1, work image data representing the work content (for example, 3D animation data) is created from input text data, and the work image data and the text data are output in association with each other.

Meanwhile, one image processing technique for moving images is disclosed in, for example, Patent Document 2. In Patent Document 2, each frame of a moving image is scanned region by region, each region having a predetermined size, and an importance is calculated for each region. The regions are clustered by importance to generate at least one region cluster, an important region cluster is selected from the at least one region cluster on the basis of importance, a partial frame containing the important region cluster is extracted from each frame, and a partial moving image composed of the partial frames is generated. This makes it possible to view the regions of high importance in the moving image efficiently.

In Patent Document 3, for example, an object is detected in the frames of a moving image, the object is identified from image features obtained from a predetermined region containing the detected object, a motion related to the object is detected in the moving image, and a region of interest in each frame is determined from the position of the detected object and the result of detecting the related motion. The region of interest can then be trimmed or enlarged so that a viewer of the moving image can view it easily.

In Patent Document 4, for example, the image data of a captured image is recorded on a recording medium together with line-of-sight information or information on a main subject detected from the line-of-sight information. Using the information recorded on the recording medium, scenes of interest that contain the main subject can be detected, and images centered on the main subject can be reproduced and edited efficiently, so that valuable image data can be fully utilized. Patent Document 4 also discloses that only the scenes in which the main subject can be identified (optionally with a predetermined number of frames before and after them) may be extracted from the image data of a moving image and edited into the image data of a single moving image, and that scenes containing the same main subject may be collected from a plurality of moving images and edited into the image data of a single moving image.

Patent Document 1: JP 2006-215986 A (see claim 1, paragraphs [0027] and [0051], FIG. 1, etc.)
Patent Document 2: JP 2016-219879 A (see claim 1, paragraphs [0008], [0009], and [0035], FIG. 1, etc.)
Patent Document 3: JP 2014-85845 A (see claim 1, paragraphs [0008] and [0009], FIG. 1, etc.)
Patent Document 4: JP 2004-7158 A (see claim 1, paragraphs [0055], [0060], and [0063], FIG. 1, etc.)

However, the method of Patent Document 1 requires the work and process of inputting text data, so a moving image manual cannot be created by a simple method. Patent Documents 2 and 3 both disclose extracting a part of a moving image, but the extraction does not reflect the intent of the person who shot the moving image at all. Patent Document 4 makes it possible to edit images centered on the main subject from the information recorded on the recording medium, but it contains no concept of changing the recorded information (image data) itself according to the photographer's line-of-sight information. Consequently, even if the techniques of Patent Documents 2 to 4 are applied to the creation of a moving image manual, a manual that accurately reflects the photographer's intent cannot be created.

The present invention has been made to solve the above problems, and its object is to provide a moving image manual creation system that can create, by a simple method, a moving image manual that accurately reflects the photographer's intent by using the photographer's line-of-sight information.

A moving image manual creation system according to one aspect of the present invention includes: an imaging unit that captures the photographer's field of view as a moving image to acquire a field-of-view image; a line-of-sight detection unit that detects line-of-sight information of the photographer; a determination unit that determines, on the basis of the line-of-sight information, whether the photographer is gazing; a storage target determination unit that determines at least a part of the field-of-view image as a storage target on the basis of the determination result of the determination unit; and a storage unit that stores the storage target determined by the storage target determination unit as data of a moving image manual.

In the above moving image manual creation system, the determination unit may include a line-of-sight analysis unit that obtains, on the basis of the line-of-sight information, the position of the photographer's line of sight in the field-of-view image and the dwell time of the line of sight at that position, and a gaze determination unit that determines the gaze state of the photographer on the basis of the line-of-sight position and the dwell time.

In the above moving image manual creation system, the gaze determination unit may detect, from the line-of-sight position, a drift amount indicating the fluctuation of the line of sight, and may determine that the photographer is gazing when the drift amount is at or below a threshold and the dwell time is at or above a predetermined time.

In the above moving image manual creation system, when the determination unit determines that the photographer is gazing, the storage target determination unit may identify the subject region in the field-of-view image that contains the subject being gazed at by the photographer, extract the subject region from the field-of-view image, and determine the extracted subject region as the storage target.

In the above moving image manual creation system, the storage target determination unit may enlarge the extracted subject region and determine the enlarged image as the storage target.

In the above moving image manual creation system, the storage target determination unit may change the extraction range used when extracting the subject region from the field-of-view image according to information acquired by the determination unit.

In the above moving image manual creation system, the storage target determination unit may change the extraction range according to the dwell time acquired by the determination unit.

In the above moving image manual creation system, the storage target determination unit may change the extraction range according to a change in the centroid of a plurality of line-of-sight positions acquired by the determination unit.

In the above moving image manual creation system, the storage target determination unit may determine the field-of-view image as the storage target when the determination unit determines that the photographer is not gazing.

The above moving image manual creation system may further include a voice input unit that receives the photographer's voice and a speech recognition unit that recognizes the voice input through the voice input unit and converts it into text data, and the storage target determination unit may determine the storage target by compositing the text data onto at least a part of the field-of-view image.

In the above moving image manual creation system, the storage target determination unit may composite the text data onto at least a part of the field-of-view image such that the text data is positioned outside the region of the storage target in which the subject is present.
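The placement rule for the text data can be illustrated with a minimal sketch. It assumes that the subject region is available as an axis-aligned rectangle in pixel coordinates and that the rendered text has a known pixel size. The function name `place_caption`, the rectangle representation, and the below/above/right/left order of preference are hypothetical and do not come from the source.

```python
from typing import Optional, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Rect = Tuple[int, int, int, int]


def place_caption(frame_w: int, frame_h: int, subject: Rect,
                  text_w: int, text_h: int) -> Optional[Rect]:
    """Return a caption rectangle inside the frame that lies outside the subject.

    Candidates are tried below, above, right of, and left of the subject
    (an assumed order). None is returned when no candidate fits.
    """
    left, top, right, bottom = subject
    # Center the caption on the subject, clamped so it stays inside the frame.
    cx = max(0, min(frame_w - text_w, (left + right - text_w) // 2))
    cy = max(0, min(frame_h - text_h, (top + bottom - text_h) // 2))
    candidates = [
        (cx, bottom),          # directly below the subject
        (cx, top - text_h),    # directly above the subject
        (right, cy),           # to the right of the subject
        (left - text_w, cy),   # to the left of the subject
    ]
    for x, y in candidates:
        if x >= 0 and y >= 0 and x + text_w <= frame_w and y + text_h <= frame_h:
            return (x, y, x + text_w, y + text_h)
    return None
```

Each candidate touches the subject rectangle only along an edge, so the caption never covers the subject. A caller would fall back to some other policy (for example, shrinking the text) when `None` is returned.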

With the above configuration, a moving image manual that accurately reflects the photographer's intent can be created by a simple method using the photographer's line-of-sight information.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1: A perspective view schematically showing the general configuration of a moving image manual creation system according to an embodiment of the present invention.
FIG. 2: A block diagram showing the detailed configuration of the control box of the moving image manual creation system.
FIG. 3: A flowchart showing the flow of processing in the moving image manual creation system.
FIG. 4: An explanatory diagram showing the field-of-view image and line-of-sight information during the synchronization process.
FIG. 5: An explanatory diagram showing an example of the field-of-view image, line-of-sight information, and audio information obtained after the synchronization process.
FIG. 6: An explanatory diagram schematically showing an XY coordinate plane in which the horizontal and vertical angles of view of the imaging unit correspond to the X axis and the Y axis, respectively.
FIG. 7: An explanatory diagram showing an example of a storage target image stored in the storage unit.
FIG. 8: An explanatory diagram showing another example of the field-of-view image, line-of-sight information, and audio information after the synchronization process.
FIG. 9: An explanatory diagram schematically showing the relationship between the dwell time of the line of sight and the display magnification when the storage target is reproduced.
FIG. 10: An explanatory diagram showing an example of a storage target image extracted from the field-of-view image.
FIG. 11: An explanatory diagram showing yet another example of the field-of-view image, line-of-sight information, and audio information after the synchronization process.
FIG. 12: An explanatory diagram showing another example of a storage target image extracted from the field-of-view image.

A moving image manual creation system according to an embodiment of the present invention is described below with reference to the drawings. The present invention is not limited to the following description.

[Configuration of the Moving Image Manual Creation System]
FIG. 1 is a perspective view schematically showing the general configuration of the moving image manual creation system 1 of this embodiment. The moving image manual creation system 1 includes an imaging unit 2, a line-of-sight detection unit 3, a microphone 4, and a control box 10. The imaging unit 2 and the line-of-sight detection unit 3 are mounted on, for example, eyeglasses 5. The eyeglasses 5 may be ordinary eyeglasses with vision-correcting lenses, eyeglasses with dummy lenses that provide no correction, or a head-mounted virtual image display device (head-mounted display).

The imaging unit 2 is a camera (line-of-sight camera) that captures the photographer's field of view as a moving image to acquire a field-of-view image, and is mounted on, for example, the central bridge connecting the left and right lenses of the eyeglasses 5. When the photographer wears the eyeglasses 5, the imaging unit 2 therefore captures the photographer's field of view from a position close to the photographer's eyes.

The line-of-sight detection unit 3 is an element that detects the photographer's line-of-sight information and has an illumination unit (illuminator) and a light-receiving unit. The illuminator irradiates the vicinity of the photographer's cornea with near-infrared light and the light-receiving unit detects the reflected light. This makes it possible to detect changes in the relative position between the Purkinje image formed on the corneal surface and the pupil, and thereby to detect the photographer's line-of-sight direction as the line-of-sight information (corneal reflection method). For example, if the pupil of the left eye is on the outer-corner side of the corneal reflection (the Purkinje image), the photographer is looking to the left (the line of sight points left); if the pupil is on the inner-corner side of the corneal reflection, the photographer is looking to the right (the line of sight points right). The line-of-sight detection unit 3 of this embodiment performs line-of-sight detection (eye tracking) by the typical corneal reflection method described above, but it may use another method such as the scleral reflection method. Also, in this embodiment the line-of-sight detection unit 3 is provided on the rim of each of the left and right lenses of the eyeglasses 5 and detects the line of sight of both eyes, but it may be provided on the rim of only one lens and detect the line of sight of one eye.

The microphone 4 is a voice input unit that receives the photographer's voice.

The imaging unit 2, the line-of-sight detection unit 3, and the microphone 4 are connected to the control box 10 via wiring 6. The information acquired by the imaging unit 2 (data of the photographer's field-of-view image), by the line-of-sight detection unit 3 (the photographer's line-of-sight information), and by the microphone 4 (the photographer's audio information) is thus output to the control box 10 via the wiring 6. Alternatively, the imaging unit 2, the line-of-sight detection unit 3, and the microphone 4 may transmit information to the control box 10 wirelessly.

[Configuration of the Control Box]
FIG. 2 is a block diagram showing the detailed configuration of the control box 10. The control box 10 creates the moving image manual on the basis of the information output from the imaging unit 2, the line-of-sight detection unit 3, and the microphone 4. The control box 10 has an operation unit 11, a storage unit 12, and a control unit 13.

The operation unit 11 consists of buttons and switches that the photographer operates to input various instructions. The storage unit 12 is a memory that stores the operating program of the control unit 13 and various other information, and includes, for example, RAM (Random Access Memory), ROM (Read Only Memory), and nonvolatile memory. In particular, in this embodiment the storage unit 12 stores the storage target determined by the storage target determination unit 16, described later, as data of the moving image manual.

In this embodiment, reading the information (storage target data) stored in the storage unit 12 and displaying the storage target image on the display unit of a reproduction device (not shown) is referred to as "reproduction of the moving image manual", "reproduction of the storage target", or simply "reproduction".

The control unit 13 is, for example, a CPU (Central Processing Unit). The control unit 13 has a determination unit 14, a speech recognition unit 15, a storage target determination unit 16, and a main control unit 17, and the CPU performs the functions of each of these units. The determination unit 14, the speech recognition unit 15, the storage target determination unit 16, and the main control unit 17 may instead be implemented as separate CPUs or as dedicated processing circuits that perform the respective functions.

The determination unit 14 determines whether the photographer is gazing on the basis of the line-of-sight information detected by the line-of-sight detection unit 3. The determination unit 14 includes a line-of-sight analysis unit 14a and a gaze determination unit 14b.

The line-of-sight analysis unit 14a analyzes the line-of-sight information to obtain the position of the photographer's line of sight in the field-of-view image and the dwell time of the line of sight at that position. The line-of-sight analysis unit 14a has a built-in timer (not shown) with which it measures the dwell time.

The gaze determination unit 14b determines the photographer's gaze state (whether the photographer is gazing) on the basis of the line-of-sight position and the dwell time acquired by the line-of-sight analysis unit 14a. For example, when the fluctuation of the line-of-sight position stays within a predetermined range and that state continues for a predetermined time or longer, the gaze determination unit 14b can determine that the photographer is gazing at a subject; otherwise, it can determine that the photographer is not gazing at a subject.

The speech recognition unit 15 recognizes the voice input through the microphone 4 and converts it into text data. The storage target determination unit 16 determines at least a part of the field-of-view image as the storage target on the basis of the determination result of the determination unit 14 (whether the photographer is gazing). The main control unit 17 controls the operation of each unit in the control box 10.

[Processing Flow]
Next, each process in the moving image manual creation system 1 of this embodiment is described in detail. FIG. 3 is a flowchart showing the flow of processing in the moving image manual creation system 1.

First, the photographer synchronizes the line-of-sight information acquired by the line-of-sight detection unit 3 with the field-of-view image acquired by the imaging unit 2 (S1). This synchronization process aligns the direction of the photographer's line of sight toward a subject with the position of that subject in the field-of-view image.

Specifically, the synchronization is performed by having the photographer gaze at a marker before shooting the moving image for the manual. As shown in FIG. 4, the photographer gazes at the center Mo of a marker M, and the imaging unit 2 acquires the field-of-view image at that time. At the same time, the line-of-sight detection unit 3 acquires the line-of-sight information while the photographer is gazing at the center Mo of the marker M. Because the acquired line-of-sight information indicates the direction of the line of sight of the photographer gazing at the center Mo of the marker M, the line-of-sight information and the field-of-view image are synchronized in this state. The photographer therefore operates the operation unit 11 in this state to establish the synchronization and end the synchronization process.
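The source does not state how the synchronization is computed inside the system. One plausible reading, offered purely as an assumption, is that the system stores the offset between the raw line-of-sight position reported while the photographer gazes at the marker and the marker center Mo found in the field-of-view image, and then applies that offset to later samples. The function names and the pure-translation model below are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in field-of-view image pixels


def calibration_offset(raw_gaze: Point, marker_center: Point) -> Point:
    """Offset that moves the raw gaze point onto the marker center (assumed model)."""
    return (marker_center[0] - raw_gaze[0], marker_center[1] - raw_gaze[1])


def apply_offset(raw_gaze: Point, offset: Point) -> Point:
    """Map a later raw gaze point into field-of-view image coordinates."""
    return (raw_gaze[0] + offset[0], raw_gaze[1] + offset[1])


# Example with made-up numbers: the marker center is at (320, 240) in the image,
# while the uncalibrated gaze sample was reported at (310, 250).
offset = calibration_offset((310.0, 250.0), (320.0, 240.0))
corrected = apply_offset((100.0, 100.0), offset)
```

A single-marker translation is the simplest model consistent with the one-marker procedure described above. A real eye tracker may use a richer calibration.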

Next, the photographer operates the operation unit 11 to make the imaging unit 2 start shooting the moving image for the manual (S2). As the photographer carries out the actual work while explaining it, the imaging unit 2 acquires the photographer's field-of-view image (moving image), the line-of-sight detection unit 3 acquires the photographer's line-of-sight information, and the microphone 4 acquires the photographer's audio information (S3). The line-of-sight detection unit 3 may acquire the line-of-sight information at any interval; in this embodiment, it is assumed to be acquired every 0.1 seconds, for example.
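Because the line-of-sight samples arrive at their own interval (0.1 seconds in the example above), a processing step that works frame by frame needs some way to pick a sample for each video frame. The source does not describe this step. The sketch below assumes that both streams carry timestamps in integer milliseconds and simply picks the nearest sample; the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_sample_index(sample_times_ms: Sequence[int], t_ms: int) -> int:
    """Index of the sample closest in time to t_ms.

    sample_times_ms must be non-empty and sorted in ascending order.
    On a tie, the earlier sample is chosen.
    """
    i = bisect_left(sample_times_ms, t_ms)
    if i == 0:
        return 0
    if i == len(sample_times_ms):
        return len(sample_times_ms) - 1
    before, after = sample_times_ms[i - 1], sample_times_ms[i]
    return i if (after - t_ms) < (t_ms - before) else i - 1


def pair_frames_with_gaze(frame_times_ms: Sequence[int],
                          sample_times_ms: Sequence[int]) -> List[int]:
    """For each frame timestamp, return the index of the nearest gaze sample."""
    return [nearest_sample_index(sample_times_ms, t) for t in frame_times_ms]
```

With gaze samples at 0, 100, 200, and 300 ms, a frame at 67 ms is paired with the 100 ms sample and a frame at 133 ms is also paired with the 100 ms sample.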

FIG. 5 shows an example of the field-of-view image, line-of-sight information, and audio information acquired in S3. As an example, the work for which the moving image manual is created is assumed to be fastening a screw A2 to a housing A3 with a screwdriver A1. The figure shows that line-of-sight information has been obtained in which line-of-sight marks (images indicating the line-of-sight position) P1 to P3 are located at positions corresponding to the screwdriver A1, the screw A2, and the housing A3 in the field-of-view image. Each piece of information acquired by the imaging unit 2, the line-of-sight detection unit 3, and the microphone 4 (the field-of-view image information, the line-of-sight information, and the audio information) is sent to the control box 10 via the wiring 6.

The determination unit 14 of the control box 10 analyzes the line-of-sight information acquired by the line-of-sight detection unit 3 to determine the photographer's gaze state (S4). Specifically, the line-of-sight analysis unit 14a of the determination unit 14 obtains, from the line-of-sight information, the position of the photographer's line of sight in the field-of-view image acquired by the imaging unit 2 and the dwell time at that position. The gaze determination unit 14b then determines whether the photographer is gazing on the basis of the line-of-sight position and the dwell time.

For example, on an XY coordinate plane whose X axis and Y axis correspond to the horizontal and vertical angles of view, respectively, the gaze determination unit 14b calculates the movement of the centroid of the position coordinates of ten line-of-sight positions in total, namely the photographer's latest line-of-sight position obtained from the line-of-sight information and the nine positions obtained immediately before it. The movement is expressed as the angular difference between the two line-of-sight directions pointing to the centroid before and after the movement. The X coordinate of the centroid is the mean of the X coordinates of the ten line-of-sight positions, and the Y coordinate of the centroid is the mean of their Y coordinates. When the angular difference is within, for example, 0.5°, the gaze determination unit 14b judges that the drift amount indicating the fluctuation of the line of sight is small, and when that state continues for a predetermined time (for example, 2 seconds) or longer (the dwell time), it determines that the photographer is gazing. In this embodiment, the number of line-of-sight positions used to obtain the centroid is ten, the angular difference is 0.5° (a value within 1°), and the predetermined time is 2 seconds, but these values can be changed as appropriate for the work content.
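The gaze judgment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes that line-of-sight positions are already expressed in degrees on the XY plane, that the drift is the straight-line distance between consecutive centroids (used here as a stand-in for the angular difference), and that samples arrive at a fixed 0.1-second interval. The class name `GazeDetector` and its parameters are hypothetical.

```python
from collections import deque
from math import hypot
from typing import Deque, Optional, Tuple

Point = Tuple[float, float]  # (x, y) line-of-sight position in degrees


class GazeDetector:
    """Judges gazing from centroid drift of the last `window` positions."""

    def __init__(self, window: int = 10, max_drift_deg: float = 0.5,
                 dwell_s: float = 2.0, interval_s: float = 0.1) -> None:
        self.positions: Deque[Point] = deque(maxlen=window)
        self.window = window
        self.max_drift_deg = max_drift_deg
        # Count samples instead of summing seconds to avoid float accumulation.
        self.required_steps = round(dwell_s / interval_s)
        self.prev_centroid: Optional[Point] = None
        self.stable_steps = 0

    def update(self, position: Point) -> bool:
        """Add one sample and return True when the photographer is judged to gaze."""
        self.positions.append(position)
        if len(self.positions) < self.window:
            return False  # not enough samples for a centroid yet
        n = len(self.positions)
        centroid = (sum(p[0] for p in self.positions) / n,
                    sum(p[1] for p in self.positions) / n)
        if self.prev_centroid is not None:
            drift = hypot(centroid[0] - self.prev_centroid[0],
                          centroid[1] - self.prev_centroid[1])
            if drift <= self.max_drift_deg:
                self.stable_steps += 1   # small drift: dwell continues
            else:
                self.stable_steps = 0    # large drift: dwell restarts
        self.prev_centroid = centroid
        return self.stable_steps >= self.required_steps
```

With the default values, a perfectly steady line of sight is reported as gazing on the 30th sample: ten samples fill the window, and twenty further small-drift steps cover the 2-second dwell time. Counting steps rather than elapsed seconds is a design choice made here for numerical robustness.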

Whether the angular difference is within 0.5° can be determined using the shooting angle of view of the imaging unit 2 (the horizontal and vertical angles of view) and the parameters of the image sensor (the number of pixels and the size of one pixel). FIG. 6 schematically shows the XY coordinate plane described above. The X axis corresponds to the horizontal angle of view of the imaging unit 2, and the Y axis corresponds to its vertical angle of view. If an angle of view of a° corresponds to b pixels in the imaging unit 2, an angle of view of 0.5° corresponds to (0.5b/a) pixels. Furthermore, if the side length of one pixel of the image sensor (assumed to be square) is c (μm), the length (μm) corresponding to an angular difference of 0.5° is (number of pixels) × (side length of one pixel) = (0.5b/a) × c. Therefore, whether the angular difference is within 0.5° can be determined by checking whether the movement T of the centroid G of the ten line-of-sight positions is at most (0.5b/a) × c.
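The conversion above is simple arithmetic and can be written out as a short sketch. The function names are hypothetical, and the sample values (a = 60°, b = 1920 pixels, c = 3.0 μm) are made up for illustration and do not come from the source. Like the text, the sketch assumes that the angle of view is spread uniformly over the pixels.

```python
def angle_to_pixels(angle_deg: float, view_angle_deg: float, pixel_count: int) -> float:
    """Pixels spanned by angle_deg when view_angle_deg (a) spans pixel_count (b) pixels."""
    return angle_deg * pixel_count / view_angle_deg


def angle_to_length_um(angle_deg: float, view_angle_deg: float,
                       pixel_count: int, pixel_side_um: float) -> float:
    """Length on the sensor in micrometers: (number of pixels) x (pixel side length c)."""
    return angle_to_pixels(angle_deg, view_angle_deg, pixel_count) * pixel_side_um


def drift_is_small(movement_um: float, view_angle_deg: float, pixel_count: int,
                   pixel_side_um: float, max_angle_deg: float = 0.5) -> bool:
    """True when the centroid movement T is at most (0.5 b / a) x c."""
    limit = angle_to_length_um(max_angle_deg, view_angle_deg, pixel_count, pixel_side_um)
    return movement_um <= limit


# Illustrative values only: a = 60 degrees, b = 1920 pixels, c = 3.0 micrometers.
pixels = angle_to_pixels(0.5, 60.0, 1920)
limit_um = angle_to_length_um(0.5, 60.0, 1920, 3.0)
```

For these illustrative values, 0.5° corresponds to 16 pixels, or 48 μm on the sensor, so a centroid movement T of 40 μm would count as small drift and 50 μm would not.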

When the gaze determination unit 14b determines that the photographer is gazing (Yes in S5), the storage target determination unit 16 detects the contour of the subject in the view image by image processing using a known ridge-line detection method, and identifies the area enclosed by the contour as the subject region in which the subject exists (S6).

Next, the storage target determination unit 16 extracts (crops) the subject region identified in S6 from the view image by image processing (S7). The storage target determination unit 16 then determines the extracted subject region as the storage target image to be stored in the storage unit 12 (S8). At this time, the storage target determination unit 16 may determine the subject region at its original size as the storage target image, or may enlarge the subject region to generate an enlarged image and determine that enlarged image as the storage target image. Here, it is assumed that the storage target determination unit 16 takes the latter approach and determines the enlarged image of the subject region as the storage target image. The storage target is also the display target shown on a display unit when the created moving image manual is played back.
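
The extraction and enlargement in S7 and S8 can be illustrated with a minimal sketch. It assumes, purely for illustration, that an image is a list of pixel rows, that the subject region has already been reduced to a bounding box (left, top, right, bottom) with exclusive right and bottom edges, and that enlargement is nearest-neighbor scaling by an integer factor. The embodiment does not specify any of these details, and all function names are hypothetical.

```python
def crop(image, box):
    """Extract the region given by box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def enlarge(image, factor):
    """Nearest-neighbor enlargement by an integer factor."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat each row vertically
    return out


def storage_target_when_gazing(view_image, subject_box, factor=2):
    """S7 and S8: crop the subject region, then enlarge it."""
    return enlarge(crop(view_image, subject_box), factor)
```

Passing `factor=1` corresponds to storing the subject region at its original size, the other option mentioned in the text.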

On the other hand, when the gaze determination unit 14b determines that the photographer is not gazing (No in S5), the storage target determination unit 16 determines the entire view image acquired in S3 as the storage target image to be stored in the storage unit 12 (S9).

Next, the voice recognition unit 15 performs voice recognition on the audio information acquired in S3 and converts it into text data (S10). The storage target determination unit 16 then sets, in the storage target image, a comment area to which the text data acquired in S10 is to be added as a comment (S11). In the storage target image, the comment area is set outside the gazed subject when the photographer is gazing, and in an area where no subject exists (an area that does not overlap any subject) when the photographer is not gazing. Whether a given area of the storage target image contains a subject can be determined by the storage target determination unit 16 performing the same processing as in S6 (for example, image processing using a ridge-line detection method).
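
One simple way to set a comment area that does not overlap any subject is sketched below. It is a hypothetical strategy, not the method of the embodiment: subjects are assumed to be given as bounding boxes (left, top, right, bottom), and only the four corners of the image are tried as candidate positions.

```python
def overlaps(a, b):
    """True if boxes a and b, each (left, top, right, bottom), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_comment(image_size, subject_boxes, comment_size):
    """Return a corner box for the comment that avoids every subject, or None."""
    width, height = image_size
    cw, ch = comment_size
    if cw > width or ch > height:
        return None                      # the comment does not fit in the image
    corners = [(0, height - ch), (0, 0), (width - cw, 0), (width - cw, height - ch)]
    for x, y in corners:
        box = (x, y, x + cw, y + ch)
        if not any(overlaps(box, s) for s in subject_boxes):
            return box
    return None                          # every candidate overlaps a subject
```

When the function returns `None`, a real system would need a fallback (for example a smaller comment area); the text does not describe one, so none is assumed here.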

The storage target determination unit 16 then adds the text data acquired in S10 as a comment to the comment area of the storage target image obtained in S8 or S9 and combines the two (S12). The storage target determination unit 16 stores the combined storage target image in the storage unit 12 in association with the elapsed time (S13). The above processing is repeated until the photographer operates the operation unit 11 to input the end of the work (S14); when the work ends, the series of processing ends.
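
The overall flow from S5 to S14 can be summarized in a short sketch. It is a simplification under several assumptions: each sample is a dictionary with hypothetical keys, the gaze decision and the subject bounding box are supplied as inputs, and the comment is attached as a text field instead of being rendered into the image.

```python
from dataclasses import dataclass


@dataclass
class StoredFrame:
    elapsed_s: float  # elapsed time associated with the image (S13)
    image: list       # storage target image as a list of pixel rows
    comment: str      # recognized text attached as a comment (S12, simplified)


def select_storage_image(view_image, gazing, subject_box, factor=2):
    """Enlarged subject region when gazing (S6 to S8), else the whole view image (S9)."""
    if not gazing or subject_box is None:
        return view_image
    left, top, right, bottom = subject_box
    cropped = [row[left:right] for row in view_image[top:bottom]]
    return [[px for px in row for _ in range(factor)]
            for row in cropped for _ in range(factor)]


def build_manual(samples):
    """Process samples in order until one signals the end of work (S14)."""
    manual = []
    for s in samples:
        if s.get("end"):
            break
        image = select_storage_image(s["view"], s["gazing"], s.get("subject_box"))
        manual.append(StoredFrame(s["t"], image, s.get("text", "")))
    return manual
```

The returned list plays the role of the moving image manual data: each entry pairs a storage target image with its elapsed time and comment.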

FIG. 7 shows an example of the storage target images (stored images, display images) stored in the storage unit 12 in S13. When the photographer is not gazing, the storage target is the view image, so an image in which the comment "Tighten the screws at five locations" has been added to the view image is stored in the storage unit 12 as moving image manual data. Because no subject (driver A1, housing A2, screw A3) is being gazed at, the comment is added to an area of the view image where no subject exists (an area that does not overlap the driver A1, the housing A2, or the screw A3).

When the photographer is gazing at the screw A3, the storage target is the image of the screw A3 (the subject region) extracted from the view image and enlarged, so an image in which the comment has been added outside the image of the screw A3 is stored in the storage unit 12 as moving image manual data. Likewise, when the photographer is gazing at the housing A2, the storage target is the image of the housing A2 (the subject region) extracted from the view image and enlarged, so an image in which the comment has been added outside the image of the housing A2 is stored in the storage unit 12 as moving image manual data.

When the moving image manual is played back, the moving image manual data stored in the storage unit 12 is read by a playback device (not shown), and the moving image manual based on that data (images like those in FIG. 7) is displayed on a display unit (not shown). The viewer can thus watch the displayed moving image manual.

As described above, in the moving image manual creation system 1 of the present embodiment, the determination unit 14 determines whether the photographer is gazing based on the line-of-sight information acquired by the line-of-sight detection unit 3 (S3 to S5). The storage target determination unit 16 then determines at least a part of the view image acquired by the imaging unit 2 as the storage target based on the determination result of the determination unit 14 (S6 to S9). The determined storage target is stored in the storage unit 12 as moving image manual data (S13).

Because the storage target determination unit 16 determines the storage target based on the determination result of the determination unit 14 (whether the photographer is gazing), the storage target can differ between the case where the photographer is gazing and the case where the photographer is not, that is, according to the photographer's line-of-sight information. Storage targets that differ according to the line-of-sight information can thus be stored in the storage unit 12 as moving image manual data, which makes it possible to create a moving image manual that accurately reflects the intention expressed by the photographer's line of sight. Consequently, when the moving image manual is played back, that intention can be conveyed to the viewer intuitively and clearly.

In addition, because the storage target is determined using the photographer's view image and line-of-sight information, no work or step in which, for example, the photographer operates the operation unit 11 to enter data manually is required. The moving image manual can therefore be created automatically by a simple method using the photographer's view image and line-of-sight information.

Further, because the determination unit 14 includes the line-of-sight analysis unit 14a and the gaze determination unit 14b described above, the gaze determination unit 14b can accurately determine the photographer's gaze state (whether the photographer is gazing) based on the position of the photographer's line of sight obtained by the line-of-sight analysis unit 14a and the dwell time at that position. When the photographer concentrates on an object (subject) during work, the degree of gaze increases, and this degree of gaze can also be determined accurately from the position and dwell time of the photographer's line of sight.

The gaze determination unit 14b also detects a drift amount indicating fluctuation of the line of sight based on the position of the photographer's line of sight obtained by the line-of-sight analysis unit 14a, and determines that the photographer is gazing when the drift amount is equal to or less than a threshold and the dwell time is equal to or longer than a predetermined time (S4, S5). When the drift amount is at or below the threshold and the dwell time is at or above the predetermined time, the photographer's degree of gaze can be judged to be high, so determining that the photographer is gazing in this case improves the accuracy of the determination.

Further, when the determination unit 14 determines that the photographer is gazing, the storage target determination unit 16 identifies, in the view image, the subject region in which the subject gazed at by the photographer exists (S6), extracts the subject region from the view image (S7), and determines the extracted subject region as the storage target (S8). By extracting only the subject region intended by the photographer and determining it as the storage target, only the information of the necessary region (the subject region) is presented to the viewer when the storage target is played back, and information that is unnecessary at playback is not presented. The viewer can therefore look at the presented information and begin the prescribed work without hesitation.

The storage target determination unit 16 also enlarges the extracted subject region and determines the enlarged image as the storage target (S8). In this case, the necessary region (the subject region) is displayed enlarged when the storage target is played back, so the viewer can easily check the details of the subject in the displayed image.

Further, when the determination unit 14 determines that the photographer is not gazing, the storage target determination unit 16 determines the view image acquired by the imaging unit 2 as the storage target (S9). Because the entire view image is determined as the storage target when the photographer is not gazing, the viewer can, when the storage target is played back, take in the photographer's entire field of view (grasp it with a wide view) and understand the general information (for example, the work content).

Further, the voice recognition unit 15 recognizes the photographer's voice input through the microphone 4 and converts it into text data (S10), and the storage target determination unit 16 combines the text data with at least a part of the view image to determine the storage target (S12). Because text data obtained by voice recognition is added to the storage target in this way, the viewer can, when the storage target is played back, clearly recognize the photographer's instructions and advice as visual information (text data) while looking at the displayed image. In addition, because the voice recognition unit 15 is provided, voice input can be performed at the same time as the field of view is shot, as in the present embodiment, which also saves the editing effort of separately entering text data by hand and combining it with the image.

In particular, the storage target determination unit 16 combines the text data obtained by voice recognition with at least a part of the view image so that the text data is positioned outside the area of the storage target in which the subject exists (S11, S12). Because the text data does not overlap the important area containing the subject in the combined storage target, the viewer can clearly recognize both the important subject and the text data when the storage target is played back.

[Variation of Subject Region Extraction (1)]
In step S7 described above, the storage target determination unit 16 may change the extraction range used when extracting the subject region from the view image according to the information acquired by the determination unit 14. Variations of such extraction are described below.

FIG. 8 shows another example of the view image, line-of-sight information, and audio information acquired in S3. The storage target determination unit 16 may change the extraction range according to the dwell time of the line of sight acquired by the line-of-sight analysis unit 14a. For example, as the dwell time becomes longer, the storage target determination unit 16 may narrow the extraction range and extract only a part of the subject region. The storage target determination unit 16 may also enlarge the part of the subject region extracted from the view image and determine the enlarged image as the storage target. In this case, the enlarged image is displayed when the storage target is played back.

FIG. 9 schematically shows the relationship between the dwell time of the line of sight and the display magnification when the storage target is played back. Here, the point at which the drift amount indicating fluctuation of the line of sight becomes small (the angular difference before and after a change in the line of sight is, for example, within 0.5°) is taken as the start of the dwell time (the reference, 0 seconds). According to the flow shown in FIG. 3, the photographer is determined to be gazing once the low-drift state has continued for 2 seconds (S4, S5). Until then the determination is that there is no gaze, so the image determined as the storage target is the view image (S9), and the view image is displayed when the storage target is played back.

After the photographer is determined to be gazing and until the reduction of the extraction range described next begins (from 2 seconds to 5 seconds after the start of the low-drift state), the subject region is identified in the view image and the entire subject region is extracted (S6, S7). The subject region is therefore displayed when the storage target is played back. The upper part of FIG. 10 shows the image of the subject region R1 (subject image) extracted from the view image of FIG. 8 during the period from the gaze determination to the start of the reduction of the extraction range.

When the low-drift state continues for 3 seconds after the photographer is determined to be gazing (5 seconds after the start of the low-drift state), the storage target determination unit 16 starts reducing the extraction range, extracts a part of the subject region within that extraction range (S7), and enlarges the extracted part of the subject region and determines it as the storage target (S8). The lower part of FIG. 10 shows the image (subject image) obtained by extracting and enlarging a region R2, which is a part of the subject region R1, from the view image of FIG. 8 after the reduction of the extraction range has started. Because the region R2 is a part of the subject region R1, its extraction range is clearly smaller than that of the subject region R1.

As the extraction range is reduced, the extracted subject region is enlarged, and it reaches the target frame size 3 seconds after the start of enlargement (8 seconds after the start of the low-drift state). With σ denoting the standard deviation (variation) of the line-of-sight position, the target frame size is the size that covers 3σ of the line-of-sight positions that vary between the start of the low-drift state and the start of enlargement. When the storage target is played back, a part of the subject region (the enlarged image) is displayed at a predetermined magnification. The magnification L on the vertical axis of FIG. 9 corresponds to the state in which 3σ of the line-of-sight position is covered. The center of the display during playback of the storage target is the average of the line-of-sight positions in the same period (the period from the start to the end of enlargement).
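
The target frame can be illustrated with the following sketch, which should be read as one possible interpretation and not as the embodiment's calculation. It assumes that the 3σ extent is applied independently along the X and Y axes, that σ is the population standard deviation, and that the samples are (x, y) tuples; the function name and arguments are hypothetical.

```python
from statistics import mean, pstdev


def target_frame(spread_samples, center_samples, k=3.0):
    """Return (left, top, right, bottom) of a frame covering k sigma.

    spread_samples: positions recorded from the start of the low-drift
                    state to the start of enlargement (used for sigma).
    center_samples: positions recorded during enlargement (their mean
                    is used as the center of the display).
    """
    xs = [p[0] for p in spread_samples]
    ys = [p[1] for p in spread_samples]
    half_w, half_h = k * pstdev(xs), k * pstdev(ys)
    cx = mean(p[0] for p in center_samples)
    cy = mean(p[1] for p in center_samples)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

In practice a minimum frame size would probably be needed so that a perfectly steady line of sight (σ = 0) does not collapse the frame to a point; the text does not address this case.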

The angular difference, the timing of the gaze determination (2 seconds after the start of the low-drift state), the timing of the display enlargement (3 seconds after the gaze determination), the enlargement speed, the target frame size, and so on are examples and can be changed as appropriate according to the situation.
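
The timing described for FIG. 9 can be expressed as a parameterized function, as in the sketch below. The linear ramp between the start and end of enlargement and the numerical value used for the final magnification are assumptions (FIG. 9 is described only schematically here), the magnification is expressed relative to the display of the whole subject region, and all names are hypothetical.

```python
def display_stage(dwell_s, gaze_after_s=2.0, zoom_delay_s=3.0,
                  zoom_duration_s=3.0, final_magnification=2.0):
    """Return (stage, relative magnification) for a given dwell time in seconds."""
    zoom_start = gaze_after_s + zoom_delay_s          # 5 s in the example timing
    if dwell_s < gaze_after_s:
        return ("view_image", 1.0)                    # no gaze yet: whole view image
    if dwell_s < zoom_start:
        return ("subject_region", 1.0)                # gaze: whole subject region
    # Assumed linear ramp up to the final magnification (L in FIG. 9).
    progress = min((dwell_s - zoom_start) / zoom_duration_s, 1.0)
    return ("partial_region", 1.0 + (final_magnification - 1.0) * progress)
```

Because every timing value is a keyword argument, the schedule can be adjusted to the work content, in line with the remark that these values are only examples.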

For example, in fine work such as soldering a circuit board, the dwell time (gaze time) of the photographer's line of sight naturally becomes longer, and it is often better for the viewer to be able to see the details when the storage target is played back. When the storage target determination unit 16 changes the extraction range used for extracting the subject region from the view image according to the dwell time acquired by the line-of-sight analysis unit 14a, the extraction range can be narrowed according to the dwell time and an enlarged image of a part of the extracted subject region can be used as the storage target, as described above. This makes it possible to show the viewer the details of the subject at a large size and convey fine-grained information when the storage target is played back.

If, for example, the extraction range were narrowed sharply from the beginning of the gaze (while the dwell time at the line-of-sight position is still short), so that a partial area of the view image is extracted, enlarged, and stored as the storage target, the viewer would find it hard to grasp the subject and the work as a whole at the beginning of playback, and might not be able to tell from the image which work is being shown or which part of the subject is being shown. By reducing the extraction range in stages and enlarging the extracted area in stages to form the storage target, as in the present embodiment, the display magnification rises gradually when the storage target is played back. The viewer can therefore easily grasp the subject and the work as a whole at the beginning of playback, and the above problem is avoided.

[Variation of Subject Region Extraction (2)]
FIG. 11 shows yet another example of the view image, line-of-sight information, and audio information acquired in S3. In step S7 described above, the storage target determination unit 16 may change the extraction range used when extracting the subject region from the view image according to a change in the center of gravity of a plurality of line-of-sight positions acquired by the determination unit 14. The center of gravity can be obtained by the gaze determination unit 14b of the determination unit 14 in the manner described above.

When it is determined, according to the flow shown in FIG. 3, that the photographer is gazing at a certain subject in the view image shown in FIG. 11 (S4, S5), the storage target determination unit 16 identifies the subject region R11 in which the subject exists in the view image (S6), extracts the subject region R11 from the view image (S7), and determines the extracted subject region R11 as the storage target (S8). The upper part of FIG. 12 shows the storage target image at this time, that is, the image of the subject region R11 (subject image) extracted from the view image of FIG. 11.

Next, when the photographer gazes at the left side of the subject while applying a sticker to the subject starting from the left side, the center of gravity of the plurality of line-of-sight positions detected by the line-of-sight analysis unit 14a (for example, the ten most recent positions) moves. When the gaze determination unit 14b recognizes the change in the center of gravity of the plurality of line-of-sight positions, the storage target determination unit 16 extracts from the view image the area of the subject region R11 in which the center of gravity of the photographer's line of sight is located, that is, the left-side region R12 at which the photographer is gazing (S7), and enlarges it to form the storage target (S8). The middle part of FIG. 12 shows the storage target image at this time, that is, the image of the region R12 (enlarged image) extracted from the view image of FIG. 11.

Further, when the photographer gazes at the right side of the subject in order to apply the right side of the sticker to the subject, the center of gravity of the plurality of line-of-sight positions detected by the line-of-sight analysis unit 14a (for example, the ten most recent positions) also moves to the right. When the gaze determination unit 14b recognizes the change in the center of gravity of the plurality of line-of-sight positions, the storage target determination unit 16 extracts from the view image the area of the subject region R11 in which the center of gravity of the photographer's line of sight is located, that is, the right-side region R13 at which the photographer is gazing (S7), and enlarges it to form the storage target (S8). The lower part of FIG. 12 shows the storage target image at this time, that is, the image of the region R13 (enlarged image) extracted from the view image of FIG. 11.

When precise alignment is required in a sticker application step, the gaze position often changes within the same subject. When the storage target determination unit 16 changes the extraction range according to the change in the center of gravity of the plurality of line-of-sight positions acquired by the determination unit 14, as described above, the extraction range can follow the photographer's gaze position even when it moves within the same subject, so that the image at the gaze position is extracted and enlarged to obtain the storage target. An image in which the important position is enlarged can thus be provided (displayed) to the viewer when the storage target is played back, and the important information is reliably conveyed. In addition, because the center of gravity of the photographer's line of sight can be placed at the center of the displayed image at playback, an image that accurately reflects the photographer's intention can be displayed to convey information to the viewer.
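
A minimal sketch of an extraction range that follows the center of gravity is given below. It assumes a fixed-size extraction window that is centered on the center of gravity of the recent line-of-sight positions and clamped so that it stays inside the subject region; the window size, the clamping, and the function name are assumptions made for illustration.

```python
def follow_center_of_gravity(gaze_points, subject_box, window_size):
    """Return a crop box (left, top, right, bottom) that tracks the gaze.

    gaze_points: recent line-of-sight positions as (x, y) tuples.
    subject_box: bounding box of the subject region (for example R11).
    window_size: (width, height) of the extraction window, assumed to be
                 no larger than the subject region.
    """
    n = len(gaze_points)
    cx = sum(p[0] for p in gaze_points) / n
    cy = sum(p[1] for p in gaze_points) / n
    left, top, right, bottom = subject_box
    w, h = window_size
    # Center the window on the center of gravity, then clamp it to the subject region.
    x0 = min(max(cx - w / 2, left), right - w)
    y0 = min(max(cy - h / 2, top), bottom - h)
    return (x0, y0, x0 + w, y0 + h)
```

With a subject region spanning x = 100 to 500, line-of-sight positions near the left edge yield a window at the left end of the region (comparable to R12), and positions near the right edge yield a window at the right end (comparable to R13).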

As described above, when the storage target determination unit 16 changes the extraction range used for extracting the subject region from the view image according to the information acquired by the determination unit 14 (the dwell time, or the change in the center of gravity of the plurality of line-of-sight positions), only the important area that accurately reflects the photographer's intention can be extracted from the view image, enlarged, and provided to the viewer at playback. The viewer can thus immediately recognize and grasp the important area intended by the photographer in the displayed image, and can also properly understand the content of the work shown in the displayed image.

In the present embodiment, an example in which the imaging unit 2 and the line-of-sight detection unit 3 are mounted on the glasses 5 has been described, but the line-of-sight detection unit 3 may, for example, be incorporated into the imaging unit 2. In this case the photographer shoots the view image while holding the imaging unit 2, but the moving image manual creation system 1 can be configured without the glasses 5, which is advantageous in that the number of parts can be reduced.

Although an embodiment of the present invention has been described above, the scope of the present invention is not limited to it, and the invention can be implemented with extensions or modifications within a range that does not depart from the gist of the invention.

The present invention can be used, for example, in a system that automatically creates a moving image manual.

1 Moving image manual creation system
2 Imaging unit
3 Line-of-sight detection unit
4 Microphone (voice input unit)
12 Storage unit
14 Determination unit
14a Line-of-sight analysis unit
14b Gaze determination unit
15 Voice recognition unit
16 Storage target determination unit

Claims

An imaging unit that captures a field of view of a photographer as a moving image and acquires a field of view image;
A gaze detection unit that detects gaze information of the photographer;
A determination unit that determines the presence or absence of the gaze of the photographer based on the line-of-sight information;
A storage target determining unit configured to determine at least a part of the view image as a storage target based on the determination result of the determination unit;
And a storage unit configured to store the storage target determined by the storage target determination unit as data of a moving image manual.

The determination unit
A line-of-sight analysis unit which obtains the position of the line-of-sight of the photographer in the view image and the dwell time of the position of the line-of-sight based on the line-of-sight information;
The movie manual creation system according to claim 1, further comprising: a gaze determination unit that determines the gaze condition of the photographer based on the position of the gaze and the staying time.

The gaze determination unit detects a drift amount indicating a blur of the sight line based on the position of the sight line, and when the drift amount is equal to or less than a threshold and the staying time is equal to or more than a predetermined time The moving image manual creation system according to claim 2, wherein it is determined that the person gazes.

The moving image manual creation system according to claim 2, wherein, when the determination unit determines that the photographer is gazing, the storage target determination unit specifies a subject region of the view image in which the subject gazed at by the photographer is present, extracts the subject region from the view image, and determines the extracted subject region as the storage target.

The moving image manual creation system according to claim 4, wherein the storage target determination unit enlarges the extracted subject region and determines the enlarged image as the storage target.
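Illustrative sketch (not part of the claim language): extraction and enlargement of a subject region might look like the following. The claims do not say how the subject region is specified, so this sketch assumes a square box centered on the gaze position and clamped to the image bounds, and it assumes nearest-neighbor enlargement by an integer factor. The image is modeled as a plain list of pixel rows, and `crop_box`, `extract`, and `enlarge` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def crop_box(cx: int, cy: int, half: int, width: int, height: int) -> Box:
    """Square box around the gaze position (cx, cy), clamped to the image bounds."""
    left = max(0, cx - half)
    top = max(0, cy - half)
    right = min(width, cx + half)
    bottom = min(height, cy + half)
    return left, top, right, bottom


def extract(image: Image, box: Box) -> Image:
    """Cut the subject region out of the view image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def enlarge(image: Image, factor: int) -> Image:
    """Nearest-neighbor enlargement: repeat each pixel and each row `factor` times."""
    out: Image = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

For example, `enlarge(extract(view, crop_box(cx, cy, 100, w, h)), 2)` would yield the enlarged subject region as the storage target, where `view`, `cx`, `cy`, `w`, and `h` stand for the view image, the gaze position, and the image size.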

The moving image manual creation system according to claim 4, wherein the storage target determination unit changes the extraction range used when extracting the subject region from the view image in accordance with the information acquired by the determination unit.

The moving image manual creation system according to claim 6, wherein the storage target determination unit changes the extraction range in accordance with the dwell time acquired by the determination unit.

The moving image manual creation system according to claim 6, wherein the storage target determination unit changes the extraction range in accordance with a change in the center of gravity of a plurality of positions of the line of sight acquired by the determination unit.
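Illustrative sketch (not part of the claim language): the claims state that the extraction range changes with the dwell time or with a change in the center of gravity of the line-of-sight positions, but they do not state the direction or size of the change. The rules below are assumptions chosen only to make the idea concrete: a longer dwell narrows the range, and a larger shift of the center of gravity widens it. All function names and default values are hypothetical.

```python
from math import hypot
from typing import Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Center of gravity of a set of line-of-sight positions."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def range_from_dwell(
    dwell: float,
    base: float = 200.0,           # hypothetical initial range, in pixels
    minimum: float = 50.0,         # hypothetical lower bound, in pixels
    shrink_per_sec: float = 50.0,  # hypothetical shrink rate
) -> float:
    """Assumed rule: a longer dwell narrows the range, never below `minimum`."""
    return max(minimum, base - shrink_per_sec * dwell)


def range_from_centroid_shift(
    prev: Sequence[Point],
    curr: Sequence[Point],
    base: float = 100.0,  # hypothetical range when the centroid does not move
) -> float:
    """Assumed rule: widen the range by the distance the centroid has moved."""
    (px, py), (cx, cy) = centroid(prev), centroid(curr)
    return base + hypot(cx - px, cy - py)
```

Either result could be passed as the size of the box that is cut out of the view image.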

The moving image manual creation system according to any one of claims 4 to 8, wherein the storage target determination unit determines the view image as the storage target when the determination unit determines that the photographer is not gazing.

The moving image manual creation system according to any one of claims 1 to 9, further comprising:
a voice input unit to which the voice of the photographer is input; and
a voice recognition unit that recognizes the voice input through the voice input unit and converts the voice into text data,
wherein the storage target determination unit combines the text data with at least a part of the view image and determines the combined result as the storage target.

The moving image manual creation system according to claim 10, wherein the storage target determination unit combines the text data with at least a part of the view image such that the text data is located outside the region of the storage target in which a subject is present.
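Illustrative sketch (not part of the claim language): one simple way to keep the text data outside the region in which the subject is present is to try a full-width caption band at the bottom of the frame and fall back to the top. This placement strategy is an assumption; the claim only requires that the text lie outside the subject region. `overlaps` and `place_caption` are hypothetical names.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom (right/bottom exclusive)


def overlaps(a: Box, b: Box) -> bool:
    """True when the two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_caption(frame_w: int, frame_h: int, subject: Box, caption_h: int) -> Optional[Box]:
    """Return a full-width caption band that avoids the subject region.

    The bottom band is tried first, then the top band; None means neither fits.
    """
    candidates = [
        (0, frame_h - caption_h, frame_w, frame_h),  # bottom band
        (0, 0, frame_w, caption_h),                  # top band
    ]
    for band in candidates:
        if not overlaps(band, subject):
            return band
    return None
```

When `place_caption` returns `None`, a caller would need some other policy, such as shrinking the caption; that case is outside what the claim describes.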