JP2021083015A

JP2021083015A - Reproduction control device and reproduction control program

Info

Publication number: JP2021083015A
Application number: JP2019210882A
Authority: JP
Inventors: 麻樹杉本; Maki Sugimoto; 哲林田; Tetsu Hayashida; 啓太郎吉田; Keitaro Yoshida
Original assignee: Keio University
Current assignee: Keio University
Priority date: 2019-11-21
Filing date: 2019-11-21
Publication date: 2021-05-27
Anticipated expiration: 2039-11-21
Also published as: JP7442300B2

Abstract

PROBLEM TO BE SOLVED: To more appropriately support a user's viewing through control of reproduction.
SOLUTION: A reproduction control device 20 comprises an image data acquisition unit 211, a region detection unit 212, a feature amount extraction unit 214, a scene detection unit 216 and a reproduction control unit 217. The image data acquisition unit 211 acquires a plurality of temporally continuous pieces of image data. The region detection unit 212 detects a target region including a prescribed target from the image of each of the plurality of pieces of image data. The feature amount extraction unit 214 extracts a feature amount from each of the plurality of pieces of image data on the basis of the change in the images between the plurality of pieces of image data and whether or not the changed region is the target region. The scene detection unit 216 detects the image data corresponding to a prescribed scene by inputting the feature amount to a learning model. The reproduction control unit 217 controls reproduction of the plurality of pieces of image data on the basis of information indicating the image data corresponding to the prescribed scene detected by the scene detection unit 216.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a reproduction control device and a reproduction control program.

Conventionally, techniques are known for controlling reproduction according to operations by a user who is viewing a moving image.
For example, Patent Document 1 discloses changing the reproduction speed of a moving image stepwise based on the duration of a user's contact operation on a touch panel and on the pressing force of that contact operation. As a result, the user can control reproduction more intuitively than when operating a general user interface such as a fast-forward button or a seek bar.

Patent Document 1: Japanese Patent No. 6483305

However, with a method that controls reproduction according to the content of user operations as described above, the user must perform various operations for reproduction control, which is cumbersome for the user. Further, for example, in a moving image that is being viewed for the first time, it is not easy for the user to identify which part of the moving image contains a predetermined scene (for example, a scene that is the purpose of viewing the moving image).

The present invention has been made in view of such a situation. An object of the present invention is to more appropriately support a user's viewing through control of reproduction.

In order to solve the above problem, a reproduction control device according to an embodiment of the present invention comprises:
an image data acquisition means for acquiring a plurality of temporally continuous pieces of image data;
a region detection means for detecting, from the image of each of the plurality of pieces of image data, a target region including a predetermined target;
a feature amount extraction means for extracting a feature amount from each of the plurality of pieces of image data based on a change in the image between the plurality of pieces of image data and on whether or not the changed region is the target region;
a scene detection means for detecting image data corresponding to a predetermined scene by inputting the feature amount into a learning model; and
a reproduction control means for controlling reproduction of the plurality of pieces of image data based on information indicating the image data corresponding to the predetermined scene detected by the scene detection means.

According to the present invention, a user's viewing can be supported more appropriately through control of reproduction.

A block diagram showing an example of the overall configuration of a reproduction control system according to an embodiment of the present invention.
A block diagram showing an example of the configuration of a wearable camera according to an embodiment of the present invention.
A block diagram showing an example of the configuration of a reproduction control device according to an embodiment of the present invention.
A schematic diagram explaining the detection of the target region and the gazing point in the processing by the reproduction control device according to an embodiment of the present invention.
A schematic diagram explaining the moving distance of the gazing point in the processing by the reproduction control device according to an embodiment of the present invention.
A schematic diagram explaining the movement amount of the background in the processing by the reproduction control device according to an embodiment of the present invention.
A schematic diagram explaining the movement amount of the moving part in the processing by the reproduction control device according to an embodiment of the present invention.
A schematic diagram showing an example of the user interface during reproduction in the processing by the reproduction control device according to an embodiment of the present invention.
A flowchart explaining the flow of the shooting process executed by the wearable camera according to an embodiment of the present invention.
A flowchart explaining the flow of the learning process executed by the reproduction control device according to an embodiment of the present invention.
A flowchart explaining the flow of the reproduction control process executed by the reproduction control device according to an embodiment of the present invention.

Hereinafter, an example of an embodiment of the present invention will be described with reference to the accompanying drawings.

[System configuration]
FIG. 1 is a block diagram showing the overall configuration of the reproduction control system S according to the present embodiment. As shown in FIG. 1, the reproduction control system S includes a wearable camera 10 and a reproduction control device 20. FIG. 1 also shows a user U who wears the wearable camera 10.

The wearable camera 10 and the reproduction control device 20 are communicably connected to each other. Communication between these devices may conform to any communication method, and the method is not particularly limited. The connection may be wired or wireless. Further, communication between the devices may be performed directly or via a network including a relay device. In that case, the network is realized by, for example, a LAN (Local Area Network), the Internet, a mobile phone network, or a combination of these.

The wearable camera 10 is a device having a function of capturing an image of the space corresponding to the field of view of the user U (hereinafter referred to as a "field-of-view image"). The wearable camera 10 is realized by, for example, a glasses-type wearable device.

At the same time as capturing the field-of-view image, the wearable camera 10 also measures the gazing point, which is the location within the field of view at which the user U is gazing. The wearable camera 10 then generates image data including the captured field-of-view image and the measured gazing point information of the user U (for example, the two-dimensional coordinate values corresponding to the position of the gazing point). By repeating this image data generation process, the wearable camera 10 generates a moving image composed of a plurality of temporally continuous pieces of image data that show the changes in the field-of-view image and the gazing point of the user U. The wearable camera 10 then transmits the moving image composed of the plurality of pieces of image data to the reproduction control device 20.

The reproduction control device 20 is a device that controls the reproduction of the moving image received from the wearable camera 10. The reproduction control device 20 is realized by, for example, a personal computer or a server device.
Specifically, the reproduction control device 20 acquires a plurality of temporally continuous pieces of image data from the wearable camera 10. The reproduction control device 20 then detects, from the image of each of the plurality of pieces of image data, a target region including a predetermined target. Further, the reproduction control device 20 extracts a feature amount from each of the plurality of pieces of image data based on the change in the image between the plurality of pieces of image data and on whether or not the changed region is the target region. Further, the reproduction control device 20 detects image data corresponding to a predetermined scene (for example, a scene that is the purpose of viewing the moving image) by inputting this feature amount into a learning model. The reproduction control device 20 then controls the reproduction of the plurality of pieces of image data based on information indicating the image data corresponding to the detected predetermined scene.
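As an illustration of extracting a feature amount from both the inter-frame change and whether the changed region is the target region, the following is a minimal sketch in Python. It assumes grayscale frames held as nested lists and a boolean mask marking the target region. The function name `split_change`, the use of a mean absolute difference, and the data layout are all hypothetical choices made for this example and are not taken from the embodiment.

```python
from typing import List, Tuple

Frame = List[List[int]]   # grayscale image as rows of 0-255 intensities (assumed layout)
Mask = List[List[bool]]   # True where the pixel belongs to the target region


def split_change(prev: Frame, curr: Frame, target: Mask) -> Tuple[float, float]:
    """Return (mean change inside the target region, mean change outside it)."""
    inside_sum = outside_sum = 0
    inside_n = outside_n = 0
    for y, row in enumerate(curr):
        for x, value in enumerate(row):
            diff = abs(value - prev[y][x])  # per-pixel change between frames
            if target[y][x]:
                inside_sum += diff
                inside_n += 1
            else:
                outside_sum += diff
                outside_n += 1
    # Guard against an empty region so the sketch never divides by zero.
    inside = inside_sum / inside_n if inside_n else 0.0
    outside = outside_sum / outside_n if outside_n else 0.0
    return inside, outside
```

Keeping the two values separate lets change inside the target region and change in the remainder of the image be treated as distinct inputs to a later stage.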

In this way, the wearable camera 10 can generate a moving image composed of a plurality of pieces of image data showing the changes in the field-of-view image and the gazing point of the user U. The reproduction control device 20 can detect a predetermined scene based on the feature amounts extracted from the plurality of pieces of image data in the moving image and on the learning model, and can control the reproduction of the moving image composed of the plurality of pieces of image data based on whether or not the image data corresponds to the predetermined scene.
Therefore, according to the reproduction control system S of the present embodiment, a user's viewing can be supported more appropriately through control of reproduction.

Because it provides such viewing support, the reproduction control system S can solve the problem described above that the user must perform various operations for reproduction control, which is cumbersome for the user. The reproduction control system S can also solve the problem described above that, in a moving image being viewed for the first time, it is not easy for the user to identify which part of the moving image contains a predetermined scene (for example, a scene that is the purpose of viewing the moving image).

Such a reproduction control system S can be used for various purposes. In the following, as an example of a suitable use of the reproduction control system S, a case is described in which the user U is a surgeon who performs surgery as the predetermined work. It is assumed that the reproduction control system S is used to detect an incision scene, which is the predetermined scene, by performing machine learning based on three feature amounts in this surgery: (1) the movement of the line of sight of the user U, (2) the background change in the field-of-view image of the user U, and (3) the movement of the hands, which are the moving parts of the user U.

In an incision scene, the movement of the user U's line of sight is considered to be small because the work is performed while gazing at the affected area, the background change is considered to be small because the user U does not move the head, and the overall movement of the hands is considered to be small because the work is delicate work performed with the fingertips. That is, these three feature amounts are closely related to the incision scene and are therefore considered suitable for detecting it. The surgery, which is the predetermined work, may be performed by the user U alone, but the following description assumes that it is performed as collaborative work between the user U and an assistant. Therefore, in (3) above, the movement of the assistant's hands, which are the assistant's moving parts, is also extracted as a feature amount.
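The first of these feature amounts, the movement of the line of sight, can be illustrated with a short Python sketch. It assumes that the gazing point of each frame is available as a two-dimensional coordinate pair and measures the movement as the Euclidean distance between consecutive gazing points. The function name `gaze_distances` and the choice of Euclidean distance are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) gazing point in image coordinates


def gaze_distances(points: Sequence[Point]) -> List[float]:
    """Distance moved by the gazing point between each pair of consecutive frames."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


# Small distances over a run of frames are consistent with steady gazing,
# as expected during an incision scene.
steady = gaze_distances([(100.0, 100.0), (101.0, 100.0), (101.0, 101.0)])
```

The result has one fewer element than the input because each value describes a transition between two frames.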

The use of the reproduction control system S for moving images of surgery is described in more detail. One way of passing on medical techniques is to refer to surgical videos. Young surgeons in particular have limited opportunities to experience surgery as the operating surgeon, so surgical videos from a first-person viewpoint corresponding to the operating surgeon's field of view are useful as teaching material that supplements practical surgical training. However, surgical videos are often long. For example, in tumor removal surgery in breast surgery, the recording time is often about two hours. Identifying a predetermined scene that is the purpose of viewing (here, an incision scene as an example) in such a long video takes a great deal of time, because surgical videos also include scenes that are not essential to the surgery, such as preparation scenes and cleanup scenes.

Therefore, by using the reproduction control system S as described above, the incision scene that is the purpose of viewing is detected from a surgical video, which tends to be long, and the detected incision scene is presented in a manner that is easier for the viewing user to watch than other scenes (for example, preparation scenes and cleanup scenes). As a result, the viewing user can easily view the incision scene without performing cumbersome operations for reproduction control.
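One way such reproduction control could be realized is sketched below in Python. It assumes that a per-frame flag indicating the detected scene is already available and groups consecutive frames into segments, each with a reproduction speed. The function name `playback_plan` and the specific speeds (normal speed for the detected scene and four times normal speed elsewhere) are assumptions for illustration only. The description above states only that the detected scene is presented in a manner that is easier to view than other scenes.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, float]  # (start frame, end frame exclusive, speed)


def playback_plan(is_target: Sequence[bool],
                  target_speed: float = 1.0,
                  other_speed: float = 4.0) -> List[Segment]:
    """Group consecutive frames with the same flag and assign a speed to each run."""
    plan: List[Segment] = []
    start = 0
    for flag, run in groupby(is_target):
        length = len(list(run))
        speed = target_speed if flag else other_speed
        plan.append((start, start + length, speed))
        start += length
    return plan
```

A player following such a plan would need no seek or fast-forward operations from the viewer, which is the benefit described above.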

To repeat, this is only one example of a suitable use and is not intended to limit the uses to which the reproduction control system S can be put. That is, the reproduction control system S can also be used to control the reproduction of any other moving image. Further, when the moving image whose reproduction is controlled includes work, this work may be performed by a single worker or may be collaborative work by a plurality of workers.

In the following, for clarity, the user who wears the wearable camera 10 and performs the surgery (corresponding to the user U in FIG. 1) and the assistant are referred to as "workers". In contrast, a user who views the surgical moving image reproduced by the reproduction control device 20 is referred to as a "viewer".

[Wearable camera configuration]
Next, the configuration of the wearable camera 10 will be described with reference to the block diagram of FIG. 2. As shown in FIG. 2, the wearable camera 10 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a communication unit 14, a sensor unit 15, a storage unit 16, an input unit 17, an imaging unit 18, and an eye tracking unit 19. These units are connected by signal lines and exchange signals with one another.

The CPU 11 executes various processes (for example, the shooting process described later) according to a program recorded in the ROM 12 or a program loaded from the storage unit 16 into the RAM 13.
The RAM 13 also stores, as appropriate, data and the like that the CPU 11 needs in order to execute the various processes.

The communication unit 14 performs communication control so that the CPU 11 can communicate with other devices (for example, the reproduction control device 20).
The sensor unit 15 is composed of an acceleration sensor and a gyro sensor, and measures the movement of the worker wearing the wearable camera 10. Based on the measurement results of the sensor unit 15, the CPU 11 can perform processing such as correcting misalignment between the imaging unit 18 and the worker that arises after calibration.

The storage unit 16 is composed of a semiconductor memory such as a DRAM (Dynamic Random Access Memory) and stores various data.
The input unit 17 is composed of various buttons, a touch panel, and the like, and inputs various information according to the user's instruction operations.

The imaging unit 18 is composed of an imaging device including a lens, an image sensor, and the like, and captures the field-of-view image.
The eye tracking unit 19 is composed of a light emitting element such as an LED (Light Emitting Diode) and an imaging device for eye tracking, and measures the gazing point. Specifically, the eye tracking unit 19 causes the light emitting element to emit light so as to produce a reflection point of light on the worker's cornea, and captures an image of the worker's eyeball with the imaging device for eye tracking. The eye tracking unit 19 then analyzes the captured eyeball image to calculate, as information indicating the worker's gazing point, the two-dimensional coordinate values corresponding to the position of the gazing point.

The imaging unit 18 and the eye tracking unit 19 are arranged at positions suitable for capturing the field-of-view image and measuring the gazing point while the worker is wearing the wearable camera 10. For example, the lens of the imaging unit 18 is arranged at the bridge portion of the glasses of the wearable camera 10. Further, for example, the light emitting element of the eye tracking unit 19 and the imaging device for eye tracking are arranged around the lenses of the glasses of the wearable camera 10.

In the wearable camera 10, these units cooperate to perform the "shooting process".
Here, the shooting process is a series of processes in which the wearable camera 10 generates a moving image composed of a plurality of temporally continuous pieces of image data based on the field-of-view image and the information indicating the position of the gazing point.

When this shooting process is executed, as shown in FIG. 2, a field-of-view image capturing unit 111, a gazing point measurement unit 112, an image data generation unit 113, and an image data transmission unit 114 function in the CPU 11.
Further, an image data storage unit 161 is provided in one area of the storage unit 16.
Including where not specifically mentioned below, these functional blocks exchange the data necessary to realize the processing with one another at appropriate timings.

The field-of-view image capturing unit 111 uses the imaging unit 18 to capture the field-of-view image at a predetermined cycle (that is, at a predetermined frame rate). The field-of-view image capturing unit 111 then outputs the captured field-of-view image to the image data generation unit 113.

The gazing point measurement unit 112 uses the eye tracking unit 19 to calculate the two-dimensional coordinate values corresponding to the position of the gazing point at the same predetermined cycle (that is, the same predetermined frame rate) as the capture by the field-of-view image capturing unit 111. The gazing point measurement unit 112 then outputs the calculated coordinate values corresponding to the position of the gazing point to the image data generation unit 113.

The image data generation unit 113 associates (that is, combines), frame by frame, the field-of-view image input from the field-of-view image capturing unit 111 with the coordinate values corresponding to the position of the gazing point input from the gazing point measurement unit 112, thereby generating image data that includes the gazing point information. The image data generation unit 113 then stores the generated image data in the image data storage unit 161.
While the worker's work continues, the field-of-view image capturing unit 111, the gazing point measurement unit 112, and the image data generation unit 113 repeat this image data generation process, thereby generating a plurality of temporally continuous pieces of image data showing the changes in the field-of-view image and the gazing point of the user U.
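The frame-by-frame association described above can be sketched in Python as follows. The sketch assumes that the field-of-view images and the gazing point coordinates arrive as two sequences sampled at the same frame rate. The `ImageData` record, its field names, and the `build_image_data` function are hypothetical names introduced for this example, not structures defined in the embodiment.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass
class ImageData:
    """One frame: a field-of-view image plus the gazing point measured for it."""
    frame_index: int
    view_image: Any                 # image payload; its format is left open here
    gaze_xy: Tuple[float, float]    # two-dimensional gazing point coordinates


def build_image_data(view_images: Sequence[Any],
                     gaze_points: Sequence[Tuple[float, float]]) -> List[ImageData]:
    """Pair each field-of-view image with the gazing point of the same frame."""
    if len(view_images) != len(gaze_points):
        # Both streams are assumed to share one frame rate, so lengths must match.
        raise ValueError("view images and gazing points must have the same length")
    return [ImageData(i, image, gaze)
            for i, (image, gaze) in enumerate(zip(view_images, gaze_points))]
```

Carrying the gazing point inside each frame record means that a later stage can read it back per frame without a separate lookup.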

The image data transmission unit 114 converts the plurality of temporally continuous pieces of image data, which were generated by the image data generation unit 113 and stored in the image data storage unit 161, into a moving image data format and transmits them to the reproduction control device 20. The process of converting the plurality of pieces of image data into the moving image data format may instead be performed by the reproduction control device 20 after it receives the plurality of pieces of image data.

[Configuration of reproduction control device]
Next, the configuration of the reproduction control device 20 will be described with reference to the block diagram of FIG. 3. As shown in FIG. 3, the reproduction control device 20 includes a CPU 21, a ROM 22, a RAM 23, a communication unit 24, a storage unit 25, an input unit 26, an output unit 27, and a drive 28. These units are connected by signal lines and exchange signals with one another.

The CPU 21 executes various processes (for example, the learning process and the reproduction control process described later) according to a program recorded in the ROM 22 or a program loaded from the storage unit 25 into the RAM 23.
The RAM 23 also stores, as appropriate, data and the like that the CPU 21 needs in order to execute the various processes.

The communication unit 24 performs communication control so that the CPU 21 can communicate with other devices (for example, the wearable camera 10).
The storage unit 25 is composed of a semiconductor memory such as a DRAM (Dynamic Random Access Memory) and stores various data.

The input unit 26 is composed of various buttons and a touch panel, or of external input devices such as a mouse and a keyboard, and inputs various information according to the user's instruction operations.
The output unit 27 is composed of a display, a speaker, and the like, and outputs images and sound.
A removable medium (not shown) such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 28 as appropriate. A program read from the removable medium by the drive 28 is installed in the storage unit 25 as needed.

In the reproduction control device 20, these units cooperate to perform the "learning process" and the "reproduction control process".
Here, the learning process is a series of processes in which the reproduction control device 20 constructs a learning model (including updating a learning model) by performing machine learning using, as teacher data, pairs of input data including the feature amounts extracted from the moving image data received from the wearable camera 10 and a label, acquired from the viewer, indicating the predetermined scene (here, the incision scene).
The reproduction control process is a series of processes in which the reproduction control device 20 detects the predetermined scene based on the feature amounts extracted from the plurality of pieces of image data in the moving image and on the learning model constructed by the learning process, and controls the reproduction of the moving image composed of the plurality of pieces of image data based on whether or not the image data corresponds to the predetermined scene.
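The relationship between the two processes can be illustrated with a small Python sketch: teacher data pairs a feature vector with a label, a model is fitted to those pairs, and the fitted model then labels new feature vectors. This passage does not state what kind of learning model is used, so the nearest-centroid classifier below is purely a stand-in chosen for brevity. The names `fit_centroids` and `predict` and the three-element feature vectors are likewise assumptions for this example.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], bool]  # (feature vector, True if the predetermined scene)


def fit_centroids(samples: List[Sample]) -> Dict[bool, List[float]]:
    """Learning step (stand-in): average the feature vectors of each label."""
    sums: Dict[bool, List[float]] = {}
    counts: Dict[bool, int] = {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model: Dict[bool, List[float]], features: Sequence[float]) -> bool:
    """Detection step (stand-in): pick the label whose centroid is closest."""
    def squared_distance(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical teacher data: small movement values are labeled as the scene.
teacher = [((0.1, 0.2, 0.1), True), ((0.2, 0.1, 0.2), True),
           ((0.9, 0.8, 0.7), False), ((0.8, 0.9, 0.9), False)]
model = fit_centroids(teacher)
```

Calling `fit_centroids` again with additional labeled samples corresponds to updating the model, and `predict` corresponds to the detection performed during the reproduction control process.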

When the learning process and the reproduction control process are executed, as shown in FIG. 3, an image data acquisition unit 211, a region detection unit 212, a gazing point detection unit 213, and a feature amount extraction unit 214 function in the CPU 21.
Further, a moving image data storage unit 251 and a learning model storage unit 252 are provided in one area of the storage unit 25.
Including where not specifically mentioned below, these functional blocks exchange the data necessary to realize the processing with one another at appropriate timings.

The image data acquisition unit 211 acquires the moving image data, into which the plurality of pieces of image data have been converted, by receiving it from the wearable camera 10. The image data acquisition unit 211 then stores the acquired moving image data in the moving image data storage unit 251. As stated above in the description of the image data transmission unit 114, the process of converting the image data into the moving image data format may instead be performed by the reproduction control device 20.

The area detection unit 212 detects a target area, which is an area containing a moving part of the worker (here, the worker's hand), by performing image recognition using an existing method such as edge detection on each field-of-view image (that is, each frame) in the moving image data stored in the moving image data storage unit 251.

The gazing point detection unit 213 detects, from each field-of-view image (that is, each frame) in the moving image data stored in the moving image data storage unit 251, the information on the worker's gazing point (here, the coordinate values indicating the position of the gazing point) that the image data generation unit 113 included in the image data when generating it.

The detection of the target area and of the gazing point information will be described with reference to FIG. 4. FIG. 4 is a schematic diagram illustrating the detection of the target area and the gazing point.
As shown in FIG. 4, a field-of-view image 31, which is an example of a field-of-view image, contains images of photographed objects: a moving part 32, a moving part 33, a scalpel 34, and a marking 35. The boundary 36 of the target area and the gazing point 37 are also drawn on the field-of-view image 31.

The field-of-view image 31 captures a scene in which the user U, who is a worker, is making an incision with the help of an assistant, who is also a worker.
The moving part 32 is a hand of the user U (the surgeon), who is a worker. The moving part 33, on the other hand, is a hand of the assistant, who is also a worker.

The scalpel 34 is the scalpel that the user U (the surgeon), who is a worker, is using to make an incision in the patient. The marking 35 is a line drawn on the patient with a skin marker to make the surgical site clear.

The boundary 36 of the target area is the boundary between the target area detected by the area detection unit 212 and the non-target area, which is the remaining area. In this example, because the moving part 32 and the moving part 33 lie within it, the inside of the boundary 36 is detected as the target area and the outside as the non-target area. Although the target area in this example is detected as a single circular area, it may instead be detected as a plurality of areas, one for each moving part, or as an area with a non-circular shape, depending on, for example, the environment in which the present embodiment is implemented.

The gazing point 37 is the point corresponding to the coordinate values, detected by the gazing point detection unit 213, that indicate the position of the gazing point. It corresponds to the point at which the user U (the surgeon), who is a worker, was actually gazing when the field-of-view image was captured.

The area detection unit 212 and the gazing point detection unit 213 output the target area and the gazing point information detected in this way to the feature amount extraction unit 214.

The feature amount extraction unit 214 extracts a feature amount for each image data (that is, each frame) in the moving image data, based on the detection results of the area detection unit 212 and the gazing point detection unit 213, on changes between the image data, and the like.

As the first feature amount, the extraction of a feature amount based on the movement of the line of sight (that is, the movement of the gazing point) of the user U (the surgeon), who is a worker, will be described with reference to FIG. 5. FIG. 5 is a schematic diagram illustrating the moving distance of the gazing point. First, suppose that in a field-of-view image 41-n, which is the field-of-view image of the n-th frame (n is an integer of 1 or more), the gazing point is detected at the position shown as a gazing point 42-n. Next, suppose that the gazing point moves and that, in a field-of-view image 41-m, which is the field-of-view image of the m-th frame (m = n + 1), the gazing point is detected at the position shown as a gazing point 42-m. The distance from the gazing point 42-n to the gazing point 42-m is then the moving distance of the gazing point. In this case, the feature amount extraction unit 214 extracts the first feature amount by calculating it with, for example, the formula given below for the feature amount based on the movement of the gazing point.

<Calculation formula for the feature amount based on the movement of the gazing point>
Euclidean distance / unit time
Here, the Euclidean distance is the positive square root of the sum of the squares of the component-wise differences between the coordinate values of the gazing point 42-n and the gazing point 42-m, and the unit time is the interval between adjacent frames corresponding to the frame rate at which the field-of-view images were captured.
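The formula above can be illustrated with a short Python sketch. The function name `gaze_speed`, the pixel coordinates, and the frame rate of 30 frames per second are assumptions made only for this illustration and do not appear in the embodiment.

```python
import math


def gaze_speed(p_n, p_m, frame_rate):
    """Euclidean distance between two gazing points divided by the frame interval."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    # Positive square root of the sum of squared component-wise differences.
    distance = math.hypot(p_m[0] - p_n[0], p_m[1] - p_n[1])
    # Unit time: interval between adjacent frames, in seconds.
    unit_time = 1.0 / frame_rate
    return distance / unit_time


# Hypothetical gazing points in frame n and frame m = n + 1 (pixel coordinates).
speed = gaze_speed((100.0, 200.0), (103.0, 204.0), frame_rate=30.0)
print(speed)  # roughly 150 pixels per second
```

With these assumed values the gazing point moves 5 pixels in one frame interval of 1/30 second, which gives a value of about 150 pixels per second.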

As the second feature amount, the extraction of a feature amount based on the background change in the field-of-view image of the user U (the surgeon), who is a worker, will be described with reference to FIG. 6. FIG. 6 is a schematic diagram illustrating the amount of movement of the background. First, suppose that in a non-target area 43-n, which is the non-target area (that is, the background) in the field-of-view image of the n-th frame (n is an integer of 1 or more), the operating table is photographed at the position shown as an object 44-n. Next, suppose that, because the orientation of the head of the user U (the surgeon) has changed, the operating table is photographed at the position shown as an object 44-m in a non-target area 43-m, which is the non-target area in the field-of-view image of the m-th frame (m = n + 1).

In this case, to calculate the second feature amount, the feature amount extraction unit 214 first calculates movement vectors (corresponding to the arrow in the figure) indicating the movement of the object (here, the operating table) between frames. These movement vectors can be calculated based on, for example, the Lucas-Kanade method of optical flow. The feature points to be tracked can be detected with an existing method such as corner detection. In the present embodiment, the movement vectors calculated for all the feature points are not used directly as feature amounts; instead, the feature amount is defined in terms of whether or not the background is moving significantly. The feature amount extraction unit 214 therefore extracts the second feature amount by calculating the average value of all the movement vectors calculated between the frames. Although the average value of all the movement vectors between frames in the non-target area is used here as the second feature amount, the average value of all the movement vectors between frames in both the target area and the non-target area may be used instead.
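The averaging step for the second feature amount can be sketched in Python as follows. This sketch assumes that feature points have already been tracked between frame n and frame n + 1 (for example, by a Lucas-Kanade tracker) and that the target area is a circle. The function name `mean_background_motion` and all coordinates are hypothetical.

```python
def mean_background_motion(tracks, center, radius):
    """Average the movement vectors of tracked points outside a circular target area.

    tracks: list of ((x0, y0), (x1, y1)) positions of one feature point
            in frame n and in frame n + 1.
    """
    vectors = []
    for (x0, y0), (x1, y1) in tracks:
        inside = (x0 - center[0]) ** 2 + (y0 - center[1]) ** 2 <= radius ** 2
        if not inside:  # keep only points in the non-target area (background)
            vectors.append((x1 - x0, y1 - y0))
    if not vectors:
        return (0.0, 0.0)
    count = len(vectors)
    return (sum(v[0] for v in vectors) / count, sum(v[1] for v in vectors) / count)


tracks = [
    ((10.0, 10.0), (14.0, 10.0)),      # background point, moved (+4, 0)
    ((300.0, 20.0), (306.0, 22.0)),    # background point, moved (+6, +2)
    ((160.0, 120.0), (161.0, 150.0)),  # inside the target area, ignored
]
mean_vec = mean_background_motion(tracks, center=(160.0, 120.0), radius=50.0)
print(mean_vec)  # (5.0, 1.0)
```

Dropping the `inside` check gives the variant mentioned above, in which the vectors of both the target area and the non-target area are averaged.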

As the third feature amount, the extraction of a feature amount based on the movement of the moving part (here, the hand) of the user U (the surgeon), who is a worker, will be described with reference to FIG. 7. FIG. 7 is a schematic diagram illustrating the amount of movement of the moving part. First, suppose that in a target area 45-n, which is the target area in the field-of-view image of the n-th frame (n is an integer of 1 or more), the hand, which is the moving part, is photographed at the position shown as a moving part 46-n. Next, suppose that, because the hand of the user U (the surgeon) has moved, the hand is photographed at the position shown as a moving part 46-m in a target area 45-m, which is the target area in the field-of-view image of the m-th frame (m = n + 1).

In this case, to calculate the third feature amount, the feature amount extraction unit 214 first calculates movement vectors (corresponding to the arrow in the figure) indicating the movement of the moving part (here, the hand) between frames. As with the second feature amount, these movement vectors can be calculated based on, for example, the Lucas-Kanade method of optical flow. Depending on the type of moving part, however, it may not be possible to detect enough feature points. In such a case, the movement vectors may be calculated for all the pixels corresponding to the hand based on the Gunnar-Farneback method of optical flow. In either case, following the same idea as for the second feature amount, the feature amount extraction unit 214 extracts the third feature amount by calculating the average value of all the movement vectors calculated for the moving part between the frames.
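For the dense case, the averaging over all hand pixels can be sketched as follows. The sketch assumes that a dense flow field (one vector per pixel, as a Farneback-style method would produce) and a boolean hand mask are already available. The function name `mean_masked_flow` and the tiny 2 x 2 arrays are illustrative assumptions only.

```python
def mean_masked_flow(flow, mask):
    """Average per-pixel flow vectors over the pixels where mask is True."""
    total_dx = 0.0
    total_dy = 0.0
    count = 0
    for flow_row, mask_row in zip(flow, mask):
        for (dx, dy), is_hand in zip(flow_row, mask_row):
            if is_hand:
                total_dx += dx
                total_dy += dy
                count += 1
    if count == 0:
        return (0.0, 0.0)
    return (total_dx / count, total_dy / count)


# Hypothetical 2 x 2 dense flow field and hand mask.
flow = [
    [(0.0, 0.0), (2.0, 1.0)],
    [(4.0, 3.0), (9.0, 9.0)],
]
hand_mask = [
    [False, True],
    [True, False],
]
hand_motion = mean_masked_flow(flow, hand_mask)
print(hand_motion)  # (3.0, 2.0)
```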

The feature amount extraction unit 214 then outputs each of the three feature amounts extracted by these calculations. The output destination is the learning unit 215 in the case of the learning process and the scene detection unit 216 in the case of the reproduction control process.

The learning unit 215 constructs a learning model (which includes updating a learning model) by performing machine learning using, as teacher data, pairs of input data containing the three feature amounts extracted by the feature amount extraction unit 214 and a label, acquired from the viewer, that indicates the predetermined scene (here, an incision scene).
Here, the three feature amounts of each image data in the moving image data to be learned are acquired by being input from the feature amount extraction unit 214, as described above.

The labels are generated in advance when the viewer refers to the moving image to be learned and performs a labeling operation on the image data corresponding to the predetermined scene (here, an incision scene). For an incision scene, for example, the viewer performs an operation that labels the image data from the moment the scalpel cuts in to the moment the scalpel leaves the affected part. In response to this operation, information indicating a correct answer (for example, the value "1") is assigned to each image data from the moment the scalpel cuts in to the moment the scalpel leaves the affected part, and information indicating an incorrect answer (for example, the value "0") is assigned to the other image data. Through this labeling process, the learning unit 215 can acquire a label for each image data. The labeling process may be performed by the reproduction control device 20, or it may be performed by another device, with the reproduction control device 20 acquiring the result.
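The labeling rule can be sketched in Python as follows. The function name `make_labels` and the representation of the viewer's operation as inclusive (start frame, end frame) pairs are assumptions made for this illustration.

```python
def make_labels(num_frames, incision_intervals):
    """Assign 1 to frames inside any labeled interval (inclusive) and 0 elsewhere."""
    labels = [0] * num_frames
    for start, end in incision_intervals:
        first = max(start, 0)
        last = min(end, num_frames - 1)
        for index in range(first, last + 1):
            labels[index] = 1
    return labels


# Hypothetical 10-frame video with two incision intervals marked by the viewer.
labels = make_labels(10, [(2, 4), (7, 8)])
print(labels)  # [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
```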

The learning unit 215 generates teacher data by pairing the three feature amounts acquired in this way with the corresponding labels. The learning unit 215 then uses this teacher data to perform, for example, supervised machine learning. In this case, the learning unit 215 performs the machine learning with, for example, a neural network constructed by combining perceptrons. Specifically, the feature amounts contained in the teacher data are given to the input layer of the neural network as input data, and learning is repeated while the weighting of each perceptron is changed so that the output of the output layer of the neural network matches the label. For example, after an output is produced by a method called forward propagation, the weighting values are repeatedly adjusted by a method called backpropagation (also called the error backpropagation method) so as to reduce the error in the output of each perceptron.
In this way, the learning unit 215 learns the characteristics of the teacher data and inductively acquires a learning model for estimating a result from an input.
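The forward pass and the error-driven weight update can be sketched with the smallest possible network. This is a deliberate simplification: a single sigmoid unit is used instead of a multi-layer network, and the toy feature values, learning rate, and epoch count are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, features):
    """Forward propagation through a single sigmoid unit."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, labels, epochs=1000, learning_rate=0.5):
    """Repeatedly adjust the weights so that the output approaches the label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            output = forward(weights, bias, features)
            error = output - label  # cross-entropy gradient at the unit's input
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Toy teacher data: (gaze movement, background movement, hand movement), all hypothetical.
samples = [(0.1, 0.1, 0.5), (0.2, 0.0, 0.6), (0.9, 0.8, 0.1), (0.8, 0.9, 0.2)]
labels = [1, 1, 0, 0]
weights, bias = train(samples, labels)
predictions = [1 if forward(weights, bias, s) >= 0.5 else 0 for s in samples]
print(predictions)
```

In a multi-layer network the same error term would be propagated backward through each layer. The sketch keeps only the repeat-until-the-output-matches-the-label loop described above.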

The machine learning method is not necessarily limited to this. For example, a general neural network consisting only of fully connected layers may be used, or a recurrent neural network such as an RNN (Recurrent Neural Network) may be used.

Then, when a predetermined condition for ending the machine learning is satisfied, the learning unit 215 stores the constructed learning model in the learning model storage unit 252. The predetermined condition for ending the machine learning can be set arbitrarily. For example, the condition may be that the error between the output and the label has fallen to or below a predetermined reference, that the number of repetitions of the weighting adjustment has reached a predetermined number, or that a predetermined time has elapsed since the machine learning started. Note that constructing a learning model includes not only creating a new learning model but also updating an existing learning model with new teacher data.

The scene detection unit 216 detects the predetermined scene (here, an incision scene) based on the three feature amounts extracted by the feature amount extraction unit 214 and on the learning model that the learning unit 215 constructed and stored in the learning model storage unit 252. Here, the three feature amounts of each image data in the moving image data subject to reproduction control are acquired by being input from the feature amount extraction unit 214, as described above.

The scene detection unit 216 gives the three feature amounts acquired in this way to the input layer of the learning model as input data, and detects the predetermined scene (here, an incision scene) based on the output of the output layer of the neural network. For example, if the output of the output layer is information indicating a correct answer (for example, the value "1" or a value close to 1 that is equal to or greater than a predetermined threshold), the scene detection unit 216 detects the image data as image data corresponding to the predetermined scene.
On the other hand, if the output of the output layer is information indicating an incorrect answer (for example, the value "0" or a value close to 0 that is less than the predetermined threshold), the scene detection unit 216 does not detect the image data as image data corresponding to the predetermined scene. That is, it detects the image data as image data corresponding to another scene.

The scene detection unit 216 performs this detection process on all the image data in the moving image data, and adds to the moving image data information indicating which image data correspond to the detected predetermined scene. The scene detection unit 216 also outputs the moving image data to which this information has been added to the reproduction control unit 217 and stores it in the moving image data storage unit 251.
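The thresholding of the per-frame outputs and the resulting scene information can be sketched as follows. The threshold of 0.5, the function name `detect_scene_segments`, and the choice to store the information as inclusive (start, end) frame ranges are assumptions. The embodiment does not specify how the information is encoded in the moving image data.

```python
def detect_scene_segments(outputs, threshold=0.5):
    """Return inclusive (start, end) frame ranges whose model output is >= threshold."""
    segments = []
    start = None
    for index, output in enumerate(outputs):
        if output >= threshold:
            if start is None:
                start = index  # a new run of scene frames begins here
        elif start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:
        segments.append((start, len(outputs) - 1))
    return segments


# Hypothetical per-frame outputs of the learning model.
outputs = [0.1, 0.8, 0.9, 0.2, 0.7, 0.95]
segments = detect_scene_segments(outputs)
print(segments)  # [(1, 2), (4, 5)]
```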

When reproducing the moving image data to which the scene detection unit 216 has added the information, the reproduction control unit 217 determines, based on that information, whether or not the image data to be reproduced corresponds to the predetermined scene, and controls the reproduction based on the determination result. Specifically, when continuously reproducing the plurality of image data contained in the moving image data, the reproduction control unit 217 makes the mode of reproduction of the image data corresponding to the predetermined scene (hereinafter referred to as the "first mode") different from the mode of reproduction of the other image data, that is, the image data corresponding to other scenes (hereinafter referred to as the "second mode").

As a premise, the predetermined scene is, for example, the scene that the viewer wants to view, so it is desirable to present it to the user in a manner that is easier to see than other scenes.
The reproduction control unit 217 therefore makes, for example, the reproduction speed in the first mode slower than the reproduction speed in the second mode. For example, the reproduction speed in the first mode is set to normal speed, following the frame rate at the time of capture, or to a slower speed (so-called slow reproduction). The reproduction speed in the second mode, on the other hand, is set to a speed faster than normal speed (so-called fast-forward). This allows the viewer to view the predetermined scene more carefully than other scenes.
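The effect of the two reproduction speeds on viewing time can be sketched as follows. The speed factors (0.5x for the predetermined scene and 4x for other scenes), the frame rate, and the function name `playback_duration` are assumptions chosen for illustration.

```python
def playback_duration(scene_flags, frame_rate, scene_speed=0.5, other_speed=4.0):
    """Total display time in seconds when scene frames and other frames use different speeds."""
    frame_interval = 1.0 / frame_rate
    total = 0.0
    for is_scene in scene_flags:
        speed = scene_speed if is_scene else other_speed
        total += frame_interval / speed  # slower speed means each frame is shown longer
    return total


# Hypothetical 9-second video at 30 fps: 4 s other, 1 s incision, 4 s other.
flags = [False] * 120 + [True] * 30 + [False] * 120
duration = playback_duration(flags, frame_rate=30.0)
print(round(duration, 3))  # 4.0
```

Under these assumed factors, the 8 seconds of other scenes shrink to 2 seconds, while the 1-second incision scene is stretched to 2 seconds.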

In addition, when reproducing in the first mode, the reproduction control unit 217 may, for example, enlarge a partial area of the image data corresponding to the predetermined scene. When reproducing in the second mode, on the other hand, it performs no such processing. This allows the user to view the predetermined scene in more detail than other scenes. The area to be enlarged can be, for example, the target area detected by the area detection unit 212, the area around the gazing point detected by the gazing point detection unit 213, the area around the moving part, or the area around the tool used by the worker (here, the scalpel).
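Choosing the area to enlarge around the gazing point can be sketched as follows. The zoom factor of 2, the clamping of the crop rectangle to the image bounds, and the function name `zoom_region` are assumptions of this sketch rather than details of the embodiment.

```python
def zoom_region(gaze, image_size, zoom=2.0):
    """Return (left, top, width, height) of a crop centered on the gaze, kept inside the image."""
    width, height = image_size
    crop_w = int(width / zoom)
    crop_h = int(height / zoom)
    left = int(gaze[0] - crop_w / 2)
    top = int(gaze[1] - crop_h / 2)
    # Clamp so that the crop rectangle never extends past the image edges.
    left = max(0, min(left, width - crop_w))
    top = max(0, min(top, height - crop_h))
    return (left, top, crop_w, crop_h)


# Hypothetical gazing point near the top-right corner of a 640 x 480 frame.
region = zoom_region(gaze=(600, 50), image_size=(640, 480))
print(region)  # (320, 0, 320, 240)
```

The same function could take the center of the target area or of the tool instead of the gazing point.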

The reproduction control unit 217 may combine the different reproduction speeds with the enlargement. As a further example of the first mode, text indicating that the scene is the predetermined scene may be displayed, or a sound indicating that the scene is the predetermined scene may be output. As yet another example of the first mode, explanatory text corresponding to the predetermined scene (for example, text explaining the incision method in an incision scene) may be displayed.

FIG. 8 is a schematic diagram showing an example of the user interface during reproduction under the control of the reproduction control unit 217. As shown in FIG. 8, a reproduction screen 51 includes a reproduction area 52, a seek bar 53, a slider 54, predetermined scene portions 55, and an operation icon group 56.

The reproduction area 52 displays the reproduced image of the moving image subject to reproduction control. The seek bar 53 is used to adjust the reproduction position of the moving image in response to the viewer's operation. The slider 54 indicates the current reproduction position. The predetermined scene portions 55 indicate the portions of the seek bar 53 that correspond to the detected predetermined scene. In the figure, the predetermined scene portions 55 are shown by hatching. The operation icon group 56 consists of icons corresponding to the so-called stop, fast-forward, and rewind buttons.
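Mapping the detected scene ranges onto the seek bar can be sketched as follows. The seek bar width in pixels, the frame counts, and the function name `seekbar_ranges` are hypothetical values used only to illustrate the proportional mapping.

```python
def seekbar_ranges(segments, total_frames, bar_width_px):
    """Convert inclusive (start, end) frame ranges to (left, right) pixel ranges on the seek bar."""
    ranges = []
    for start, end in segments:
        left = start * bar_width_px // total_frames
        right = (end + 1) * bar_width_px // total_frames
        ranges.append((left, right))
    return ranges


# Hypothetical 1000-frame video shown on a 400-pixel-wide seek bar.
ranges = seekbar_ranges([(100, 199), (500, 549)], total_frames=1000, bar_width_px=400)
print(ranges)  # [(40, 80), (200, 220)]
```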

By performing only a reproduction start instruction operation and looking at the reproduction area 52, the viewer can view the reproduced image of a moving image that is reproduced in a different mode depending on whether or not the scene is the predetermined scene. In addition, because the predetermined scene portions 55 are displayed, the viewer can easily reach the predetermined scene when operating the slider 54 or the operation icon group 56. The viewer therefore no longer needs to perform complicated operations to reach the predetermined scene, as was conventionally required. That is, according to the present embodiment, control related to reproduction can more appropriately support viewing by the user who is the viewer.

[Shooting process]
Next, the flow of the shooting process executed by the wearable camera 10 will be described with reference to FIG. 9. FIG. 9 is a flowchart illustrating the flow of the shooting process executed by the wearable camera 10. The shooting process is executed in response to a shooting start instruction operation from a user, such as a worker who is starting work.

In step S11, the field-of-view image capturing unit 111 uses the imaging unit 18 to capture field-of-view images at a predetermined cycle (that is, at a predetermined frame rate).
In step S12, the gazing point measuring unit 112 uses the eye tracking unit 19 to calculate the two-dimensional coordinate values corresponding to the position of the gazing point at the same predetermined cycle (that is, the same predetermined frame rate) as the capture by the field-of-view image capturing unit 111.

In step S13, the image data generation unit 113 generates image data containing the gazing point information by associating (that is, combining), frame by frame, the field-of-view image input from the field-of-view image capturing unit 111 with the coordinate values, input from the gazing point measuring unit 112, that correspond to the position of the gazing point.
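The frame-by-frame association in step S13 can be sketched as follows. The `ImageData` record, its field names, and the use of byte strings as stand-ins for captured frames are assumptions of this sketch. The embodiment does not specify the data format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImageData:
    frame_index: int
    pixels: bytes               # stand-in for one captured field-of-view image
    gaze: Tuple[float, float]   # gazing point coordinates for the same frame


def build_image_data(frames, gaze_points):
    """Pair each field-of-view image with the gazing point measured for the same frame."""
    if len(frames) != len(gaze_points):
        raise ValueError("frames and gaze_points must have the same length")
    return [
        ImageData(index, pixels, gaze)
        for index, (pixels, gaze) in enumerate(zip(frames, gaze_points))
    ]


records = build_image_data([b"frame0", b"frame1"], [(10.0, 20.0), (11.0, 21.0)])
print(records[1].gaze)  # (11.0, 21.0)
```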

In step S14, the image data generation unit 113 determines whether or not a shooting end instruction operation has been performed by a user, such as a worker who has finished work. If a shooting end instruction operation has been performed, the determination in step S14 is Yes and the process proceeds to step S15. If no shooting end instruction operation has been performed, the determination in step S14 is No and the process is repeated from step S11.

In step S15, the image data transmission unit 114 converts the plurality of temporally continuous image data generated by the image data generation unit 113 into the moving image data format and transmits it to the reproduction control device 20. This process then ends.

[Learning process]
Next, the flow of the learning process executed by the reproduction control device 20 will be described with reference to FIG. 10. FIG. 10 is a flowchart illustrating the flow of the learning process executed by the reproduction control device 20. The learning process is executed in response to a learning start instruction operation from a user such as a viewer.

In step S21, the image data acquisition unit 211 acquires, by receiving it from the wearable camera 10, the moving image data obtained by converting the plurality of image data.
In step S22, the area detection unit 212 detects the target area, which is an area containing the moving part of the worker (here, the worker's hand), by performing image recognition on each field-of-view image (that is, each frame) in the moving image data.

In step S23, the gazing point detection unit 213 detects, from each field-of-view image (that is, each frame) in the moving image data, the information on the worker's gazing point (here, the coordinate values indicating the position of the gazing point) that the image data generation unit 113 included in the image data when generating it.
In step S24, the feature amount extraction unit 214 extracts a feature amount for each image data (that is, each frame) in the moving image data, based on the detection results of the area detection unit 212 and the gazing point detection unit 213, on changes between the image data, and the like.

In step S25, the learning unit 215 acquires the labels, generated based on the viewer's operation, that indicate the predetermined scene (here, an incision scene).
In step S26, the learning unit 215 generates teacher data by pairing the feature amounts with the corresponding labels, and performs machine learning using this teacher data.

In step S27, the learning unit 215 determines whether or not the predetermined condition for ending the machine learning is satisfied. The specific content of this condition is as described above for the learning unit 215. If the condition is satisfied, the determination in step S27 is Yes and the process proceeds to step S28. If the condition is not satisfied, the determination in step S27 is No and step S26 is repeated.

In step S28, the learning unit 215 constructs a learning model (including updating an existing learning model) based on the result of the machine learning. This ends the process.
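The loop formed by steps S26 to S28 can be illustrated with a minimal sketch. The learning algorithm and the end condition are not specified in this passage, so a simple perceptron and the condition "no misclassified frame, or a maximum number of epochs" are used here purely as assumptions; the names `build_teacher_data`, `train`, and `predict` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature amounts, label: 1 = incision scene, 0 = other)
Model = Tuple[List[float], float]     # (weights, bias)


def build_teacher_data(features: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> List[Sample]:
    """Step S26 (first half): pair each frame's feature amounts with its label."""
    if len(features) != len(labels):
        raise ValueError("each frame needs exactly one label")
    return list(zip(features, labels))


def predict(model: Model, x: Sequence[float]) -> int:
    """Return 1 if the frame is classified as the predetermined scene."""
    weights, bias = model
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train(teacher_data: List[Sample], max_epochs: int = 100,
          lr: float = 0.1) -> Model:
    """Perceptron stand-in for the unspecified learning algorithm."""
    weights = [0.0] * len(teacher_data[0][0])
    bias = 0.0
    for _ in range(max_epochs):              # step S26 is repeated ...
        errors = 0
        for x, y in teacher_data:
            predicted = predict((weights, bias), x)
            if predicted != y:
                errors += 1
                delta = lr * (y - predicted)
                weights = [w + delta * v for w, v in zip(weights, x)]
                bias += delta
        if errors == 0:                      # ... until the assumed S27 condition holds
            break
    return weights, bias                     # step S28: the constructed model
```

Any classifier that accepts per-frame feature vectors and binary labels could take the place of the perceptron without changing the surrounding flow.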

[Reproduction control process]
Next, the flow of the reproduction control process executed by the reproduction control device 20 will be described with reference to FIG. 11. FIG. 11 is a flowchart illustrating the flow of the reproduction control process executed by the reproduction control device 20. The reproduction control process is executed in response to a reproduction start instruction from a user such as a viewer.

The processing in steps S31 to S34 is the same as that in steps S21 to S24, except that the moving image data to be processed is the moving image data subject to reproduction control rather than the moving image data used for learning. Duplicate description is therefore omitted.

In step S35, the scene detection unit 216 detects the predetermined scene (here, the incision scene) based on the feature amounts extracted by the feature amount extraction unit 214 and the learning model constructed by the learning unit 215. This detection is performed on all the image data in the moving image data.
In step S36, the scene detection unit 216 adds, to the moving image data, information indicating which pieces of image data correspond to the detected predetermined scene.
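Steps S35 and S36 can be sketched as follows. This is an illustration under assumptions, not the format used by the device: the container `MovingImageData`, the field `scene_frames`, and the callable `is_target_scene` (standing in for the learning model) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class MovingImageData:
    # One feature vector per frame, as produced in step S34.
    frame_features: List[Sequence[float]]
    # Indices of frames judged to belong to the predetermined scene (added in S36).
    scene_frames: List[int] = field(default_factory=list)


def detect_and_annotate(video: MovingImageData,
                        is_target_scene: Callable[[Sequence[float]], bool]) -> MovingImageData:
    """S35: classify every frame; S36: record the result with the video."""
    video.scene_frames = [
        index for index, features in enumerate(video.frame_features)
        if is_target_scene(features)
    ]
    return video
```

Storing the result as frame indices keeps the annotation independent of how the frames themselves are encoded.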

In step S37, the reproduction control unit 217 reproduces the moving image data to which the scene detection unit 216 has added the information indicating which pieces of image data correspond to the predetermined scene. Steps S36 and S37 may be executed in succession, or step S37 may be executed after step S36 ends, in response to a reproduction start instruction from a user such as a viewer.

In step S38, the reproduction control unit 217 determines whether or not the image data to be reproduced in the moving image data corresponds to the predetermined scene. If it does, the determination in step S38 is Yes, and the process proceeds to step S39. If it does not (that is, if the image data corresponds to another scene), the determination in step S38 is No, and the process proceeds to step S40.

In step S39, the reproduction control unit 217 reproduces the image data corresponding to the predetermined scene in a first mode.
In step S40, the reproduction control unit 217 reproduces the image data corresponding to another scene in a second mode.

In step S41, the reproduction control unit 217 determines whether or not the moving image has ended, that is, whether it has been reproduced to the end. If the moving image has ended, the determination in step S41 is Yes, and the process ends. If the moving image has not ended, the determination in step S41 is No, and the process repeats from step S38.
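The loop over steps S38 to S41 can be sketched as below. The callbacks `first_mode` and `second_mode` are hypothetical placeholders for whatever the two reproduction modes do; the sketch only shows the branching and the end-of-video check.

```python
from typing import Callable, Set


def play(frame_count: int, scene_frames: Set[int],
         first_mode: Callable[[int], None],
         second_mode: Callable[[int], None]) -> None:
    """Reproduce every frame, choosing the mode per frame."""
    index = 0
    while index < frame_count:        # S41: stop once the last frame has been played
        if index in scene_frames:     # S38: does this frame belong to the scene?
            first_mode(index)         # S39
        else:
            second_mode(index)        # S40
        index += 1
```

For example, calling `play(5, {1, 2}, show_slow, show_fast)` with two hypothetical display functions would route frames 1 and 2 to `show_slow` and the remaining frames to `show_fast`.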

According to the shooting process, the learning process, and the reproduction control process described above, control related to reproduction can more appropriately support the user's viewing.
For example, these processes detect the incision scene, which is the purpose of viewing, from a surgery video that tends to be long, and allow the viewer to watch the detected incision scene in a manner that is easier to see than other scenes (for example, preparation or tidying-up scenes). The viewer can thus easily watch the incision scene without performing complicated operations for reproduction control. In addition, compared with simply repeating machine learning on general-purpose cues identifiable by image recognition, such as bicycles or persons in the image data, basing detection on feature amounts suited to the predetermined scene, such as the gazing point, allows the predetermined scene to be detected after a shorter period of machine learning.

[Modifications]
Although an embodiment of the present invention has been described above, this embodiment is merely an example and does not limit the technical scope of the present invention. The present invention can take various other embodiments, and various modifications such as omissions and substitutions can be made, without departing from the gist of the present invention. Such embodiments and modifications are included in the scope and gist of the invention described in this specification and the like, and in the invention described in the claims and its equivalents.
As an example, the embodiment described above may be modified as in the following modifications.

<First modification>
In the embodiment described above, the learning model was constructed and the predetermined scene was detected using three feature amounts considered to appropriately represent the characteristics of a predetermined scene in surgery (here, the incision scene). The invention is not limited to this: depending on the kind of scene to be detected, other feature amounts may be used in addition to, or instead of, these three.

For example, in the embodiment described above, the worker's hand was used as the moving part, but feature amounts may instead be extracted using another part of the worker, such as a finger or a foot, as the moving part. A tool used by the worker (for example, a scalpel) may also be treated as the moving part for feature extraction.
In addition, for example, feature amounts may be extracted from changes in the color information and brightness information of each pixel, in order to take into account the environment surrounding the place where the work is performed, changes in the shape and color of the affected area, and the like.

In addition, for example, in order to detect scenes of collaborative work more accurately, the number of moving parts (for example, the number of hands) may be extracted and used as a feature amount. In collaborative work, three or more hands are likely to be detected, so using the number of moving parts such as hands as a feature amount allows collaborative work to be detected more accurately. Alternatively, each worker performing the collaborative work may wear a wearable camera 10, and feature amounts may be extracted from the image data of the visual field images captured by each wearable camera 10; that is, feature amounts may be extracted from a plurality of visual field images. For example, in collaborative work, the workers' gazing points are likely to be close to one another, so feature amounts extracted from a plurality of visual field images allow collaborative work to be detected more accurately. In this case, which worker's visual field image should be reproduced for each detected scene may be machine-learned (or set) in advance, and the visual field image to be reproduced may be switched based on the machine learning result (or the setting).
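The two collaborative-work cues mentioned above (the number of detected hands and the proximity of the workers' gazing points) could be turned into feature amounts roughly as follows. This is only a sketch: the function name is hypothetical, hand regions are assumed to arrive as bounding boxes from some detector, and the gazing points of different workers are assumed to have already been mapped into a shared coordinate system, which the text does not describe.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, width, height) of a detected hand
Point = Tuple[float, float]       # gazing point in an assumed shared coordinate system


def collaboration_features(hand_boxes: Sequence[Box],
                           gaze_points: Sequence[Point]) -> Dict[str, Optional[float]]:
    """Return the hand count and the mean pairwise distance between gazing points."""
    distances = [math.dist(a, b) for a, b in combinations(gaze_points, 2)]
    mean_distance = sum(distances) / len(distances) if distances else None
    return {
        "hand_count": float(len(hand_boxes)),
        # None when fewer than two workers' gazing points are available.
        "mean_gaze_distance": mean_distance,
    }
```

A hand count of three or more, or a small mean gaze distance, would then be evidence for collaborative work that the learning model can weigh together with the other feature amounts.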

In addition, for example, when there are multiple types of scenes to be detected as predetermined scenes (for example, an incision scene and a suturing scene), a different type of label may be assigned for each type of scene. If the multiple types of scenes are known to occur in a predetermined order, that order may also be used as one of the feature amounts. For example, since the surgical plan shows that the suturing scene follows the incision scene, the type of work likely to be performed in each time slot, derived from this order, may be used as one of the feature amounts. Alternatively, when the learning model outputs a likelihood value for each scene, the likelihoods may be weighted so that the type of work likely to be performed in each time slot receives a higher likelihood. That is, information indicating the predetermined order of scenes, such as a surgical plan, may be used as a feature amount or for weighting the output likelihoods.
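The likelihood weighting described above might look like the following sketch. The plan format (a list of time slots with the expected scene), the multiplicative `boost` factor, and the renormalization are all assumptions made for illustration; the text only states that the scene expected in each time slot should receive a higher likelihood.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical surgical plan: (start second, end second, expected scene).
Plan = List[Tuple[float, float, str]]


def expected_scene(plan: Plan, time_sec: float) -> Optional[str]:
    """Return the scene the plan expects at the given time, if any."""
    for start, end, scene in plan:
        if start <= time_sec < end:
            return scene
    return None


def weight_by_plan(likelihoods: Dict[str, float], plan: Plan,
                   time_sec: float, boost: float = 2.0) -> Dict[str, float]:
    """Boost the likelihood of the expected scene, then renormalize to sum to 1."""
    expected = expected_scene(plan, time_sec)
    weighted = {scene: value * (boost if scene == expected else 1.0)
                for scene, value in likelihoods.items()}
    total = sum(weighted.values())
    return {scene: value / total for scene, value in weighted.items()}
```

With a plan that expects the incision scene in the first time slot, a raw output that slightly favors the suturing scene can be overturned in favor of the incision scene, while outside any planned slot the likelihoods are left unchanged apart from normalization.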

<Second modification>
The user may be allowed to weight each extracted feature amount by an arbitrary factor when constructing the learning model and detecting the predetermined scene. For example, a user interface for adjusting the degree of weighting, such as a slider for each extracted feature amount, is provided. Which feature amounts are weighted, and by how much, is set according to the user's operation of this interface. Each feature amount is then weighted according to the setting, and the learning model is constructed and the predetermined scene is detected. Besides the three feature amounts described above, the feature amounts that can be weighted may include, for example, the presence or absence of a detected moving part, the size of the detected moving part, the distance of each detected feature amount from the center of the screen, and the distance between the detected moving part and the gazing point.
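Applying the user-set weights before the feature amounts reach the learning model could be as simple as the sketch below. The feature names and the dictionary representation are hypothetical; a feature without a slider setting is assumed to keep a weight of 1.

```python
from typing import Dict


def apply_weights(features: Dict[str, float],
                  slider_weights: Dict[str, float]) -> Dict[str, float]:
    """Scale each named feature amount by the factor chosen in the user interface."""
    return {name: value * slider_weights.get(name, 1.0)
            for name, value in features.items()}
```

The same weighting would need to be applied both when constructing the learning model and when detecting scenes, so that the model sees consistently scaled inputs.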

<Third modification>
In the embodiment described above, it was assumed that the wearable camera 10 performs the shooting process and generates the moving image data. The invention is not limited to this, and another device may perform the shooting process and generate the moving image data. For example, a medical device such as an endoscope may perform the shooting process and generate the moving image data. That is, the moving image data subject to reproduction control in the present embodiment may be moving image data generated by shooting with a device other than the wearable camera 10. In addition, for example, the wearable camera 10 (or another device that performs the shooting process) and the reproduction control device 20 may be implemented as a single integrated device.

As described above, the reproduction control device 20 according to the present embodiment includes an image data acquisition unit 211, an area detection unit 212, a feature amount extraction unit 214, a scene detection unit 216, and a reproduction control unit 217.
The image data acquisition unit 211 acquires a plurality of pieces of image data that are continuous in time.
The area detection unit 212 detects, from the image of each of the plurality of pieces of image data, a target area including a predetermined target.
The feature amount extraction unit 214 extracts a feature amount from each of the plurality of pieces of image data based on the change in the image between the pieces of image data and on whether or not the changing area is the target area.
The scene detection unit 216 detects the image data corresponding to a predetermined scene by inputting the feature amount into a learning model.
The reproduction control unit 217 controls the reproduction of the plurality of pieces of image data based on information indicating the image data corresponding to the predetermined scene detected by the scene detection unit 216.
In this way, the reproduction control device 20 detects the predetermined scene based on the feature amounts extracted from the plurality of pieces of image data in the moving image and on the learning model, and can control the reproduction of the moving image composed of the plurality of pieces of image data based on whether or not each piece corresponds to the predetermined scene.
Therefore, according to the reproduction control device 20, control related to reproduction can more appropriately support the user's viewing.
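The idea of extracting feature amounts from "the change in the image" and "whether the changing area is the target area" can be sketched with simple frame differencing. This is an illustrative assumption, not the extraction method of the embodiment: frames are taken to be grayscale images stored as nested lists, the target area is given as a boolean mask, and the function name and `threshold` value are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]    # grayscale pixel values, frame[y][x]
Mask = List[List[bool]]    # True where the pixel belongs to the target area (e.g. a hand)


def change_features(prev: Frame, curr: Frame, target_mask: Mask,
                    threshold: int = 10) -> Tuple[float, float]:
    """Return the fraction of pixels that changed inside and outside the target area."""
    inside = outside = 0
    for y, row in enumerate(curr):
        for x, value in enumerate(row):
            if abs(value - prev[y][x]) > threshold:   # this pixel changed between frames
                if target_mask[y][x]:
                    inside += 1
                else:
                    outside += 1
    total = len(curr) * len(curr[0])
    return inside / total, outside / total
```

Separating the two fractions lets a model distinguish, for instance, motion of the hands from motion of the whole field of view.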

When reproducing the plurality of pieces of image data continuously, the reproduction control unit 217 makes the mode of reproducing the image data corresponding to the predetermined scene detected by the scene detection unit 216 different from the mode of reproducing the other image data.
This allows the user to view the predetermined scene in a manner that is easier to see than other scenes.

When reproducing the plurality of pieces of image data continuously, the reproduction control unit 217 makes the reproduction speed of the image data corresponding to the predetermined scene detected by the scene detection unit 216 slower than the reproduction speed of the other image data.
This allows the user to view the predetermined scene more carefully than other scenes.
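One way to picture the speed difference is to compute how long each frame stays on screen. The values below (a 30 fps source, 0.5x speed for the predetermined scene, 4x for everything else) are illustrative assumptions only, as are the function names.

```python
from typing import Sequence


def frame_duration(is_scene: bool, base_fps: float = 30.0,
                   scene_speed: float = 0.5, other_speed: float = 4.0) -> float:
    """Seconds a single frame is displayed at the speed chosen for its scene type."""
    speed = scene_speed if is_scene else other_speed
    return 1.0 / (base_fps * speed)


def total_playback_seconds(scene_flags: Sequence[bool], base_fps: float = 30.0,
                           scene_speed: float = 0.5, other_speed: float = 4.0) -> float:
    """Total viewing time for a video whose frames are flagged scene / not scene."""
    return sum(frame_duration(flag, base_fps, scene_speed, other_speed)
               for flag in scene_flags)
```

Under these assumed values, a frame of the predetermined scene is shown eight times longer than a frame of any other scene.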

When reproducing the plurality of pieces of image data continuously, the reproduction control unit 217 enlarges and reproduces a partial area of the image data corresponding to the predetermined scene detected by the scene detection unit 216.
This allows the user to view the predetermined scene in more detail than other scenes.
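Enlarging a partial area amounts to choosing a crop rectangle and scaling it to the display size. The sketch below only computes the rectangle; which point the enlargement is centered on is not stated in this passage, so the center (for example, a gazing point or the middle of a detected hand region) is passed in as an assumption, and the function name and `zoom` parameter are hypothetical.

```python
from typing import Tuple


def zoom_region(width: int, height: int, center_x: int, center_y: int,
                zoom: float = 2.0) -> Tuple[int, int, int, int]:
    """Return (left, top, crop_width, crop_height) of the area to enlarge.

    The rectangle is centered on the given point where possible and shifted
    so that it never extends beyond the image.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be at least 1.0")
    crop_w = int(width / zoom)
    crop_h = int(height / zoom)
    left = min(max(center_x - crop_w // 2, 0), width - crop_w)
    top = min(max(center_y - crop_h // 2, 0), height - crop_h)
    return left, top, crop_w, crop_h
```

The returned rectangle can then be cropped from the frame and resized to the full display size by whatever image library the player uses.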

The predetermined scene is the scene that is the purpose of viewing the plurality of pieces of image data reproduced continuously, and is a scene in which a plurality of users are performing collaborative work.
The predetermined target is a body part of each of the plurality of users performing the collaborative work.
The plurality of pieces of image data are image data obtained by photographing a space corresponding to the field of view of one of the users performing the collaborative work.
This allows the user viewing the images to see images corresponding to the field of view of a user performing the work while the collaborative work that is the purpose of viewing is being carried out.

The reproduction control device 20 further includes a gazing point detection unit 213.
The gazing point detection unit 213 detects the gazing point of the user who was viewing the shooting target when the plurality of pieces of image data were captured.
The feature amount extraction unit 214 further extracts a feature amount from each of the plurality of pieces of image data based on the change, between the pieces of image data, in the gazing point of the user who was viewing the shooting target.
This allows the predetermined scene to be detected with high accuracy by also taking into account the change in the user's gazing point.
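A feature amount based on the change in the gazing point can be sketched as the distance the gazing point moves between consecutive frames. The function name is hypothetical, and the use of the Euclidean distance in pixel coordinates is an assumption; the text only says that the change in the gazing point is used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # gazing point coordinates recorded with each frame


def gaze_displacements(gaze_points: Sequence[Point]) -> List[float]:
    """Per-frame movement of the gazing point; the first frame has no predecessor."""
    if not gaze_points:
        return []
    moves = [math.dist(a, b) for a, b in zip(gaze_points, gaze_points[1:])]
    return [0.0] + moves
```

A run of small displacements suggests that the worker's gaze is fixed on one place, which may be a useful cue for scenes such as an incision.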

The reproduction control device 20 further includes a learning unit 215.
The learning unit 215 constructs the learning model by performing machine learning using, as teacher data, pairs of input data including the feature amount and a label indicating the image data corresponding to the predetermined scene.
This makes it possible to construct a learning model for detecting the predetermined scene based on the feature amounts extracted from the plurality of pieces of image data in the moving image.

The predetermined scene is a plurality of scenes performed in a predetermined order.
The teacher data also includes information indicating the predetermined order.
This makes it possible to perform learning based on information indicating the predetermined order (for example, a surgical plan indicating the order of the surgical work) and to construct a learning model that can detect the predetermined scene more accurately.

[Realization of functions by hardware and software]
The function of executing the series of processes according to the embodiment described above can be realized by hardware, by software, or by a combination of the two. In other words, it suffices that the function of executing the series of processes is realized somewhere in the reproduction control system S, and the manner in which this function is realized is not particularly limited.

For example, when the function of executing the series of processes described above is realized by a processor that executes arithmetic processing, the processor may consist of a single processing device such as a single processor, a multiprocessor, or a multicore processor, or of a combination of such a processing device with a processing circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

Further, for example, when the function of executing the series of processes described above is realized by software, the programs constituting the software are installed on a computer via a network or a recording medium. In this case, the computer may be a computer with dedicated hardware built in, or a general-purpose computer capable of executing predetermined functions by installing programs (for example, electronic devices in general, such as a general-purpose personal computer). The steps describing the program may include only processes performed in time series in the described order, but may also include processes executed in parallel or individually. The steps describing the program may also be executed in any order within a range that does not deviate from the gist of the present invention.

A recording medium on which such a program is recorded may be provided to the user by being distributed separately from the computer main body, or may be provided to the user already incorporated in the computer main body. A recording medium distributed separately from the computer main body is composed of, for example, a magnetic disk (including a floppy disk), an optical disc, or a magneto-optical disk. The optical disc is composed of, for example, a CD-ROM (Compact Disc-Read Only Memory), a DVD (Digital Versatile Disc), or a Blu-ray (registered trademark) Disc. The magneto-optical disk is composed of, for example, an MD (Mini Disc). A recording medium provided to the user already incorporated in the computer main body is composed of, for example, the ROM 12 of FIG. 2 or the ROM 22 of FIG. 3 in which the program is recorded, or a hard disk included in the storage unit 16 of FIG. 2 or the storage unit 25 of FIG. 3.

10 wearable camera, 20 reproduction control device, 11, 21 CPU, 12, 22 ROM, 13, 23 RAM, 14, 24 communication unit, 15 sensor unit, 16, 25 storage unit, 17, 26 input unit, 18 imaging unit, 19 eye tracking unit, 27 output unit, 28 drive, 111 visual field image capturing unit, 112 gazing point measurement unit, 113 image data generation unit, 114 image data transmission unit, 161 image data storage unit, 211 image data acquisition unit, 212 area detection unit, 213 gazing point detection unit, 214 feature amount extraction unit, 215 learning unit, 216 scene detection unit, 217 reproduction control unit, 251 moving image data storage unit, 217 learning model storage unit, S reproduction control system, U user

Claims

1. A reproduction control device comprising:
an image data acquisition means for acquiring a plurality of pieces of image data that are continuous in time;
an area detection means for detecting, from the image of each of the plurality of pieces of image data, a target area including a predetermined target;
a feature amount extraction means for extracting a feature amount from each of the plurality of pieces of image data based on a change in the image between the plurality of pieces of image data and on whether or not the changing area is the target area;
a scene detection means for detecting image data corresponding to a predetermined scene by inputting the feature amount into a learning model; and
a reproduction control means for controlling reproduction of the plurality of pieces of image data based on information indicating the image data corresponding to the predetermined scene detected by the scene detection means.

2. The reproduction control device according to claim 1, wherein, when the plurality of pieces of image data are reproduced continuously, the reproduction control means makes the mode of reproducing the image data corresponding to the predetermined scene detected by the scene detection means different from the mode of reproducing the other image data.

3. The reproduction control device according to claim 1 or 2, wherein, when the plurality of pieces of image data are reproduced continuously, the reproduction control means makes the reproduction speed of the image data corresponding to the predetermined scene detected by the scene detection means slower than the reproduction speed of the other image data.

4. The reproduction control device according to any one of claims 1 to 3, wherein, when the plurality of pieces of image data are reproduced continuously, the reproduction control means enlarges and reproduces a partial area of the image data corresponding to the predetermined scene detected by the scene detection means.

5. The reproduction control device according to any one of claims 1 to 4, wherein:
the predetermined scene is the scene that is the purpose of viewing the plurality of pieces of image data reproduced continuously, and is a scene in which a plurality of users are performing collaborative work;
the predetermined target is a body part of each of the plurality of users performing the collaborative work; and
the plurality of pieces of image data are image data obtained by photographing a space corresponding to the field of view of one of the users performing the collaborative work.

6. The reproduction control device according to any one of claims 1 to 5, further comprising a gazing point detection means for detecting the gazing point of the user who was viewing the shooting target when the plurality of pieces of image data were captured,
wherein the feature amount extraction means further extracts a feature amount from each of the plurality of pieces of image data based on a change, between the plurality of pieces of image data, in the gazing point of the user who was viewing the shooting target.

7. The reproduction control device according to any one of claims 1 to 6, further comprising a learning means for constructing the learning model by performing machine learning using, as teacher data, pairs of input data including the feature amount and a label indicating the image data corresponding to the predetermined scene.

8. The reproduction control device according to claim 7, wherein:
the predetermined scene is a plurality of scenes performed in a predetermined order; and
the feature amount also includes information indicating the predetermined order.

9. A reproduction control program causing a computer to realize:
an image data acquisition function of acquiring a plurality of pieces of image data that are continuous in time;
an area detection function of detecting, from the image of each of the plurality of pieces of image data, a target area including a predetermined target;
a feature amount extraction function of extracting a feature amount from each of the plurality of pieces of image data based on a change in the image between the plurality of pieces of image data and on whether or not the changing area is an area included in the target area;
a scene detection function of detecting image data corresponding to a predetermined scene by inputting the feature amount into a learning model; and
a reproduction control function of controlling reproduction of the plurality of pieces of image data based on information indicating the image data corresponding to the predetermined scene detected by the scene detection function.