JP2022056744A

JP2022056744A - Annotation device, annotation method and annotation program

Info

Publication number: JP2022056744A
Application number: JP2020164661A
Authority: JP
Inventors: Akihiro Sugita; Mutsuko Suzuki; Masaya Hatakeyama; Aya Hashimoto
Original assignee: Yazaki Corp
Current assignee: Yazaki Corp
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2022-04-11

Abstract

To provide an annotation device, an annotation method, and an annotation program capable of appropriately creating training data. SOLUTION: The annotation device, annotation method, and annotation program add annotation information to moving image data in response to operations, creating training data used for machine learning of a trained model. Several things happen in response to operations while the training data is created. First, the position of an object included in the moving image is specified, and an object label representing the type of the object is added to the moving image data as annotation information. Second, a relationship label is added to the moving image data as annotation information. It represents the type of an event in which a plurality of objects included in the moving image are correlated. Third, the object labels of the objects related to the event targeted by the relationship label are designated from among the added object labels and added to the moving image data as annotation information. SELECTED DRAWING: Figure 25

Description

The present invention relates to an annotation device, an annotation method, and an annotation program.

Conventional techniques use artificial intelligence and deep learning to perform various kinds of detection on moving images, such as drive-recorder footage. As one example, Patent Document 1 discloses an information processing apparatus. This apparatus includes an acquisition unit, a signal area recognition unit, a speed information acquisition unit, an acceleration information acquisition unit, and a determination unit. The units work as follows:
- The acquisition unit acquires images captured by an imaging device mounted on a vehicle.
- The signal area recognition unit recognizes a red-signal area in the captured images acquired by the acquisition unit. This area shows the red light of a traffic signal.
- The speed information acquisition unit acquires speed information indicating the speed of the vehicle.
- The acceleration information acquisition unit acquires acceleration information indicating the acceleration of the vehicle.
- The determination unit determines whether the vehicle is being driven through a red light. It bases this decision on three inputs: the speed information or the acceleration information, the red-signal area, and predetermined red-light-violation identification information for identifying that the vehicle is running a red light.

The red-light-violation identification information is created in advance by a machine learning method using an SVM (Support Vector Machine).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-072940

Consider the information processing apparatus described in Patent Document 1 when it uses a trained model to detect, for example, an event caused by the relationship among a plurality of objects. In that case, appropriate teacher data is required to train the model by machine learning.

The present invention has been made in view of the above circumstances. Its object is to provide an annotation device, an annotation method, and an annotation program capable of appropriately creating teacher data.

To achieve the above object, an annotation device according to the present invention includes the following:
- a display unit capable of displaying a moving image represented by moving image data;
- an operation unit that accepts operations;
- a processing unit capable of executing annotation processing. This processing adds annotation information to the moving image data in response to operations on the operation unit, creating teacher data used for machine learning of a trained model.

In the annotation processing, the processing unit executes three processes, each in response to operations on the operation unit:
- It specifies the position of an object included in the moving image and adds an object label representing the type of the object to the moving image data as the annotation information.
- It adds a relationship label to the moving image data as the annotation information. The label represents the type of an event in which a plurality of objects included in the moving image are correlated.
- From among the added object labels, it designates the object labels of the objects related to the event targeted by the relationship label and adds them to the moving image data as the annotation information.
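The three labeling processes above can be pictured as a small data model. The following Python sketch is a hypothetical illustration only. The class names, the field layout (per-frame boxes, a frame range per event), and the label strings are assumptions, not the data format of the annotation device. Its one substantive point mirrors the claim's order of dependencies: a relationship label may designate only object labels that have already been added.

```python
from dataclasses import dataclass, field

# Hypothetical data model: names and fields are illustrative assumptions,
# not the actual annotation format used by the annotation device.


@dataclass
class ObjectLabel:
    label_id: int
    object_type: str  # type of the object, e.g. "vehicle" or "stop_line"
    # frame index -> (x, y, width, height) box giving the object's position
    boxes: dict[int, tuple[int, int, int, int]] = field(default_factory=dict)


@dataclass
class RelationshipLabel:
    event_type: str   # type of event in which several objects are correlated
    start_frame: int  # assumption: each event is annotated as a frame range
    end_frame: int
    object_ids: list[int] = field(default_factory=list)


@dataclass
class Annotation:
    objects: dict[int, ObjectLabel] = field(default_factory=dict)
    relationships: list[RelationshipLabel] = field(default_factory=list)

    def add_object_label(self, label: ObjectLabel) -> None:
        self.objects[label.label_id] = label

    def add_relationship_label(self, rel: RelationshipLabel) -> None:
        self.relationships.append(rel)

    def designate_objects(self, rel: RelationshipLabel, ids: list[int]) -> None:
        # Only object labels that have already been added can be designated;
        # on failure nothing is linked.
        unknown = [i for i in ids if i not in self.objects]
        if unknown:
            raise KeyError(f"object labels not added yet: {unknown}")
        rel.object_ids.extend(ids)


ann = Annotation()
ann.add_object_label(ObjectLabel(1, "red_light"))
ann.add_object_label(ObjectLabel(2, "stop_line"))
ann.add_object_label(ObjectLabel(3, "vehicle", {0: (320, 240, 80, 60)}))
rel = RelationshipLabel("red_light_violation", start_frame=0, end_frame=90)
ann.add_relationship_label(rel)
ann.designate_objects(rel, [1, 2, 3])
```

Validating before extending keeps a relationship label from ever pointing at an object label that does not exist.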

Further, in the above annotation device, the processing unit may handle a plurality of moving image data files designated in advance as the moving image data representing a single series of moving images during the annotation processing.
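One straightforward way to treat several designated files as one series is to map each frame index of the series to a file and a frame within that file. The sketch below makes two assumptions: the frame count of each file is known, and files are played back in the designated order. The file names and frame counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVideo:
    """Treat several moving image data files as one continuous series."""

    def __init__(self, files: list[tuple[str, int]]) -> None:
        # files: (file name, number of frames) in the designated playback order
        self.names = [name for name, _ in files]
        # starts[i] = index in the series of the first frame of file i
        self.starts = [0] + list(accumulate(n for _, n in files))[:-1]
        self.total = sum(n for _, n in files)

    def locate(self, frame: int) -> tuple[str, int]:
        """Map a frame index in the series to (file name, frame index in file)."""
        if not 0 <= frame < self.total:
            raise IndexError(frame)
        i = bisect_right(self.starts, frame) - 1
        return self.names[i], frame - self.starts[i]


video = ConcatenatedVideo(
    [("clip_001.mp4", 1800), ("clip_002.mp4", 1800), ("clip_003.mp4", 900)]
)
```

With this mapping, an annotation made on frame 1800 of the series lands on frame 0 of the second file. The annotator therefore never has to know where one file ends and the next begins.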

To achieve the above object, an annotation method according to the present invention includes the following steps:
- displaying a moving image represented by moving image data;
- accepting operations;
- adding annotation information to the moving image data in response to the operations, creating teacher data used for machine learning of a trained model.

The step of creating the teacher data does three things in response to operations:
- It specifies the position of an object included in the moving image and adds an object label representing the type of the object to the moving image data as the annotation information.
- It adds a relationship label to the moving image data as the annotation information. The label represents the type of an event in which a plurality of objects included in the moving image are correlated.
- From among the added object labels, it designates the object labels of the objects related to the event targeted by the relationship label and adds them to the moving image data as the annotation information.

To achieve the above object, an annotation program according to the present invention causes a computer to execute the following processes:
- displaying a moving image represented by moving image data;
- accepting operations;
- adding annotation information to the moving image data in response to the operations, creating teacher data used for machine learning of a trained model.

Within the process of creating the teacher data, the program causes the computer to execute three processes in response to operations:
- specifying the position of an object included in the moving image and adding an object label representing the type of the object to the moving image data as the annotation information;
- adding a relationship label to the moving image data as the annotation information, where the label represents the type of an event in which a plurality of objects included in the moving image are correlated;
- designating, from among the added object labels, the object labels of the objects related to the event targeted by the relationship label, and adding them to the moving image data as the annotation information.

The annotation device, annotation method, and annotation program according to the present invention have the effect of enabling teacher data to be created appropriately.

FIG. 1 is a block diagram showing the schematic configuration of an annotation device according to an embodiment.
FIG. 2 is a schematic diagram illustrating object detection.
FIG. 3 is a schematic diagram illustrating object detection.
FIG. 4 is a schematic diagram illustrating action detection.
FIG. 5 is a schematic diagram illustrating relational action detection.
FIG. 6 is a schematic diagram illustrating relational action detection.
FIG. 7 is a schematic diagram illustrating relational action detection.
FIG. 8 is a schematic diagram showing the processing in the learning phase and the use phase.
FIG. 9 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 10 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 11 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 12 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 13 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 14 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 15 is a schematic diagram illustrating moving image data files in the annotation device according to the embodiment.
FIG. 16 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 17 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 18 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 19 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 20 is a diagram showing an example of a screen displayed on the display device of the annotation device according to the embodiment.
FIG. 21 is a schematic diagram illustrating a file format in the annotation device according to the embodiment.
FIG. 22 is a schematic diagram illustrating a file format in the annotation device according to the embodiment.
FIG. 23 is a flowchart illustrating an example of processing in the annotation device according to the embodiment.
FIG. 24 is a flowchart illustrating an example of processing in the annotation device according to the embodiment.
FIG. 25 is a flowchart illustrating an example of processing in the annotation device according to the embodiment.

Embodiments according to the present invention are described below in detail with reference to the drawings. The present invention is not limited by these embodiments. The components in the following embodiments include those that can be easily substituted by those skilled in the art, and those that are substantially identical.

[Embodiment]
The annotation device 1 of the present embodiment shown in FIG. 1 constitutes an annotation tool for performing annotation processing on moving image data. Annotation processing is a process of adding annotation information to moving image data to create teacher data D2 (see FIG. 8), which is used for machine learning of a trained model M (see FIG. 8). The annotation information is metadata added to the moving image data so that the moving image data can serve as teacher data D2 in the machine learning of the trained model M.

In the annotation device 1 of the present embodiment, the teacher data D2 created by annotation processing is typically used when a trained model M for relational action detection is generated by machine learning. Relational action detection is an image-based detection technique that detects events arising from the relationship among a plurality of objects.

Besides relational action detection, image-based detection techniques include, for example, object detection and action detection.

As shown in FIG. 2, object detection detects the type and position of an object in a still image constituting a moving image (video). In other words, it detects the static position and type of an object at a given moment from a still image. Objects detected this way include, for example, vehicles, pedestrians, bicycles, obstacles, street lights, signboards, utility poles, traffic signs, and stop lines. FIG. 2 shows an example in which a "bicycle" and a "stop line" are detected as objects in a still image. The positions of the detected bicycle and stop line are indicated by rectangular frames. As shown in FIG. 3, object detection can also detect a change in an object's position by comparing a plurality of still images constituting a moving image. FIG. 3 shows an example in which a "bicycle" and a "stop line" are detected in a plurality of still images constituting a moving image. In that example, the position of the bicycle is detected to have changed from right to left.
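The output of object detection described above is essentially a list of (type, rectangle) pairs per still image. A position change, such as the bicycle's right-to-left movement in FIG. 3, can be read off by comparing box centers across frames. In this sketch the coordinates are invented. Objects in the two frames are matched simply by type, which assumes at most one object of each type per frame; a real system would need proper tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object in one still image: its type and rectangular box."""
    object_type: str
    x: float       # left edge of the box, in pixels
    y: float       # top edge of the box, in pixels
    width: float
    height: float

    @property
    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def horizontal_shift(earlier: Detection, later: Detection) -> float:
    """Horizontal displacement of the box center between two frames.

    Negative values mean the object moved right to left in the image.
    """
    return later.center[0] - earlier.center[0]


# Hypothetical detections in two frames (matched by type, see lead-in).
frame_a = [Detection("bicycle", 500, 220, 60, 90), Detection("stop_line", 100, 400, 400, 10)]
frame_b = [Detection("bicycle", 380, 225, 60, 90), Detection("stop_line", 100, 400, 400, 10)]

bike_a = next(d for d in frame_a if d.object_type == "bicycle")
bike_b = next(d for d in frame_b if d.object_type == "bicycle")
shift = horizontal_shift(bike_a, bike_b)
```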

As shown in FIG. 4, action detection instead detects the movement (action) of an individual object. It does so from temporal changes in the position, state, and so on of the object, as detected by object detection across a plurality of still images constituting a moving image. In other words, action detection detects the movement (action) of an object over time. A detected object that stays still at its position (that is, does not move), such as a "stop line", also counts as an example of an object's movement (action). It therefore falls within the concept of movement. Even when a plurality of objects are detected in the still images constituting a moving image, action detection detects the movement of each object individually. FIG. 4 shows an example in which two "bicycles" are detected by object detection in a plurality of still images constituting a moving image. Action detection then detects the individual movement of each bicycle as "traveling while wobbling".

Action detection can detect the state or change of an individual object, for example a bicycle wobbling as it travels. This makes it usable, for example, for warning a vehicle driver of potential danger. However, it cannot relate the detected objects to one another to detect the meaning of their behavior. In the example of FIG. 4, therefore, both wobbling bicycles are judged to be dangerous, regardless of their relationship with other objects.
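The toy classifier below makes the single-object nature of action detection concrete. It labels each object track on its own as stationary, moving, or wobbling. The thresholds and the "wobbling" rule (repeated left/right reversals of horizontal motion) are assumptions chosen for illustration, not a method taken from this document. Because each track is judged in isolation, both bicycles come out as wobbling whatever direction they are heading, which mirrors the limitation described for FIG. 4.

```python
import math

# Toy single-object action classifier. Thresholds and the "wobbling" rule
# are illustrative assumptions, not the method of this document.


def classify_track(track: list[tuple[float, float]],
                   move_eps: float = 5.0, min_reversals: int = 2) -> str:
    """Classify one object's track of (x, y) box centers in image coordinates."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    if math.hypot(x1 - x0, y1 - y0) < move_eps:
        return "stationary"  # staying still also counts as an "action"
    # Horizontal step between consecutive frames, ignoring zero steps.
    steps = [b[0] - a[0] for a, b in zip(track, track[1:]) if b[0] != a[0]]
    # Count left/right direction reversals of the horizontal motion.
    reversals = sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))
    return "wobbling" if reversals >= min_reversals else "moving"


tracks = {
    "stop_line": [(300.0, 400.0)] * 5,
    "bicycle_1": [(100.0, 300.0), (104.0, 290.0), (99.0, 280.0), (105.0, 270.0), (100.0, 260.0)],
    "bicycle_2": [(500.0, 300.0), (495.0, 310.0), (501.0, 320.0), (494.0, 330.0), (500.0, 340.0)],
}
# Each track is judged on its own; no relation between objects is considered.
behaviors = {name: classify_track(t) for name, t in tracks.items()}
```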

Relational action detection, described above, works differently. It detects the relationships among objects detected in a moving image and relates their movements to each other to detect the meaning (type) of their mutual behavior. More specifically, as shown in FIG. 5, it starts from the individual movements of objects detected by action detection across a plurality of still images constituting a moving image. From these, it detects the behavior arising from the relationship among a plurality of objects, together with the relationships between the objects. In other words, relational action detection detects behavior that arises when the movements of a plurality of objects over time are related to one another. Such behavior can also be described as "an event that occurs in response to a change in the relationship among a plurality of objects".

FIG. 5 shows an example in which two "bicycles" are detected by object detection in a plurality of still images constituting a moving image. Action detection detects each bicycle individually as "traveling while wobbling". FIG. 5 further shows relational action detection identifying a behavior (event) relative to the movement of a vehicle about to turn left. The wobbling bicycle in the foreground is heading into the vehicle's path of travel, whereas the bicycle in the background is heading in a direction different from the vehicle's direction of travel. Relational action detection can thus judge the foreground bicycle, which is heading into the vehicle's path, to be dangerous. It can also judge the background bicycle, which is wobbling but heading away from the vehicle's path, not to be dangerous.
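The distinction drawn in FIG. 5 depends on relating one object's movement to another's. The hand-written geometric rule below only illustrates that idea; it is not the trained model M. It assumes a hypothetical point on the left-turning vehicle's path and checks whether each bicycle's net motion points toward that point, using the cosine of the angle between the motion vector and the direction to the point. The coordinates and the 0.5 cosine threshold are invented.

```python
import math


def heading_towards(track: list[tuple[float, float]],
                    target: tuple[float, float], min_cos: float = 0.5) -> bool:
    """True if the object's net motion points roughly at `target`.

    Compares the net displacement over the track with the direction from the
    last position to the target, via the cosine of the angle between them.
    """
    (x0, y0), (x1, y1) = track[0], track[-1]
    mx, my = x1 - x0, y1 - y0                  # net motion of the object
    tx, ty = target[0] - x1, target[1] - y1    # direction to the target point
    norm = math.hypot(mx, my) * math.hypot(tx, ty)
    if norm == 0:
        return False  # no motion, or already at the target: no direction
    return (mx * tx + my * ty) / norm >= min_cos


# Hypothetical image coordinates: a point on the turning vehicle's path and
# two bicycle tracks (only the first and last positions matter here).
vehicle_path_point = (300.0, 250.0)
tracks = {
    "bicycle_front": [(200.0, 350.0), (228.0, 322.0), (250.0, 300.0)],
    "bicycle_back": [(400.0, 200.0), (418.0, 182.0), (440.0, 160.0)],
}
judgements = {
    name: "dangerous" if heading_towards(t, vehicle_path_point) else "not_dangerous"
    for name, t in tracks.items()
}
```

The same bicycle track can receive different judgements depending on where the vehicle is. That dependence is exactly what single-object action detection cannot express.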

FIG. 6 gives another example of a behavior (event) type arising from the related movements of a plurality of objects. Here, safe behavior (event) that is not a traffic violation is detected from the relationship among the movements of a "red light", a "stop line", and a "vehicle": the vehicle stopped at the stop line at a red light. FIG. 7, in contrast, shows dangerous behavior (event) that is a traffic violation, detected from the same three objects: the vehicle did not stop at the stop line at a red light (it ran the red light).
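The events of FIGS. 6 and 7 can likewise be written as a rule over three objects, purely for illustration. In this sketch, positions are assumed to be one-dimensional distances of the vehicle's front along the lane. The per-frame signal state uses a hypothetical string encoding, and the event names are illustrative. In the document these event types are learned by the trained model M rather than coded by hand.

```python
def classify_red_light_event(signals: list[str], stop_line: float,
                             positions: list[float]) -> str:
    """Toy rule relating a traffic signal, a stop line, and a vehicle.

    signals[i]   -- signal color at frame i (hypothetical encoding)
    positions[i] -- distance of the vehicle's front along the lane at frame i
    Requires at least two frames.
    """
    # Did the vehicle's front cross the stop line during a red-signal frame?
    crossed_on_red = any(
        sig == "red" and prev <= stop_line < cur
        for sig, prev, cur in zip(signals[1:], positions, positions[1:])
    )
    if crossed_on_red:
        return "red_light_violation"  # cf. FIG. 7
    stopped = positions[-1] == positions[-2]
    if signals[-1] == "red" and positions[-1] <= stop_line and stopped:
        return "stopped_at_red"       # cf. FIG. 6
    return "other"


safe = classify_red_light_event(["red"] * 4, 50.0, [40.0, 46.0, 49.5, 49.5])
violation = classify_red_light_event(["red"] * 4, 50.0, [40.0, 46.0, 52.0, 58.0])
```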

Image-based relational action detection as described above is performed using the trained model M by, for example, a detection system Sys1 as shown in FIG. 8. The detection system Sys1 has two phases, as shown in FIG. 8. In the learning phase, the trained model M is generated. In the use phase, the trained model M is used to actually perform relational action detection as described above. An outline of the detection system Sys1 is given below.

In the learning phase, the detection system Sys1 acquires learning data sets D3 to train the model M. Each data set consists of input data D1 and teacher data D2. A learning data set D3 is a teacher data set used when the trained model M is generated by machine learning. Each one links two items as a pair: input data D1 serving as the explanatory variable, and the teacher data D2 that corresponds to that input data D1 and serves as the objective variable. More precisely, a learning data set D3 consists of the input data D1 quantified as an explanatory variable and the teacher data D2 quantified as an objective variable. In the present embodiment, the input data D1 is, for example, moving image data captured by a camera mounted on the vehicle, representing a moving image of the vehicle's surroundings. The teacher data D2 is data representing the type of behavior (event) that arises from the involvement of a plurality of objects in that moving image. It is created by the annotation device 1 of the present embodiment. The detection system Sys1 can acquire the learning data sets D3 from, for example, a learning data providing system (server) Sv (see FIG. 1), described later.
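A learning data set D3 pairs an input D1 with its quantified teacher data D2. The sketch below shows one possible pairing step under three hypothetical assumptions. Clips share an identifier across the two sources, D1 is referenced by file path, and D2 is quantified as an index into a fixed list of event types.

```python
from dataclasses import dataclass

# Hypothetical list of event types; the teacher data D2 is quantified as an
# index into this list (an assumption for illustration).
EVENT_TYPES = ["stopped_at_red", "red_light_violation", "other"]


@dataclass(frozen=True)
class LearningSample:
    """One learning data set D3: input data D1 paired with teacher data D2."""
    video_path: str   # D1, explanatory variable (referenced by file path)
    event_index: int  # D2, objective variable (quantified event type)


def build_learning_data(videos: dict[str, str],
                        labels: dict[str, str]) -> list[LearningSample]:
    """Pair videos with annotated event types that share a clip id (assumed)."""
    unpaired = sorted(set(videos) ^ set(labels))
    if unpaired:
        raise ValueError(f"clips without a matching pair: {unpaired}")
    return [LearningSample(videos[k], EVENT_TYPES.index(labels[k]))
            for k in sorted(videos)]


d3 = build_learning_data(
    {"c1": "c1.mp4", "c2": "c2.mp4"},
    {"c1": "red_light_violation", "c2": "stopped_at_red"},
)
```

Rejecting unpaired clips up front avoids silently training on inputs that have no teacher data, or the reverse.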

In the learning phase, the detection system Sys1 then generates the trained model M by machine learning based on any of various machine learning algorithms AL, using a plurality of learning data sets D3 as the teacher data set. Usable algorithms AL include known ones such as deep learning, neural networks, logistic regression, ensemble learning, support vector machines, random forests, and naive Bayes. Within each learning data set D3, the detection system Sys1 uses the input data D1 as the explanatory variable and the teacher data D2 as the objective variable to train the model M.
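Here is a minimal illustration of training with D1 as the explanatory variable and D2 as the objective variable. It uses logistic regression, one of the algorithms listed above, fitted by batch gradient descent in plain Python. The one-dimensional feature and the labels are toy values standing in for quantified video features and event labels. An actual model M for video would more likely be a neural network, which the document names as an example implementation.

```python
import math

# Toy data (assumption): one quantified feature per clip (explanatory
# variable, from D1) and a 0/1 event label (objective variable, from D2).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: list[float], ys: list[int],
                   lr: float = 0.1, epochs: int = 2000) -> tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # d(log-loss)/d(logit) for this sample
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def predict(w: float, b: float, x: float) -> int:
    """Predicted class: 1 when the modeled probability is at least 0.5."""
    return 1 if w * x + b >= 0 else 0


w, b = train_logistic(xs, ys)
```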

In the present embodiment, this machine learning produces the trained model M, which the detection system Sys1 uses for the moving-image-based relational action detection described above. The trained model M is implemented, for example, by a neural network of any architecture, such as a CNN (convolutional neural network), an RNN (recurrent neural network), or an LSTM (long short-term memory) network. The detection system Sys1 performs machine learning with a plurality of learning data sets D3. In doing so, it learns the weighting coefficients and other parameters used as weights in the neural network, thereby generating the trained model M. The trained model M is not limited to a single model and may be a combination of a plurality of models.

In the present embodiment, the trained model M generated by machine learning in the learning phase is used to identify, from moving image data, the type of behavior (event) arising from the involvement of a plurality of objects. That is, the trained model M takes "moving image data" as input and outputs the "type of behavior (event)". It accepts moving image data and outputs the type of behavior (event) arising from the involvement of a plurality of objects in that data.

In the use phase, the detection system Sys1 uses the trained model M generated in the learning phase to identify the type of behavior (event) arising from the involvement of a plurality of objects. This identification corresponds to the relational action detection described above. The detection system Sys1 inputs the moving image data to be examined into the trained model M as detection target data (input data) D4. The model then outputs the type of behavior (event) arising from the involvement of a plurality of objects in that moving image data. The detection system Sys1 outputs a value quantifying this behavior type as type identification result data (output data) D5.
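The use phase boils down to three steps: feed detection target data D4 to M, take the highest-scoring event type, and emit it in quantified form as D5. The sketch below assumes that M returns one score per event type. It also assumes that D5 is represented as a class index plus the event name. The event list and the stand-in model are hypothetical.

```python
from collections.abc import Sequence
from typing import Protocol

# Hypothetical event types; D5 is assumed to be an index into this tuple.
EVENT_TYPES = ("stopped_at_red", "red_light_violation", "other")


class EventModel(Protocol):
    """Assumed interface of the trained model M: frames in, one score per type out."""

    def __call__(self, frames: Sequence[object]) -> Sequence[float]: ...


def run_use_phase(model: EventModel, d4_frames: Sequence[object]) -> dict[str, object]:
    """Feed detection target data D4 to the model and build result data D5."""
    scores = list(model(d4_frames))
    if len(scores) != len(EVENT_TYPES):
        raise ValueError("model must return one score per event type")
    best = max(range(len(scores)), key=scores.__getitem__)  # highest-scoring type
    return {"class_index": best, "event_type": EVENT_TYPES[best]}


def stand_in_model(frames: Sequence[object]) -> Sequence[float]:
    """Placeholder for a trained model; returns fixed scores."""
    return [0.1, 0.7, 0.2]


d5 = run_use_phase(stand_in_model, ["frame_0", "frame_1"])
```

Typing M as a callable protocol keeps the sketch independent of any particular framework. Any object with a matching `__call__` could be passed in.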

Within the detection system Sys1 configured as above, the annotation device 1 of the present embodiment creates the teacher data D2 of the learning data sets D3. These data sets are used for machine learning of the trained model M for relational action detection. As described above, the teacher data D2 created by the annotation device 1 is moving image data with annotation information added. That information is data representing the type of behavior (event) arising from the involvement of a plurality of objects in the moving image, so that the data can serve as the objective variable in the machine learning of the trained model M. In other words, the teacher data D2 can also be described as relationship tracking data: it tracks the relationships among a plurality of objects in an event captured in the moving image. Likewise, the annotation device 1 can be described as a relational tracking annotation tool, which tracks behavior (events) arising from the involvement of a plurality of objects in moving image data. The annotation device 1 is implemented by any of various computer devices, such as a personal computer, a workstation, or a tablet terminal. It may be implemented by a single computer device or by a plurality of computer devices. Each component of the annotation device 1 is described in detail below with reference to FIG. 1 and FIGS. 9 to 14.

Specifically, the annotation device 1 includes a display device 10, an operation device 20, a data input/output device 30, a storage circuit 40, and a processing circuit 50. These components are communicably connected to one another via a network.

The display device 10 is a display unit capable of displaying the moving image represented by moving-image data. It is composed of an image display device that outputs and displays various kinds of image information, such as a liquid crystal display, a plasma display, or an organic EL display.

The operation device 20 is an operation unit that accepts various operations on the annotation device 1 by an operator or the like. The operation device 20 is composed of an operation input device that accepts various operation inputs from the operator. Examples include a mouse, a keyboard, a trackball, a switch, a button, a joystick, a touch pad, a touch screen, a non-contact input circuit, or a voice input circuit.

The data input/output device 30 is a data input/output unit that inputs and outputs data (information) to and from the annotation device 1. It accepts data input from devices other than the annotation device 1 and outputs data to those devices. The data input/output device 30 is composed of, for example, a communication interface and a recording medium interface. The communication interface transmits and receives various data between the annotation device 1 and other devices over wired or wireless communication. The recording medium interface reads and writes various data from and to recording media. Examples include a hard disk drive (HDD), a solid state drive (SSD), a flexible disk (FD), a magneto-optical disk, a CD-ROM, a DVD, a USB memory, an SD memory card, or flash memory.

The data input/output device 30 of the present embodiment can exchange data with at least the learning data providing system Sv. The learning data providing system Sv stores, for example, a large amount of input data D1, teacher data D2, learning data sets D3, and the like in a database. It retrieves these data as needed and provides them to other devices. Typically, the moving-image data to be annotated is input to the annotation device 1 from the learning data providing system Sv via the data input/output device 30. The teacher data D2 created by the annotation device 1 is then registered in, stored in, and managed by the learning data providing system Sv via the data input/output device 30.

The storage circuit 40 is a circuit that stores various data. It is implemented by, for example, a semiconductor memory element such as RAM (Random Access Memory) or flash memory, a hard disk, or an optical disk. The storage circuit 40 stores, for example, programs that allow the annotation device 1 to implement its various functions. These include programs that operate the display device 10, the operation device 20, the data input/output device 30, and the processing circuit 50. The storage circuit 40 also stores various data, such as moving-image data input via the data input/output device 30, data needed for the various processes of the processing circuit 50, and the teacher data D2 created by the annotation device 1. The processing circuit 50 and other components read these data from the storage circuit 40 as needed. The storage circuit 40 may also be implemented by a cloud server or the like connected to the annotation device 1 via a network.

The processing circuit 50 is a processing unit that constitutes the circuitry implementing the various processing functions of the annotation device 1. It is implemented by, for example, a processor. Here, "processor" means a circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array). The processing circuit 50 implements each processing function by, for example, executing a program read from the storage circuit 40.

The overall configuration of the annotation device 1 according to the present embodiment has been outlined above. With this configuration, the processing circuit 50 has functions for executing annotation processing. In annotation processing, annotation information is added to moving-image data in response to operations on the operation device 20, creating the teacher data D2 used for machine learning of the trained model M.

Specifically, to implement these processing functions, the processing circuit 50 of the present embodiment conceptually includes the following units: a display processing unit 51, an operation processing unit 52, a data input/output processing unit 53, a task creation processing unit 54, and an annotation processing unit 55. The processing circuit 50 implements each of these functions by, for example, executing a program read from the storage circuit 40.

The display processing unit 51 controls the display device 10 and executes various processes for displaying moving images and the like on it.

The operation processing unit 52 controls the operation device 20 and executes various processes for accepting operations through it.

The data input/output processing unit 53 controls the data input/output device 30 and executes various processes for exchanging data between the annotation device 1 and other devices through it.

The task creation processing unit 54 executes various processes for creating a series of tasks to be processed by the annotation processing unit 55.

The annotation processing unit 55 executes the various processes of annotation processing, in which annotation information is added to moving-image data to create the teacher data D2. For example, it executes annotation processing corresponding to a series of tasks created by the task creation processing unit 54.

More specifically, as shown in FIGS. 9 and 10, the annotation processing unit 55 can add an object label to moving-image data as annotation information during annotation processing. It does so in response to an operation on the operation device 20 by an administrator, an operator, or the like (hereinafter sometimes simply called an "operation"). An object label is metadata that specifies the position of an object included in the moving image represented by the moving-image data and represents the type of that object.
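
To make the notion of an object label concrete, the following Python sketch models one label as a type plus the rectangular frames designated on particular video frames. The type may carry an optional attribute selected from a sublist. The class name `ObjectLabel`, its fields, and the (x_min, y_min, x_max, y_max) box convention are illustrative assumptions, not the data format of the annotation device 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ObjectLabel:
    """Hypothetical object label: object position(s) plus object type."""
    label_id: int
    object_type: str                  # e.g. "traffic light"
    attribute: Optional[str] = None   # e.g. "red", chosen from a sublist
    # Rectangular frames designated by the operator, keyed by frame index.
    keyframe_boxes: Dict[int, Box] = field(default_factory=dict)

    def designate(self, frame: int, box: Box) -> None:
        """Record the rectangular frame drawn on the given video frame."""
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must have positive width and height")
        self.keyframe_boxes[frame] = box

    @property
    def display_name(self) -> str:
        """Type and attribute joined for display, e.g. 'traffic light/red'."""
        if self.attribute:
            return f"{self.object_type}/{self.attribute}"
        return self.object_type


# Usage: label a red traffic light first designated on frame 120.
light = ObjectLabel(label_id=1, object_type="traffic light", attribute="red")
light.designate(120, (410.0, 80.0, 440.0, 150.0))
```

Keeping the boxes keyed by frame index lets later positions be added when the object moves, without changing the label's identity.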

Further, as shown in FIGS. 11 and 12, the annotation processing unit 55 can add a relationship label to moving-image data as annotation information during annotation processing, in response to an operation. A relationship label is metadata representing the type of event in which a plurality of objects included in the moving image are correlated.

Then, as shown in FIGS. 13 and 14, the annotation processing unit 55 can, in response to an operation, designate certain object labels from among those added as described above. These are the object labels of the objects related to the event targeted by the relationship label. The annotation processing unit 55 adds this designation to the moving-image data as annotation information. In other words, it designates, from the objects identified by object labels, the objects involved in the event identified by the relationship label, and records that designation as annotation information.
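
The following is a minimal sketch of how the designation of involved objects might be recorded. It assumes that relationship labels refer to object labels by an integer ID. `RelationshipLabel`, `link`, and the ID-to-type dictionary are hypothetical names. The property illustrated is that only object labels added beforehand can be designated.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RelationshipLabel:
    """Hypothetical relationship label: event type plus involved objects."""
    event_type: str                 # e.g. "violation"
    attribute: str                  # e.g. "ignoring traffic light"
    object_label_ids: List[int] = field(default_factory=list)

    def link(self, object_label_id: int, existing: Dict[int, str]) -> None:
        """Designate an already-added object label as involved in the event.

        `existing` maps the IDs of object labels already added to the video
        to their types; designating an unknown ID is rejected.
        """
        if object_label_id not in existing:
            raise KeyError(f"no object label with id {object_label_id}")
        if object_label_id not in self.object_label_ids:
            self.object_label_ids.append(object_label_id)


# Object labels previously added to the video (id -> type).
added_object_labels = {1: "traffic light", 2: "stop sign"}

violation = RelationshipLabel("violation", "ignoring traffic light")
violation.link(1, added_object_labels)
violation.link(1, added_object_labels)  # a repeated designation is ignored
```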

Through these processes, the annotation processing unit 55 adds annotation information to the moving-image data to create the teacher data D2. The object labels and relationship labels added as annotation information can be regarded as a record of relationship tracking information, which traces the relationships among the objects captured in the moving image. The annotation processing unit 55 stores the created teacher data D2 in the storage circuit 40. The data input/output processing unit 53 then transmits the teacher data D2 to the learning data providing system Sv via the data input/output device 30, where it is registered, stored, and managed. Examples of each of these processes are described in more detail below with reference to the figures.

FIGS. 9 and 10 schematically show an example of the annotation screen 100A displayed on the display device 10 when an object label is added to moving-image data. FIG. 9 shows the entire annotation screen 100A, and FIG. 10 shows display transitions in a part of the annotation screen 100A.

When an operation to add an object label to moving-image data is performed, the display processing unit 51 causes the display device 10 to display the annotation screen 100A shown in FIG. 9. The annotation processing unit 55 adds the object label to the moving-image data in response to operations made through the annotation screen 100A.

The annotation screen 100A illustrated in FIG. 9 displays the following, along with various other information: a moving image display area 101, a label addition operation area 102, a label selection area 103, and a task end operation area 104.

The moving image display area 101 displays the moving image of the moving-image data to be annotated. This is typically the moving-image data corresponding to a series of tasks created by the task creation processing unit 54. The moving image display area 101 can typically display each frame of the moving image as a still image. In response to operations on the moving image operation area 101a displayed on the annotation screen 100A, the moving image display area 101 can play back the moving image and advance through its frames in sequence. As the displayed frames advance, the annotation processing unit 55 can annotate each frame in turn in response to operations.

The label addition operation area 102 is operated to add a label to the moving image displayed in the moving image display area 101.

The label selection area 103 is operated to select the type of label to be added to the moving image. In other words, the labels displayed in the label selection area 103 are the candidate labels that can be selected in annotation processing. These selectable labels may be defined, for example, by a label definition file read into the annotation device 1, or may be set as appropriate in response to operations. Typically, the selectable labels remain fixed until the series of tasks for adding labels to the moving-image data is finished.
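
The format of the label definition file is not specified. The sketch below therefore assumes a hypothetical JSON tree, where the top-level keys form the main list and the nested entries form the sublists. Freezing the parsed tree into read-only views is one possible way to keep the candidate labels fixed until the series of tasks ends. It is not necessarily how the annotation device 1 does it.

```python
import json
from types import MappingProxyType

# Hypothetical label definition file contents (format assumed for illustration).
LABEL_DEFINITION_JSON = """
{
  "relationship": {"safe": [], "caution": [],
                   "violation": ["ignoring traffic light", "ignoring stop line"]},
  "traffic light": {"green": [], "yellow": [], "red": []},
  "speed sign": {},
  "stop sign": {}
}
"""


def _freeze(node):
    """Recursively convert dicts to read-only mappings and lists to tuples."""
    if isinstance(node, dict):
        return MappingProxyType({k: _freeze(v) for k, v in node.items()})
    if isinstance(node, list):
        return tuple(_freeze(v) for v in node)
    return node


def load_label_definitions(text: str):
    """Parse the label tree and return an immutable view of it."""
    tree = json.loads(text)
    if not isinstance(tree, dict) or not tree:
        raise ValueError("label definition must be a non-empty JSON object")
    return _freeze(tree)


candidates = load_label_definitions(LABEL_DEFINITION_JSON)
main_list = list(candidates)  # top-level entries, shown as the main list
```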

The task end operation area 104 is operated to end the series of tasks for adding labels to the moving-image data.

When the label addition operation area 102 is selected on the annotation screen 100A in response to an operation, the annotation processing unit 55 adds a rectangular frame for position designation to the moving image displayed in the moving image display area 101. A circular frame, a polygonal frame, or the like may be used instead. The annotation processing unit 55 then identifies the position of an object included in the moving image when that position is designated with the rectangular frame in response to an operation.

Next, one label is selected in response to an operation from the label list displayed in the label selection area 103. The annotation processing unit 55 then determines the object label representing the type of the object whose position was identified above, and adds that object label to the moving-image data as annotation information.

The label selection area 103 illustrated in FIG. 9 displays a main list 103A of selectable labels (candidate labels). As an example, the main list 103A in FIG. 9 lists labels such as "relationship", "traffic light", "speed sign", and "stop sign". Of these, "traffic light", "speed sign", and "stop sign" are object labels, whereas "relationship" is the relationship label described later. When an object label is added to moving-image data, one of the object labels such as "traffic light", "speed sign", or "stop sign" is typically selected in response to an operation.

Here, as illustrated in FIG. 10, one object label may be selected from the main list 103A in the label selection area 103 in response to an operation. The display processing unit 51 then displays a sublist 103B in the label selection area 103 according to the type of the selected object label. The sublist 103B is a label list for selecting more detailed attributes that further subdivide the selected object label type. In the example of FIG. 10, when the "traffic light" object label is selected, the display processing unit 51 displays a sublist 103B of attributes such as "green", "yellow", and "red" in the label selection area 103.

When one attribute is selected from the sublist 103B in the label selection area 103 in response to an operation, the annotation processing unit 55 determines the object label. This label represents the type (including the attribute) of the object whose position was identified above. At this time, the annotation processing unit 55 stores the current frame, in which the object label was determined, in the storage circuit 40 as the start key frame (first frame). This is the first frame in which the object targeted by the object label appears. Here, the object targeted by the object label is, in other words, the object identified by that object label.

Where a more detailed attribute can be selected for the attribute chosen above, the display processing unit 51 can also display a further sublist in the label selection area 103 for selecting that detailed attribute.
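
The cascade of lists described above can be sketched as a walk down a label tree: main list 103A, then sublist 103B or 103C, then sublist 103D, and possibly further sublists. The tree contents mirror the examples in FIGS. 9, 10 and 12. The function name `next_choices` and the dict/list encoding (a dict level has further sublists; a list level holds leaf attributes) are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Union

LabelTree = Union[dict, list]

# Hypothetical label tree based on the examples in FIGS. 9, 10 and 12.
LABEL_TREE = {
    "relationship": {
        "safe": [],
        "caution": [],
        "violation": ["ignoring traffic light", "ignoring stop line"],
    },
    "traffic light": {"green": [], "yellow": [], "red": []},
    "speed sign": [],
    "stop sign": [],
}


def next_choices(tree: LabelTree, path: Sequence[str]) -> Optional[List[str]]:
    """Return the list to display after the selections in `path`,
    or None when the selection is complete (a leaf has been reached)."""
    node = tree
    for i, choice in enumerate(path):
        if isinstance(node, dict) and choice in node:
            node = node[choice]
        elif isinstance(node, list) and choice in node and i == len(path) - 1:
            return None  # a list entry is a leaf: nothing more to select
        else:
            raise KeyError(f"{choice!r} is not selectable at this level")
    choices = list(node)
    return choices or None  # an empty level also means the selection is done
```

For example, `next_choices(LABEL_TREE, ["traffic light"])` yields the attribute sublist, while selecting "red" after it completes the label.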

When the object label has been determined as described above, the display processing unit 51 displays a determined-label display image 105, a delete operation area 106, and an end operation area 107 on the annotation screen 100A, as illustrated for example in FIG. 10.

The determined-label display image 105 shows the type of the object label determined as described above. For example, when the determined-label display image 105 is selected in response to an operation, the annotation processing unit 55 may allow the object label to be selected again.

The delete operation area 106 is operated to delete the object label determined as described above. When the delete operation area 106 is selected in response to an operation, the display processing unit 51 displays a deletion confirmation image 108 on the annotation screen 100A. If "OK" is selected in the deletion confirmation image 108, the annotation processing unit 55 deletes the determined object label. If "Cancel" is selected instead, the display processing unit 51 hides the deletion confirmation image 108 and returns to the original screen.

The end operation area 107 is operated to determine the last frame in which the object targeted by the object label appears. In other words, the operator selects the end operation area 107 while the last frame showing the labeled object is displayed, and that frame is identified as the final frame of the object label.

For example, in response to operations, the display processing unit 51 advances through the frames of the moving image displayed in the moving image display area 101 until it displays the last frame in which the labeled object appears. If the position of the labeled object changes during this time, the annotation processing unit 55 corrects, in response to an operation, the object position designated by the rectangular frame to follow that change. The annotation processing unit 55 then calculates the object position in each intermediate frame, that is, each frame between the frame where the position was previously designated and the frame where it was re-designated. This calculation uses linear interpolation or the like, based on the object positions in those earlier and later frames. In this way, the annotation processing unit 55 tracks the change in position of the object targeted by the object label.
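
The linear interpolation between two designated frames can be sketched as follows. The sketch assumes that boxes are (x_min, y_min, x_max, y_max) tuples keyed by frame index, and it interpolates each coordinate independently. This is an illustrative reading of the text, not the device's actual implementation; the text itself allows interpolation methods other than linear.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Return a box for every frame from the first to the last designated
    frame, linearly interpolating each coordinate between consecutive
    designated frames."""
    frames = sorted(keyframes)
    result: Dict[int, Box] = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = keyframes[f0], keyframes[f1]
        span = f1 - f0
        for f in range(f0, f1):
            t = (f - f0) / span  # 0.0 at f0, approaching 1.0 toward f1
            result[f] = tuple(a + (b - a) * t for a, b in zip(b0, b1))
    if frames:
        # The last designated frame keeps exactly the box the operator drew.
        result[frames[-1]] = keyframes[frames[-1]]
    return result


boxes = interpolate_boxes({0: (0, 0, 10, 10), 4: (40, 0, 50, 10)})
```

In this usage, the box on frame 2 lies halfway between the boxes designated on frames 0 and 4.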

When the end operation area 107 is selected in response to an operation, the annotation processing unit 55 stores the current frame in the storage circuit 40 as the end key frame (final frame). This is the last frame in which the object identified by the object label appears. As a result, the annotation processing unit 55 can determine the time range of the object label.

As described above, for each object to be annotated, the annotation processing unit 55 determines, in response to operations, the position of the object and an object label corresponding to its type. It also identifies the start and end key frames of the object label, thereby determining the time range of that label in the moving image.

At this time, as illustrated in FIG. 10, the display processing unit 51 changes the display of the end operation area 107 to indicate that it has been selected, for example by changing "End?" to "End" and graying it out. It also hides the delete operation area 106. The annotation processing unit 55 can cancel the end key frame determination, for example when the end operation area 107 is selected again after its display has changed. Once the end key frame is confirmed, the display processing unit 51 hides the determined-label display image 105, the end operation area 107, and the like. The annotation processing unit 55 then moves on to annotating the next frame.
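
The time range of a label might be modeled as below. The range is bounded by the start and end key frames, and it is reopened when the end determination is cancelled. `LabelTimeRange` and its methods are hypothetical. Treating an unconfirmed range as covering no frames is a design assumption, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelTimeRange:
    """Hypothetical time range of a label, bounded by its key frames."""
    start_key_frame: int
    end_key_frame: Optional[int] = None  # None until "End" is confirmed

    def close(self, frame: int) -> None:
        """Record the end key frame (last frame in which the target appears)."""
        if frame < self.start_key_frame:
            raise ValueError("end key frame precedes start key frame")
        self.end_key_frame = frame

    def reopen(self) -> None:
        """Cancel the end key frame determination."""
        self.end_key_frame = None

    def covers(self, frame: int) -> bool:
        """True if the frame lies inside a confirmed range (inclusive)."""
        if self.end_key_frame is None:
            return False  # assumption: an unconfirmed range covers nothing
        return self.start_key_frame <= frame <= self.end_key_frame

    @property
    def length(self) -> Optional[int]:
        """Number of frames in the confirmed range, counting both ends."""
        if self.end_key_frame is None:
            return None
        return self.end_key_frame - self.start_key_frame + 1


span = LabelTimeRange(start_key_frame=120)
span.close(300)
```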

In this way, the annotation processing unit 55 can add to the moving-image data, as annotation information, the object label determined for each object appearing in the moving image displayed in the moving image display area 101. At this time, the annotation processing unit 55 also adds, as part of that object label, coordinate values or similar data representing the position of the object specified by the rectangular frame. There is no limit on the number of object labels that can be added.
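
As an illustration of recording object labels together with their coordinate values, the sketch below serializes any number of labels to JSON. The schema (`video_id`, `object_labels`, and frame-indexed `boxes`) and the function name are hypothetical. JSON object keys must be strings, so frame indices are converted with `str`.

```python
import json
from typing import List


def export_object_labels(video_id: str, labels: List[dict]) -> str:
    """Serialize any number of object labels, each with per-frame box
    coordinates, into one JSON annotation document (hypothetical schema)."""
    doc = {
        "video_id": video_id,
        "object_labels": [
            {
                "id": lab["id"],
                "type": lab["type"],
                # Frame indices become string keys; boxes become JSON arrays.
                "boxes": {str(f): list(b) for f, b in sorted(lab["boxes"].items())},
            }
            for lab in labels
        ],
    }
    return json.dumps(doc, ensure_ascii=False)


out = export_object_labels(
    "clip-0001",
    [
        {"id": 1, "type": "traffic light/red", "boxes": {120: (410, 80, 440, 150)}},
        {"id": 2, "type": "stop sign", "boxes": {130: (100, 60, 140, 100)}},
    ],
)
```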

Next, FIGS. 11 and 12 schematically show an example of the annotation screen 100B displayed on the display device 10 when a relationship label is added to moving-image data. FIG. 11 shows the entire annotation screen 100B, and FIG. 12 shows display transitions in a part of the annotation screen 100B.

When an operation to add a relationship label to moving-image data is performed, the display processing unit 51 causes the display device 10 to display the annotation screen 100B shown in FIG. 11. The annotation processing unit 55 adds the relationship label to the moving-image data in response to operations made through the annotation screen 100B.

Like the annotation screen 100A, the annotation screen 100B illustrated in FIG. 11 displays the moving image display area 101, the label addition operation area 102, the label selection area 103, the task end operation area 104, and so on. In addition, the annotation screen 100B displays an added-label display area 109.

The added-label display area 109 lists the types of object labels already added, as described above, in the frame of the moving image currently displayed in the moving image display area 101.

In this case, in response to operations, the display processing unit 51 displays a particular frame in the moving image display area 101. This is the frame at which an event (behavior) begins in which the objects targeted by the previously added object labels are correlated. The annotation processing unit 55 starts creating the relationship label from that frame. In this state, when the label addition operation area 102 is selected on the annotation screen 100B in response to an operation, the annotation processing unit 55 adds a rectangular frame to the moving image displayed in the moving image display area 101. When a relationship label is added to moving-image data, the rectangular frame is used, in response to an operation, to designate the entire frame of the moving image.

Next, the "relationship" label is selected from the label list displayed in the label selection area 103 in response to an operation. The annotation processing unit 55 then proceeds to the selection of the relationship label type.

As illustrated in FIG. 12, when the "relationship" label is selected from the main list 103A in the label selection area 103 in response to an operation, the display processing unit 51 displays a sublist 103C in the label selection area 103. The sublist 103C is a label list for selecting the relationship label type, in other words, the type of event (behavior) in which a plurality of objects are correlated. In the example of FIG. 12, when the "relationship" label is selected, the display processing unit 51 displays a sublist 103C of relationship label types such as "safe", "caution", and "violation" in the label selection area 103.

Further, as illustrated in FIG. 12, when one relation label type is selected, in response to an operation, from the sublist 103C displayed in the label selection area 103, the display processing unit 51 displays in the label selection area 103 a sublist 103D corresponding to the selected type. The sublist 103D is a label list for selecting more detailed attributes that further subdivide the selected relation label type. In the example of FIG. 12, when the "violation" relation label is selected in response to an operation, the display processing unit 51 displays in the label selection area 103 the sublist 103D, which lists attributes such as "ignoring traffic light" and "ignoring stop line".
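The three-level selection just described (main list 103A, then relation label types in sublist 103C, then attributes in sublist 103D) can be pictured as a small tree. The Python sketch below is illustrative only: the dictionary layout and the function name `sublist` are assumptions, the main list is reduced to its "relationship" entry, and the attribute lists for "safety" and "caution" are left empty because the text does not enumerate them.

```python
# Hypothetical encoding of the label lists of FIG. 12:
# main list 103A -> relation label types (sublist 103C) -> attributes (sublist 103D).
# Only entries named in the text are filled in; other attributes are unknown.
RELATION_LABEL_TREE = {
    "relationship": {
        "safety": [],
        "caution": [],
        "violation": ["ignoring traffic light", "ignoring stop line"],
    },
}


def sublist(*selected: str) -> list[str]:
    """Return the options to display after the given chain of selections.

    sublist("relationship")              -> types shown in sublist 103C
    sublist("relationship", "violation") -> attributes shown in sublist 103D
    """
    node = RELATION_LABEL_TREE
    for choice in selected:
        node = node[choice]  # raises KeyError if the choice is not offered
    return list(node)  # dict -> its keys, list -> a copy of its items
```

A relation label is fixed once a leaf is reached, e.g. the path ("relationship", "violation", "ignoring traffic light").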

When one attribute is selected, in response to an operation, from the sublist 103D displayed in the label selection area 103, the annotation processing unit 55 determines the relation label representing the type (including the attribute) of the event recognized as occurring in the current frame (an event in which a plurality of objects correlate). At this time, the annotation processing unit 55 stores in the storage circuit 40 the current frame, in which the relation label was determined, as the start key frame (first frame) showing the event targeted by that relation label. In other words, the event targeted by the relation label determined as described above is the event specified by that relation label.

When the relation label has been determined as described above, the display processing unit 51 displays the erase operation area 106 and the end operation area 107 on the annotation screen 100B, as in FIG. 10 and as illustrated, for example, in FIG. 12.

As above, the erase operation area 106 is operated to delete the relation label determined as described above. When the erase operation area 106 is selected in response to an operation, the display processing unit 51 displays the erase decision image 108 on the annotation screen 100B. When "OK" is selected in the erase decision image 108 in response to an operation, the annotation processing unit 55 deletes the relation label determined as described above. When "Cancel" is selected instead, the display processing unit 51 hides the erase decision image 108 and returns to the original display screen.

As above, the end operation area 107 is operated to determine the final frame showing the event targeted by the relation label determined as described above. In other words, the end operation area 107 is selected at the last frame in which the target of the relation label appears, thereby specifying the final frame of that relation label.

For example, in response to operations, the display processing unit 51 steps through the frames of the moving image displayed in the moving image display area 101 until the final frame showing the event targeted by the label is displayed there.

Then, when the end operation area 107 is selected in response to an operation, the annotation processing unit 55 stores that frame in the storage circuit 40 as the end key frame (final frame) showing the event specified by the relation label. The annotation processing unit 55 can thereby determine the time range of the relation label.

In this way, in response to operations, the annotation processing unit 55 determines, for an event subject to annotation, a relation label corresponding to the type of that event, and specifies the start key frame and end key frame of that relation label, thereby determining the time range of the relation label in the moving image.
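A relation label therefore carries a type, an attribute, and an inclusive frame range bounded by its start and end key frames. The following is a minimal sketch under that reading; the class and field names are hypothetical, and the rule that the end key frame may not precede the start key frame is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationLabel:
    """Hypothetical record for one relation label (names are illustrative)."""
    type_: str                           # e.g. "violation"
    attribute: str                       # e.g. "ignoring traffic light"
    start_key_frame: int                 # frame in which the label was determined
    end_key_frame: Optional[int] = None  # set when end operation area 107 is selected

    def set_end(self, frame: int) -> None:
        # Assumed consistency rule: the event cannot end before it starts.
        if frame < self.start_key_frame:
            raise ValueError("end key frame precedes start key frame")
        self.end_key_frame = frame

    def time_range(self) -> range:
        # Inclusive range of frames covered by the label.
        if self.end_key_frame is None:
            raise ValueError("end key frame not set yet")
        return range(self.start_key_frame, self.end_key_frame + 1)
```

For example, a label started at frame 120 and ended at frame 180 covers 61 frames.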

At this time, as above and as illustrated in FIG. 12, the display processing unit 51 changes the display mode of the end operation area 107 to indicate that it has been selected and hides the erase operation area 106. Here too, the annotation processing unit 55 can cancel the above end-key-frame determination, for example, when the end operation area 107 with the changed display mode is selected again in response to an operation. When the end key frame is confirmed, the display processing unit 51 hides the end operation area 107 and the like, and the annotation processing unit 55 proceeds to annotation processing of the next frame.

In this way, the annotation processing unit 55 can add to the moving image data, as annotation information, the relation label determined for each event in the moving image displayed in the moving image display area 101. Any number of relation labels, not just one, can be added.

Next, FIGS. 13 and 14 schematically show an example of the annotation screen 100C displayed on the display device 10 when the object labels of the objects involved in the event targeted by a relation label are designated from among the object labels already added. Again, FIG. 13 shows the entire annotation screen 100C, and FIG. 14 shows display transitions in part of the annotation screen 100C.

When an operation is performed to add to the moving image data annotation information designating the object labels of the objects involved in the event targeted by a relation label, the display processing unit 51 displays on the display device 10 the annotation screen 100C shown in FIG. 13. Through this annotation screen 100C, the annotation processing unit 55 executes, in response to operations, the process of adding to the moving image data annotation information that designates the objects involved in the event.

Like the annotation screens 100A and 100B, the annotation screen 100C illustrated in FIG. 13 displays the moving image display area 101, the label addition operation area 102, the label selection area 103, the task end operation area 104, and the like. In addition, the annotation screen 100C displays a slide bar display area 110.

The slide bar display area 110 uses slide bars to display the time range, within the moving image, of each object label and relation label added and determined as described above. By changing the start and end positions of a slide bar in response to an operation, the annotation processing unit 55 can also correct afterwards the start key frame and end key frame of each label's time range.

Once the start key frame and end key frame of the relation label have been determined as described above, and with them its time range in the target moving image, the annotation processing unit 55 searches for and extracts every object label that appears in at least one frame within that time range. The display processing unit 51 then displays the time ranges of all the object labels extracted by the annotation processing unit 55, together with the time range of the relation label, as separate slide bars in the slide bar display area 110.
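"Appears in at least one frame within that time range" is an interval-overlap test: two inclusive frame ranges share a frame exactly when each starts no later than the other ends. The sketch below assumes hypothetical object-label records holding a display name and inclusive start/end key frames; the record type and function name are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectLabel:
    """Hypothetical object-label record: display name plus inclusive key frames."""
    name: str
    start: int
    end: int


def candidate_labels(labels: list[ObjectLabel], rel_start: int, rel_end: int) -> list[ObjectLabel]:
    """Object labels sharing at least one frame with [rel_start, rel_end].

    Inclusive intervals [a, b] and [c, d] overlap iff a <= d and c <= b.
    Input order is preserved so the slide bars keep a stable order.
    """
    return [lb for lb in labels if lb.start <= rel_end and rel_start <= lb.end]
```

Because the ranges are inclusive, a label whose last frame equals the relation label's first frame still counts as a candidate.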

In the example of FIG. 13, the display processing unit 51 displays the type (name) of the object corresponding to each object label at the left end of each slide bar in the slide bar display area 110. When there are multiple object labels of the same type, the display processing unit 51 may add information that distinguishes them from one another, for example by appending a serial number to the display name. The display processing unit 51 may also color-code each slide bar in the slide bar display area 110, and each rectangular frame on the moving image in the moving image display area 101, according to the object label. Further, the display processing unit 51 may show information about an object label in a pop-up when the pointer is placed over the corresponding slide bar or rectangular frame in response to an operation.
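Appending serial numbers to same-type labels (as in "stop line 1" and "stop line 2" in FIG. 14) can be done in a single pass. This is a sketch under stated assumptions: numbering is 1-based in input order, separated by a space, and types that occur only once keep their plain name; the function name is hypothetical.

```python
from collections import Counter


def display_names(kinds: list[str]) -> list[str]:
    """Give each object label a display name, numbering repeated types.

    Assumed scheme: a type occurring more than once gets " 1", " 2", ...
    in input order; a type occurring once keeps its plain name.
    """
    totals = Counter(kinds)  # how many labels share each type
    seen = Counter()         # how many of each type have been numbered so far
    names = []
    for kind in kinds:
        if totals[kind] > 1:
            seen[kind] += 1
            names.append(f"{kind} {seen[kind]}")
        else:
            names.append(kind)
    return names
```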

Then, when the relation label representing "relationship" is selected, in response to an operation, from the label list displayed in the label selection area 103, the annotation processing unit 55 proceeds to the selection of the objects involved in the event targeted by the relation label.

As illustrated in FIG. 14, when the relation label representing "relationship" is selected, in response to an operation, from the main list 103A displayed in the label selection area 103, the display processing unit 51 displays a sublist 103E in the label selection area 103. The sublist 103E is a label list of object labels that are candidates for the objects involved in the event targeted by the relation label. Here, the sublist 103E lists the object labels (candidate labels) that the annotation processing unit 55 extracted, as described above, as labels included in the time range of the relation label; in other words, the object labels that appear in at least one frame within that time range. In the example of FIG. 14, when the "relationship" relation label is selected in response to an operation, the display processing unit 51 displays in the label selection area 103 the sublist 103E, which lists object labels such as "traffic light", "stop line 1", and "stop line 2".

When an object label is selected, in response to an operation, from the sublist 103E displayed in the label selection area 103, the annotation processing unit 55 determines, from among the object labels already added, the object label of an object involved in the event targeted by the relation label. For example, for the event "ignoring traffic light", the object label for "traffic light" and the object label for "stop line 1" in the lane of the target vehicle are selected in response to operations, whereas the object label for "stop line 2" in the opposite lane is not selected. The annotation processing unit 55 can determine, in response to operations, multiple object labels for objects involved in the event specified by the relation label.
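Linking object labels to a relation label amounts to choosing any subset of the candidate list. The helper below is a hypothetical sketch: rejecting names that are not candidates and silently dropping duplicate picks are assumed policies, not behavior described in the text.

```python
def link_objects(candidates: list[str], selected: list[str]) -> list[str]:
    """Return the object labels chosen from sublist 103E for one relation label.

    Assumed policy: every selection must be one of the candidate labels
    (those overlapping the relation label's time range); any number may be
    chosen; duplicates are dropped while keeping selection order.
    """
    unknown = [s for s in selected if s not in candidates]
    if unknown:
        raise ValueError(f"not a candidate label: {unknown}")
    return list(dict.fromkeys(selected))  # order-preserving de-duplication
```

For the "ignoring traffic light" example, selecting "traffic light" and "stop line 1" leaves "stop line 2" unlinked.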

When the object labels of the objects involved in the event targeted by the relation label have been determined as described above, the display processing unit 51 displays the erase operation area 106 and the end operation area 107 on the annotation screen 100C, as above and as illustrated, for example, in FIG. 14.

As above, the erase operation area 106 is operated to delete the object label determined as described above. When the erase operation area 106 is selected in response to an operation, the display processing unit 51 displays the erase decision image 108 on the annotation screen 100C. When "OK" is selected in the erase decision image 108 in response to an operation, the annotation processing unit 55 deletes the object label determined as described above. When "Cancel" is selected instead, the display processing unit 51 hides the erase decision image 108 and returns to the original display screen.

The end operation area 107 is operated to finish determining the object labels of the objects involved in the event specified by the relation label, as described above. When the end operation area 107 is selected in response to an operation, the annotation processing unit 55 finishes determining those object labels.

At this time, as above and as illustrated in FIG. 14, the display processing unit 51 changes the display mode of the end operation area 107 to indicate that it has been selected and hides the erase operation area 106.

In this way, the annotation processing unit 55 can designate, from among the object labels added as described above, the object labels of the objects involved in the event targeted by the relation label, and add them to the moving image data as annotation information.

As described above, the annotation processing unit 55 creates the teacher data D2 by adding various kinds of annotation information to the moving image data, and stores the created teacher data D2 in the storage circuit 40. For each label, the teacher data D2 contains, for example, a "path that identifies the moving image data file" and, for the labeled object or event, a "start key frame (number of the first frame in which it was confirmed)" and an "end key frame (number of the last frame in which it was confirmed)". When the series of tasks for adding labels to the moving image data has been completed, the annotation processing unit 55 ends the task upon selection of the task end operation area 104 in response to an operation.
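One way to picture a single label entry of the teacher data D2 is as a JSON object holding the items just listed. The text does not specify a file format, so the JSON representation, the field names, and the function name below are all assumptions made for illustration.

```python
import json


def teacher_record(video_path: str, label: str, start_key_frame: int, end_key_frame: int) -> str:
    """Serialize one label entry of teacher data D2 as a JSON string.

    Field names and JSON itself are illustrative assumptions; the source only
    lists the kinds of information each entry contains.
    """
    record = {
        "video_path": video_path,            # path identifying the moving image data file
        "label": label,                      # labeled object or event
        "start_key_frame": start_key_frame,  # number of the first confirmed frame
        "end_key_frame": end_key_frame,      # number of the last confirmed frame
    }
    return json.dumps(record, ensure_ascii=False)
```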

Moving image data files subject to such annotation processing tend to be relatively large in data volume. For this reason, a continuous moving image is commonly split into files of a fixed duration or fixed size for storage. In this case, an existing annotation device, for example, may create labels through annotation processing of one split file and save one annotation file (teacher data) as the result of the work on that split file.

Consider, for example, an existing annotation device adding an object label to an object that appears in the moving image of one split file in a group of split files. The object may still be visible when that file's moving image ends and may continue to appear, in chronological order, in the moving image of the next split file.

In that case, the worker closes the split file currently being processed on the existing annotation device and opens the next split file to continue working, but may have to reconfigure the object label added earlier so that it is treated as continuing from the moving image of the previous split file into that of the next. Even a slight discrepancy in this configuration introduces errors into the teacher data and may consequently degrade its quality.

Moreover, when a continuous moving image is split into multiple files as described above, multiple annotation files (teacher data) may be saved for the split files even though they form one continuous moving image. Some existing annotation devices allow arbitrary text, such as remarks, to be entered for an annotation file (teacher data), and an ID number or the like can be entered in such a free-form field. With such a device, the same ID number may be assigned to the same object appearing in different split files. Then, by using a conversion tool that extracts a series of related labels from multiple annotation files based on that ID number and concatenates and saves them as one annotation file, the labels for the same object can be converted into a single annotation file (teacher data) even though the continuous moving image is split into multiple files.

However, creating such a conversion tool is cumbersome. In addition, the ID numbers that mark labels as belonging to one series must be recorded separately by a worker or the like. The total number of labels often exceeds thousands or even tens of thousands, so this work is very tedious and error-prone, which can likewise degrade the quality of the teacher data.

Therefore, in the annotation processing described above, the annotation processing unit 55 of the present embodiment handles a plurality of moving image data files designated in advance as moving image data representing one continuous moving image. Typically, the annotation processing unit 55 can be configured to create a single set of teacher data D2 (one annotation file) for the multiple moving image data files handled in this way. Here, the task creation processing unit 54 creates the series of tasks processed by the annotation processing unit 55 such that the designated moving image data files are handled as moving image data representing one continuous moving image. This is described in detail below.

First, as illustrated in FIG. 15, the task creation processing unit 54 of the present embodiment creates in advance, in the designated file storage area 40a of the storage circuit 40, a list file that records in chronological order the locations where the moving image data files constituting a continuous moving image are stored. The list file can be created, for example, using a data infrastructure that stores big data (such as the learning data providing system Sv) and automatically outputs a list file matching search conditions when it receives a search command specifying those conditions; however, this is not limiting, and the list file may also be created manually. Alternatively, the task creation processing unit 54 may assign file names with serial numbers in chronological order to the moving image data files constituting the continuous moving image.
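Both ways of fixing the file order can be sketched briefly. The list-file format (one location per line, already in chronological order) and the serial-number rule (the last run of digits in the file stem) are assumptions; the sort is numeric so that, for example, "clip_10" does not land before "clip_2" as it would in a plain string sort.

```python
import re
from pathlib import Path


def read_list_file(text: str) -> list[str]:
    """Parse a list file: one file location per line, in chronological order.

    Assumed format; blank lines are ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def order_by_serial(names: list[str]) -> list[str]:
    """Order serial-numbered file names by the last number in each file stem.

    Names without any digits raise ValueError (a hypothetical policy).
    """
    def serial(name: str) -> int:
        numbers = re.findall(r"\d+", Path(name).stem)
        if not numbers:
            raise ValueError(f"no serial number in {name!r}")
        return int(numbers[-1])

    return sorted(names, key=serial)
```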

In addition to the work mode using the annotation screens 100A, 100B, 100C and the like described above, the annotation device 1 of the present embodiment also implements a management mode using a management mode screen 200 as illustrated in FIG. 16. The annotation device 1 may have a function for setting user permissions, for example an "administrator" who can use both the work mode and the management mode and a "worker" who can use only the work mode. The annotation device 1 may also be configured with separate devices for the "administrator" and the "worker".

The management mode screen 200 illustrated in FIG. 16 is displayed on the display device 10 in the management mode and includes a task creation operation area 201. The task creation operation area 201 is operated to create a series of tasks for adding labels to moving image data. Note that only the task creation operation area 201 of the management mode screen 200 is shown; the other management operation areas are omitted from the figure.

When the task creation operation area 201 is selected in response to an operation, the display processing unit 51 displays a task creation screen 300 as illustrated in FIG. 17 on the display device 10. The task creation screen 300 is displayed when creating a series of tasks for adding labels to moving image data; more specifically, it is displayed when designating the multiple moving image data files to be handled in annotation processing as moving image data representing one continuous moving image. On the task creation screen 300, the display processing unit 51 displays a file dialog through which the user selects the moving image data files to be handled as one series of tasks, in other words, as moving image data representing one continuous moving image.

For example, when the above list file is selected through this file dialog in response to an operation, the task creation processing unit 54 designates the moving image data files it lists as the files to be handled as moving image data representing one continuous moving image. Alternatively, for example, when "Open" in the file dialog is selected in response to an operation with no file selected, the task creation processing unit 54 recognizes that the directory open in the dialog has been designated as the directory storing a series of moving image data files with serial-numbered file names.

In this way, the task creation processing unit 54 creates a series of tasks so that the designated moving image data files are handled as moving image data representing one continuous moving image. As a result, in annotation processing, the annotation processing unit 55 handles the designated files as moving image data representing one continuous moving image. This allows the annotation device 1 to play back the designated files continuously in the moving image display area 101 of the annotation screens 100A, 100B, and 100C, as if they were one continuous moving image, even though they are split into multiple moving image data files.
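One possible way to play split files as a single sequence without merging them is to keep cumulative frame offsets and map a global frame number to a (file, local frame) pair. This is a hypothetical sketch, not the embodiment's implementation: the class name is invented and the per-file frame counts are assumed to be known in advance (for example, from each file's metadata).

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualSequence:
    """Treat several split files as one frame sequence without merging them.

    Assumes `frame_counts` gives the number of frames of each file, in the
    same (chronological) order as `files`.
    """

    def __init__(self, files: list[str], frame_counts: list[int]) -> None:
        if len(files) != len(frame_counts):
            raise ValueError("one frame count per file is required")
        self.files = files
        # offsets[i] is the global number of the first frame of files[i];
        # offsets[-1] is the total number of frames.
        self.offsets = [0, *accumulate(frame_counts)]

    @property
    def total_frames(self) -> int:
        return self.offsets[-1]

    def locate(self, global_frame: int) -> tuple[str, int]:
        """Map a global frame number to (file, frame number inside that file)."""
        if not 0 <= global_frame < self.total_frames:
            raise IndexError(global_frame)
        i = bisect_right(self.offsets, global_frame) - 1
        return self.files[i], global_frame - self.offsets[i]
```

If label key frames are stored as global frame numbers, a label whose object crosses a file boundary stays a single label, which matches the point that the files are kept as a list rather than converted into one file.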

Note that the annotation device 1 merely handles the designated moving image data files as one continuous stream of moving image data; it does not convert them into a single moving image data file. That is, the annotation device 1 stores the series of moving image data files as a list.

In addition to the moving image display area 101 and the like, the annotation screens 100A, 100B, and 100C described with reference to FIGS. 9, 11, and 13 also display a file name display area 111, a file selection area 112, and the like. The file name display area 111 shows the file name of the moving image data file corresponding to the moving image displayed in the moving image display area 101. The file selection area 112 is operated to select the moving image data file whose moving image is displayed in the moving image display area 101.

For example, when "Next" in the file selection area 112 is selected in response to an operation, the display processing unit 51 displays in the moving image display area 101, from the beginning, the moving image of the file following the moving image data file whose name is currently shown in the file name display area 111. Likewise, when "Prev" is selected, the display processing unit 51 displays, from the beginning, the moving image of the file preceding that file.

Also, for example, when the file name display area 111 is selected in response to an operation, the display processing unit 51 can display a file list 111A in the file name display area 111, as illustrated in FIG. 18. The file list 111A lists the file names of the moving image data files designated as one series of tasks as described above. In the file list 111A, the display processing unit 51 can distinguish files by display mode, for example showing the names of files already played in gray, the file currently playing in black, and files not yet played in red. When another moving image data file is selected in response to an operation, the display processing unit 51 can also display the moving image of the selected file in the moving image display area 101 from the beginning.
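The color coding in the example can be computed from the set of files already played and the index of the current file. Because the user can jump to any file through the list, the sketch below tracks played files explicitly rather than assuming that every earlier file has been played; the function name and data shapes are hypothetical.

```python
def file_list_colors(files: list[str], played: set[int], current: int) -> list[tuple[str, str]]:
    """Pair each name in file list 111A with a text color.

    Example scheme from the text: already played -> gray, currently
    playing -> black, not yet played -> red. `played` holds the indices
    of files that have been played (an assumed bookkeeping choice).
    """
    colors = []
    for i, name in enumerate(files):
        if i == current:
            colors.append((name, "black"))
        elif i in played:
            colors.append((name, "gray"))
        else:
            colors.append((name, "red"))
    return colors
```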

As described above, the annotation device 1 lets a worker, an administrator, or another user designate any moving image data file and play its moving image. In addition, if at the last time point of the moving image data file currently being worked on there is a label in the work-continuation state, that is, a label for which no end key frame has been set, the annotation device 1 remembers that adding that label has not been completed. For example, while such a label exists, the display processing unit 51 may show it in the label selection area 103 or the added-label display area 109 when the moving image of a subsequent moving image data file is being played. Also, for example, the display processing unit 51 may plot points at the coordinates of the rectangular frame that represents the position of the object specified by the object label and connect those points with lines. In this way, the annotation device 1 can keep track of the fact that an object label refers to the same object even when moving images from several different moving image data files are played.
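
One way to picture the work-continuation state is as a label whose end key frame is still unset when a file runs out. The sketch below assumes, for illustration only, that labels are plain records with frame numbers counted across all files of the task and that a hypothetical `labels_to_carry_over` helper is called whenever playback moves on to the next file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    """Hypothetical label record; frames are counted across all files of the task."""

    label_id: int
    kind: str
    start_frame: int                 # start key frame
    end_frame: Optional[int] = None  # None until an end key frame is set

    @property
    def in_progress(self) -> bool:
        # A label with no end key frame is in the work-continuation state.
        return self.end_frame is None


def labels_to_carry_over(labels: list[Label]) -> list[Label]:
    """Labels still open when the current file reaches its last frame.

    The same Label objects (and IDs) are returned rather than copies, so
    setting an end key frame later, in the next file, closes the original
    label instead of creating a second label for the same object.
    """
    return [label for label in labels if label.in_progress]
```

Because the carried-over labels are the same objects, closing one of them in a later file (for instance, setting `end_frame = 450`) updates the original record, so the object keeps a single label across files.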

When the series of tasks made up of the designated moving image data files has been completed, the annotation processing unit 55 ends the task once the task end operation area 104 (see FIGS. 9, 11, and 13) is selected by an operation. If, instead, the task end operation area 104 or the screen close button is selected while the series of tasks is still unfinished and a label in the work-continuation state exists, the annotation processing unit 55 starts the process for ending the annotation work. In that case, the display processing unit 51 may display a work end confirmation image 113, as illustrated in FIG. 19, on the annotation screens 100A, 100B, and 100C. The work end confirmation image 113 asks the user to confirm ending the work, for example with a message such as "Ending now will delete the work data of labels still in the continuation state. End anyway?" When "OK" is selected in the work end confirmation image 113, the annotation processing unit 55 can delete the work data of the labels in the work-continuation state and end the work. When "Cancel" is selected instead, the display processing unit 51 hides the work end confirmation image 113 and returns to the original display screen.

Even when no label is in the work-continuation state, the display processing unit 51 may first display a work end decision image 114, as illustrated in FIG. 20, on the annotation screens 100A, 100B, and 100C when the task end operation area 104 is selected. The work end decision image 114 asks the user to decide whether to end the work, for example with a message such as "End the task?" When "OK" is selected in the work end decision image 114, the annotation processing unit 55 ends the task. When "Cancel" is selected, the display processing unit 51 hides the work end decision image 114 and returns to the original display screen.
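
The two end-of-task dialogs can be reduced to a small decision function. In this hedged sketch, the `confirm` callback stands in for displaying the work end confirmation image 113 or the work end decision image 114 and returns True for "OK"; the dictionary label representation is hypothetical, and the sketch always asks for confirmation, whereas the text says image 114 may be shown.

```python
from typing import Callable

CONTINUATION_WARNING = (
    "Ending now will delete the work data of labels still in the "
    "continuation state. End anyway?"
)
END_QUESTION = "End the task?"


def request_task_end(labels: list[dict], confirm: Callable[[str], bool]) -> bool:
    """Return True if the task ends, False if the user cancels.

    labels: dicts whose "end_frame" is None while the label is still in the
    work-continuation state (hypothetical representation).
    confirm: shows a dialog with the given message; True means "OK".
    """
    has_open = any(label["end_frame"] is None for label in labels)
    message = CONTINUATION_WARNING if has_open else END_QUESTION
    if not confirm(message):
        return False  # "Cancel": hide the dialog and return to the screen
    # "OK": discard the work data of labels that never got an end key frame.
    labels[:] = [label for label in labels if label["end_frame"] is not None]
    return True
```

Passing a callback that always returns False models "Cancel": the labels are left untouched and the function returns False.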

Among the moving image data files handled as moving image data representing one continuous moving image, an object that appears in one file may continue to appear in the next. Even in this case, the annotation processing unit 55 of the present embodiment can save the information about that object under a single object label in a single piece of training data D2 (annotation file), as illustrated in FIG. 21.
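
Saving one object label for an object that spans several files requires frame positions that do not restart at each file boundary. A minimal sketch, assuming each file's frame count is known and that files are concatenated in task order (the class name `ContinuousTimeline` is hypothetical), maps per-file frame numbers onto one continuous axis and back.

```python
from bisect import bisect_right
from itertools import accumulate


class ContinuousTimeline:
    """Maps frames of consecutive files onto one continuous frame axis."""

    def __init__(self, frame_counts: list[int]):
        # offsets[i] is the global index of the first frame of file i;
        # the final entry is the total number of frames.
        self.offsets = [0, *accumulate(frame_counts)]

    def to_global(self, file_index: int, local_frame: int) -> int:
        return self.offsets[file_index] + local_frame

    def to_local(self, global_frame: int) -> tuple[int, int]:
        if not 0 <= global_frame < self.offsets[-1]:
            raise ValueError(f"frame {global_frame} is outside the timeline")
        # Last file whose first frame is at or before global_frame.
        file_index = bisect_right(self.offsets, global_frame) - 1
        return file_index, global_frame - self.offsets[file_index]
```

With three files of 300, 300, and 150 frames, an object seen from frame 250 of the first file to frame 40 of the second gets a single time range of 250 to 340 on the continuous axis (`to_global(0, 250)` and `to_global(1, 40)`).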

With a relation label, one object label is linked to other object labels. For relation labels, the annotation processing unit 55 of the present embodiment may save the training data D2 (annotation file) in a format, illustrated in FIG. 22, in which one of the header fields is a key for recording the IDs of related labels and the value of that key holds the ID numbers of the object labels involved in the relation label.

The training data D2 (annotation files) illustrated in FIGS. 21 and 22 are shown as plain text files, but they may instead be semi-structured data whose format is defined in a data description language such as JSON (JavaScript (registered trademark) Object Notation) or XML (Extensible Markup Language). In that case, in the training data D2 of a relation label, the annotation processing unit 55 can record not only the ID numbers of the object labels involved in the relation label but also, for example, the time range over which each object label is involved in it. JSON is not limited to the JavaScript (registered trademark) programming language; like XML, it is used as a general-purpose data interchange format.
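
As an illustration of the semi-structured variant, the sketch below writes object labels and relation labels to one JSON document, with each relation label recording both the IDs of the object labels it involves and the frame range of that involvement. The field names and the schema are assumptions made for illustration; the source does not specify them. The example data reuses the pedestrian signal scenario from this description.

```python
import json


def build_annotation_json(object_labels: list[dict], relation_labels: list[dict]) -> str:
    """Serialize labels into one JSON annotation file (hypothetical schema).

    Each relation label lists the object labels it involves under
    "related_labels", each with an ID and the frame range of involvement.
    """
    known_ids = {obj["id"] for obj in object_labels}
    for rel in relation_labels:
        for ref in rel["related_labels"]:
            # A relation label may only point at object labels in this file.
            if ref["id"] not in known_ids:
                raise ValueError(f"relation {rel['id']} refers to unknown label {ref['id']}")
    document = {"object_labels": object_labels, "relation_labels": relation_labels}
    return json.dumps(document, ensure_ascii=False, indent=2)


objects = [
    {"id": 1, "type": "pedestrian_signal_red", "start_frame": 0, "end_frame": 420},
    {"id": 2, "type": "pedestrian_on_crosswalk", "start_frame": 150, "end_frame": 380},
]
relations = [
    {
        "id": 10,
        "type": "pedestrian_ignoring_signal",
        "start_frame": 150,
        "end_frame": 380,
        "related_labels": [
            {"id": 1, "start_frame": 150, "end_frame": 380},
            {"id": 2, "start_frame": 150, "end_frame": 380},
        ],
    }
]
annotation_text = build_annotation_json(objects, relations)
```

Keeping the reference check at write time means a relation label can never be saved pointing at an object label that does not exist in the same annotation file.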

Next, the sequence of processing from task creation through annotation work will be described with reference to the flowcharts in FIGS. 23, 24, and 25. In the methods described below, each step is carried out by the processing circuit 50 of the annotation device 1 executing various programs in response to operations.

First, each process of the method of creating a task to be provided for annotation work will be described with reference to FIG. 23. This task creation method is typically performed mainly by an annotation device 1 serving as the administrator's equipment.

First, the data input/output processing unit 53 of the processing circuit 50 establishes a communication connection to the annotation server, for example in response to an operation by the administrator (step S1). The annotation server is implemented by, for example, the learning data providing system Sv described above. The task creation processing unit 54 of the processing circuit 50 then accepts, in response to the administrator's operation, search conditions for the moving image data from which training data is to be created (step S2), and the data input/output processing unit 53 transmits the entered search conditions to the annotation server (step S3). As search conditions, the task creation processing unit 54 can specify, for example, the date and time of the target moving image data, the vehicle, and the like.

The data input/output processing unit 53 then receives the results of the search that the annotation server performed according to the search conditions (step S4), and the task creation processing unit 54 determines whether there is no data matching the search conditions (step S5). If it determines that no matching data exists (step S5: Yes), the task creation processing unit 54 returns to step S1 and repeats the subsequent processing. If the task creation processing unit 54 determines that matching data exists (step S5: No), the display processing unit 51 of the processing circuit 50 presents the search results to the administrator, for example by displaying them on the display device 10 (step S6).
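
The search in steps S2 to S6 can be sketched as a filter over file metadata. This assumes, for illustration only, that each moving image data file carries a recording timestamp and a vehicle identifier; the names `VideoFile`, `SearchCondition`, and `search` are hypothetical and merely stand in for whatever the annotation server actually provides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class VideoFile:
    name: str
    recorded_at: datetime
    vehicle_id: str


@dataclass(frozen=True)
class SearchCondition:
    """Hypothetical search conditions (step S2); None means "don't filter"."""

    start: Optional[datetime] = None  # inclusive
    end: Optional[datetime] = None    # exclusive
    vehicle_id: Optional[str] = None

    def matches(self, f: VideoFile) -> bool:
        if self.start is not None and f.recorded_at < self.start:
            return False
        if self.end is not None and f.recorded_at >= self.end:
            return False
        return self.vehicle_id is None or f.vehicle_id == self.vehicle_id


def search(catalog: list[VideoFile], cond: SearchCondition) -> list[VideoFile]:
    """Stand-in for the server-side search (steps S3-S4), oldest file first."""
    return sorted((f for f in catalog if cond.matches(f)), key=lambda f: f.recorded_at)
```

An empty result corresponds to step S5: Yes, after which the flow returns to step S1 so the administrator can try new conditions; a non-empty result is what step S6 displays.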

Next, in response to the administrator's operation, the task creation processing unit 54 sets data conditions for allocating the moving image data files in the search results into series of work tasks (step S7) and generates work tasks according to those data conditions (step S8). The display processing unit 51 then presents information on the generated work tasks to the administrator, for example by displaying it on the display device 10 (step S9).

Next, in response to the administrator's operation, the task creation processing unit 54 assigns each generated work task to a worker (step S10), notifies the workers of their assigned work tasks, for example by sending the task information to each worker's equipment (step S11), and ends the processing of this flowchart.
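
The text leaves open how the administrator's data conditions divide files into tasks and how tasks are distributed. The sketch below assumes the simplest policy for both: fixed-size groups of consecutive files per task (steps S7 and S8) and round-robin assignment to workers (step S10). Both function names are hypothetical, and notification of the workers' equipment (step S11) is omitted.

```python
from itertools import cycle


def make_tasks(file_names: list[str], files_per_task: int) -> list[list[str]]:
    """Steps S7-S8 (sketch): group consecutive files into work tasks."""
    if files_per_task <= 0:
        raise ValueError("files_per_task must be positive")
    return [
        file_names[i:i + files_per_task]
        for i in range(0, len(file_names), files_per_task)
    ]


def assign_tasks(tasks: list[list[str]], workers: list[str]) -> dict[str, list[int]]:
    """Step S10 (sketch): hand out task indices to workers round-robin."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: dict[str, list[int]] = {worker: [] for worker in workers}
    # zip stops when the tasks run out; cycle repeats the worker list.
    for task_index, worker in zip(range(len(tasks)), cycle(workers)):
        assignment[worker].append(task_index)
    return assignment
```

For five files and two files per task, `make_tasks` yields three tasks; with workers "alice" and "bob", the first and third tasks go to alice and the second to bob. Keeping consecutive files together matters here because each task is later played back as one continuous moving image.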

Next, each process of the method of executing a task provided as annotation work will be described with reference to FIG. 24. This task execution method is typically performed mainly by an annotation device 1 serving as a worker's equipment.

The annotation processing unit 55 starts the work task assigned to a worker, for example in response to that worker's operation (step S101), and the data input/output processing unit 53 transmits the data conditions attached to the assigned work task to the annotation server (step S102).

The data input/output processing unit 53 then receives from the annotation server the moving image data files that match those data conditions (step S103) and stores them in, for example, the designated file storage area 40a of the storage circuit 40.

The annotation processing unit 55 then handles the received moving image data files as moving image data representing one continuous moving image and carries out the annotation work in response to the worker's operations (step S104).

Specifically, the display processing unit 51 plays the moving image on the display device 10 in response to the worker's operation (step S105), and the annotation processing unit 55 determines, based on the worker's operation, whether an object or event to be annotated has been found (step S106). The annotation processing unit 55 can make this determination, for example, according to whether the label addition operation area 102 has been operated. If the annotation processing unit 55 determines that no object or event to be annotated has been found (step S106: No), it returns to step S105 and repeats the subsequent processing.

If the annotation processing unit 55 determines that an object or event to be annotated has been found (step S106: Yes), then, when annotation information such as an object label or a relation label is added in response to the worker's operation (step S107), it adds that annotation information, together with information on the corresponding moving image data, to the training data (annotation file) (step S108).

The annotation processing unit 55 then determines whether the data conditions include conditions other than moving image data (step S109). If so (step S109: Yes), the annotation processing unit 55 also adds information on the other data corresponding to the annotation information to the training data (annotation file) (step S110).

The annotation processing unit 55 then determines whether the end of the annotation work has been input by the worker's operation, in other words, whether the task end operation area 104 has been selected (step S111). If it determines that the end has been input (step S111: Yes), it ends the processing of this flowchart. If it determines that the end has not been input (step S111: No), it returns to step S105 and repeats the subsequent processing.

If the annotation processing unit 55 determines in step S109 that the data conditions include no conditions other than moving image data (step S109: No), it skips step S110 and proceeds to step S111.
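
Steps S108 to S110 can be sketched as building one record per added label. The sketch assumes that the data condition is a set of data-kind names in which "video" denotes the moving image data, and that other data (for example, a hypothetical "vehicle_speed" channel) is stored as (frame, value) samples; only samples inside the label's frame range are attached. The function name and record layout are likewise hypothetical.

```python
def add_to_training_data(
    annotation_file: list[dict],
    annotation: dict,
    video_name: str,
    data_kinds: set[str],
    other_data: dict[str, list[tuple[int, float]]],
) -> dict:
    """Steps S108-S110 (sketch) for one label added in step S107."""
    # S108: annotation information plus the corresponding moving image data.
    record = {"annotation": annotation, "video": video_name}
    # S109: does the task's data condition name anything besides video?
    extra_kinds = sorted(kind for kind in data_kinds if kind != "video")
    if extra_kinds:
        # S110: attach the other-data samples that fall in the label's frame range.
        start, end = annotation["start_frame"], annotation["end_frame"]
        record["other_data"] = {
            kind: [(t, v) for t, v in other_data[kind] if start <= t <= end]
            for kind in extra_kinds
        }
    annotation_file.append(record)
    return record
```

When the data condition is just {"video"}, the record has no "other_data" field, which corresponds to skipping step S110.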

Next, each process of a more specific method of executing annotation work will be described with reference to FIG. 25. This annotation work method is typically performed mainly by an annotation device 1 serving as a worker's equipment.

The display processing unit 51 plays the moving image on the display device 10 in response to the worker's operation (step S201), and the annotation processing unit 55 determines, based on the worker's operation, whether an object to be annotated has been found (step S202). The annotation processing unit 55 can make this determination, for example, according to whether the label addition operation area 102 has been operated. If it determines that no object to be annotated has been found (step S202: No), the annotation processing unit 55 returns to step S201 and repeats the subsequent processing.

When the annotation processing unit 55 determines that an object to be annotated has been found (step S202: Yes), the display processing unit 51 stops the moving image being played on the display device 10 in response to the worker's operation (step S203).

The annotation processing unit 55 then, in response to the worker's operation, creates a new object label for the object found above (step S204) and adds the type, position, and other attributes of the corresponding object as annotation information (step S205). At this time, the annotation processing unit 55 also stores the current frame in the storage circuit 40 as the start key frame of the object label.

Next, the display processing unit 51 plays the moving image on the display device 10 in response to the worker's operation (step S206), and the annotation processing unit 55 determines, based on the worker's operation, whether a correction to the specified position of the object corresponding to the object label has been decided on (step S207). When the annotation processing unit 55 determines that such a correction has been decided on (step S207: Yes), the display processing unit 51 stops the moving image being played on the display device 10 in response to the worker's operation (step S208).

The annotation processing unit 55 then corrects, in response to the worker's operation, the specified position of the object targeted by the object label (step S209). Based on the object position specified in an earlier frame (for example, the start key frame) and the position re-specified in the current frame, it calculates the object's position in each frame in between, for example by linear interpolation (step S210). The display processing unit 51 then plays the moving image on the display device 10 in response to the worker's operation (step S211).

Next, the annotation processing unit 55 determines, based on the worker's operation, whether the frames in which the object targeted by the object label appears have ended (step S212). It can make this determination, for example, according to whether the end operation area 107 has been operated. If the annotation processing unit 55 determined in step S207 that no correction to the object's specified position was decided on (step S207: No), it skips steps S208 to S211 and proceeds directly to step S212.

If it determines that the frames in which the target object appears have not ended (step S212: No), the annotation processing unit 55 returns to step S206 and repeats the subsequent processing. If it determines that those frames have ended (step S212: Yes), the annotation processing unit 55 ends the work on this object label in response to the worker's operation (step S213), stores the current frame in the storage circuit 40 as the end key frame of the object label, and thereby fixes the time range of the object label in the moving image.
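
The object-label workflow in steps S204 to S213 amounts to keeping key frames with positions and filling in the frames between them. The following sketch assumes a rectangle stored as (x1, y1, x2, y2) and interpolates each coordinate linearly, computing positions on demand rather than storing one per frame; holding the last box after the final key frame is likewise an assumption. The class name `ObjectLabelTrack` is hypothetical.

```python
from bisect import bisect_right
from typing import Optional

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


class ObjectLabelTrack:
    """Hypothetical object label built in steps S204 to S213."""

    def __init__(self, label_id: int, obj_type: str, start_frame: int, box: Box):
        # S204-S205: new label with type and position; the current frame
        # becomes the start key frame.
        self.label_id = label_id
        self.obj_type = obj_type
        self.start_frame = start_frame
        self.end_frame: Optional[int] = None
        self.keyframes: dict[int, Box] = {start_frame: box}

    def correct(self, frame: int, box: Box) -> None:
        # S209: the worker re-specifies the position at a later frame.
        if self.end_frame is not None or frame < self.start_frame:
            raise ValueError("cannot add a key frame here")
        self.keyframes[frame] = box

    def finish(self, frame: int) -> None:
        # S213: the current frame becomes the end key frame.
        if frame < max(self.keyframes):
            raise ValueError("end key frame precedes an existing key frame")
        self.end_frame = frame

    def box_at(self, frame: int) -> Optional[Box]:
        # Outside the label's time range there is no position.
        if frame < self.start_frame or (
            self.end_frame is not None and frame > self.end_frame
        ):
            return None
        frames = sorted(self.keyframes)
        i = bisect_right(frames, frame) - 1  # last key frame at or before frame
        f0 = frames[i]
        if i == len(frames) - 1:
            return self.keyframes[f0]  # after the last key frame: hold the box
        # S210: linear interpolation between the two surrounding key frames.
        f1 = frames[i + 1]
        t = (frame - f0) / (f1 - f0)
        b0, b1 = self.keyframes[f0], self.keyframes[f1]
        return tuple(a + t * (b - a) for a, b in zip(b0, b1))
```

For a label started at frame 100 with box (0, 0, 10, 10), corrected at frame 110 to (10, 0, 20, 10), and finished at frame 130, `box_at(105)` returns (5.0, 0.0, 15.0, 10.0), and frames outside 100 to 130 have no position.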

Next, the annotation processing unit 55 determines, based on the worker's operation, whether another object to be annotated has been found in the same time period (step S214). When the annotation processing unit 55 determines that another such object has been found (step S214: Yes), the display processing unit 51 searches for and displays the relevant playback frame in response to the worker's operation (step S215), and processing returns to step S204 and repeats.

If it determines that no other object to be annotated has been found (step S214: No), the annotation processing unit 55 determines, based on the worker's operation, whether an event to be annotated has been found (step S216). It can make this determination, for example, according to whether the label addition operation area 102 has been operated. If it determines that no event to be annotated has been found (step S216: No), the annotation processing unit 55 ends the processing of this flowchart.

When the annotation processing unit 55 determines that an event to be annotated has been found (step S216: Yes), the display processing unit 51 searches for and displays the relevant playback frame in response to the worker's operation (step S217).

The annotation processing unit 55 then, in response to the worker's operation, creates a new relation label for the event found above (step S218) and adds the type and other attributes of the corresponding event as annotation information (step S219). At this time, the annotation processing unit 55 also stores the current frame in the storage circuit 40 as the start key frame of the relation label.

Next, in response to the worker's operation, the annotation processing unit 55 specifies the object labels of the objects involved in the event targeted by the relation label and adds them as annotation information of the relation label (step S220). The display processing unit 51 then plays the moving image on the display device 10 in response to the worker's operation (step S221).

Next, the annotation processing unit 55 determines, based on the worker's operation, whether the frames showing the event targeted by the relation label have ended (step S222). It can make this determination, for example, according to whether the end operation area 107 has been operated.

If it determines that the frames showing the event have not ended (step S222: No), the annotation processing unit 55 returns to step S221 and repeats the subsequent processing. When the annotation processing unit 55 determines that those frames have ended (step S222: Yes), the display processing unit 51 stops the moving image being played on the display device 10 in response to the worker's operation (step S223).

The annotation processing unit 55 then ends the work on this relation label in response to the worker's operation (step S224), stores the current frame in the storage circuit 40 as the end key frame of the relation label, fixes the time range of the relation label in the moving image, and ends the processing of this flowchart.
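
Steps S218 to S224 can be sketched as a small relation-label record. The point the sketch illustrates is that the involved objects are chosen from object labels that already exist (step S220); the class and method names, and rejecting an end key frame before the start key frame, are assumptions of this sketch.

```python
from typing import Optional


class RelationLabel:
    """Hypothetical relation-label record built in steps S218 to S224."""

    def __init__(self, label_id: int, event_type: str, start_frame: int):
        # S218-S219: new relation label; the current frame is its start key frame.
        self.label_id = label_id
        self.event_type = event_type
        self.start_frame = start_frame
        self.end_frame: Optional[int] = None
        self.related_ids: list[int] = []

    def link(self, object_label_id: int, added_object_ids: set[int]) -> None:
        # S220: the involved object must be chosen from object labels already added.
        if object_label_id not in added_object_ids:
            raise KeyError(f"no object label with ID {object_label_id}")
        if object_label_id not in self.related_ids:
            self.related_ids.append(object_label_id)

    def finish(self, frame: int) -> tuple[int, int]:
        # S224: the current frame becomes the end key frame, fixing the time range.
        if frame < self.start_frame:
            raise ValueError("end key frame precedes start key frame")
        self.end_frame = frame
        return self.start_frame, self.end_frame
```

Linking the same ID twice has no effect, and linking an ID with no matching object label raises KeyError.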

The annotation work method of FIG. 25 described above includes steps corresponding to each step of an "annotation method" comprising: a step of displaying a moving image represented by moving image data; a step of accepting operations; and a step of creating training data used for machine learning of a trained model by adding annotation information to the moving image data in response to the operations, wherein, in the step of creating training data, the position of an object included in the moving image is specified in response to an operation and an object label representing the type of that object is added to the moving image data as annotation information; a relation label representing the type of an event in which a plurality of objects included in the moving image are correlated is added to the moving image data as annotation information in response to an operation; and the object labels of the objects involved in the event targeted by the relation label are specified from among the added object labels in response to an operation and added to the moving image data as annotation information. This "annotation method" can be implemented by executing an "annotation program" prepared in advance on a computer such as a personal computer or a workstation. The "annotation program" causes the computer to execute each of the steps described above.

The annotation device 1, annotation method, and annotation program described above can create training data used for machine learning of a trained model M that detects events arising from the involvement of a plurality of objects. To this end, they can add to moving image data, as annotation information, object labels representing the types of objects appearing in the moving image and relation labels representing the types of events in which a plurality of objects are correlated. In addition, they can specify and add, as further annotation information, the object labels of the objects correlated with the event targeted by each relation label. As a result, the annotation device 1, annotation method, and annotation program can create training data appropriately. For example, the detection system Sys1 can then generate a trained model M for relational behavior detection through machine learning on that training data, and can use the trained model M to perform relational behavior detection properly.

For example, suppose that, for objects appearing in a certain moving image, an object label representing "a pedestrian traffic light showing red" and an object label representing "a pedestrian walking on a crosswalk" are added, and that the relationship label "pedestrian ignoring the signal" is added as the type of event in which that traffic light and that pedestrian are correlated. In that case, the annotation device 1, annotation method, and annotation program described above can add annotation information that links to the relationship label, as the object labels of the objects related to it, the object label representing "a pedestrian traffic light showing red" and the object label representing "a pedestrian walking on a crosswalk", and the result can be used as training data. The detection system Sys1 can then, for example, use machine learning on this training data to generate a trained model M capable of detecting "pedestrian ignoring the signal" from moving image data of a moving image showing "a pedestrian traffic light showing red" and "a pedestrian walking on a crosswalk", and can use the trained model M to detect events such as "pedestrian ignoring the signal" (relational behavior detection).
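
The pedestrian example can be traced through a small sketch. It assumes, hypothetically, that a passenger car also appears in the same frame, and shows how the designated object labels let every pair of labeled objects be marked as either taking part in "pedestrian ignoring the signal" or not. Turning the annotation into pairwise examples is only one possible way to feed a learner and is an assumption here, not the training procedure of the detection system Sys1.

```python
from itertools import combinations

# Object labels in one annotated frame: label_id -> object type.
objects = {
    1: "pedestrian traffic light (red)",
    2: "pedestrian walking on crosswalk",
    3: "passenger car",  # hypothetical extra object that is not part of the event
}
# Relationship label -> object labels designated as taking part in the event.
relations = {"pedestrian ignoring the signal": [1, 2]}


def labeled_pairs(objects, relations):
    """Return (object-type pair, event type or None) for every pair of labeled objects."""
    linked = {}
    for event, ids in relations.items():
        for a, b in combinations(sorted(ids), 2):
            linked[(a, b)] = event
    return [
        ((objects[a], objects[b]), linked.get((a, b)))
        for a, b in combinations(sorted(objects), 2)
    ]


for pair, event in labeled_pairs(objects, relations):
    print(pair, "->", event)
```

Only the traffic-light/pedestrian pair is marked with the event; both pairs involving the car come out as `None`, which is the distinction that would be lost if only the relationship label were recorded.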

In addition, in the annotation processing, the annotation device 1, annotation method, and annotation program described above handle a plurality of moving image data files specified in advance as moving image data representing a single series of moving images. This allows the moving images to be played back and annotated continuously without the operator being aware of file switching, which reduces tedious work and improves workability. Objects and events that span different files can also be annotated seamlessly, without their labels being split, which prevents degradation of the training data quality. Furthermore, one coherent set of training data can be created per unit of moving image data handled as a series of moving images, rather than per individual moving image data file; this likewise avoids tedious work such as consolidating fragmented training data and prevents degradation of the training data quality.
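
Handling several files as one series mainly requires a mapping between a single global frame index, used by the player and by the labels, and the (file, frame) position inside each file. A minimal sketch using a prefix sum of per-file frame counts follows; the class name, the per-file frame counts, and the file names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVideo:
    """Treat several pre-specified video files as one continuous frame sequence."""

    def __init__(self, files):
        # files: list of (file_name, frame_count) in playback order
        self.names = [name for name, _ in files]
        # starts[i] is the global index of the first frame of file i;
        # the final entry is the total frame count.
        self.starts = [0, *accumulate(count for _, count in files)]

    @property
    def total_frames(self):
        return self.starts[-1]

    def locate(self, global_frame):
        """Map a global frame index to (file_name, local_frame)."""
        if not 0 <= global_frame < self.total_frames:
            raise IndexError(global_frame)
        i = bisect_right(self.starts, global_frame) - 1
        return self.names[i], global_frame - self.starts[i]

    def to_global(self, file_name, local_frame):
        """Inverse of locate()."""
        return self.starts[self.names.index(file_name)] + local_frame


video = ConcatenatedVideo([("rec_0001.mp4", 1800), ("rec_0002.mp4", 1800)])
print(video.locate(1750), video.locate(1850))
```

Because labels store only global frame indices, an object tracked from frame 1750 to frame 1850 keeps a single label even though playback crosses from the first file into the second at global frame 1800.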

The annotation device, annotation method, and annotation program according to the embodiment of the present invention described above are not limited to that embodiment, and various modifications are possible within the scope described in the claims.

The annotation device 1 described above may have a sound playback function in addition to the moving image playback function, and may apply the annotation processing to, for example, sound data as well as moving image data. The annotation device 1 may also have a function of plotting time-series data, such as IoT (Internet of Things) data, in two dimensions, and may apply the annotation processing to that time-series data as well. Taking sound data as an example of data whose structure differs from that of moving image data, the annotation device 1 can, for example, add a relationship label representing the event "surprise" as the type of event in which an "alarm sound" and a voice saying "whoa" are correlated, and, as the sound labels related to the event targeted by that relationship label, designate the sound label for the "whoa" voice and the sound label for the "alarm sound" and add them to the sound data as annotation information. In this case, the annotation device 1 may also apply the function of handling a plurality of pre-specified data files as data representing a single series to sound data files or IoT data files.
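
For sound data, the position of an object within a frame is naturally replaced by a time interval within the recording. The sketch below assumes that representation (seconds from the start of the recording) and uses hypothetical class names; it labels the "alarm sound" and the "whoa" voice, links both to a "surprise" relationship label, and derives the time span the event covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundLabel:
    label_id: int
    sound_type: str    # e.g. "alarm sound"
    start_s: float     # segment start, seconds from the beginning of the recording
    end_s: float       # segment end, seconds


@dataclass(frozen=True)
class RelationshipLabel:
    event_type: str                    # e.g. "surprise"
    sound_label_ids: tuple[int, ...]   # sound labels designated as related to the event


def event_span(relation, sound_labels):
    """Time interval covered by all sound segments linked to a relationship label."""
    linked = [sound_labels[i] for i in relation.sound_label_ids]
    return min(s.start_s for s in linked), max(s.end_s for s in linked)


sounds = {
    1: SoundLabel(1, "alarm sound", 12.0, 14.5),
    2: SoundLabel(2, 'voice "whoa"', 13.2, 13.8),
}
surprise = RelationshipLabel("surprise", (1, 2))
print(event_span(surprise, sounds))
```

The labeling structure stays the same as for moving images; only the notion of "position" changes, which is why the same annotation processing can be reused for sound or IoT time-series data.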

The processing circuit 50 described above has been described as implementing each processing function with a single processor, but this is not a limitation. The processing circuit 50 may implement the processing functions by combining a plurality of independent processors, each of which executes a program. The processing functions of the processing circuit 50 may also be distributed across, or integrated into, a single processing circuit or a plurality of processing circuits as appropriate. All or any part of the processing functions of the processing circuit 50 may be implemented by a program, or as hardware using wired logic or the like.

The program executed by the processor described above is provided pre-installed in the storage circuit 40 or the like. The program may instead be provided as a file in an installable or executable format recorded on a computer-readable storage medium. The program may also be stored on a computer connected to a network such as the Internet and provided or distributed by being downloaded over the network.

The annotation device, annotation method, and annotation program according to the present embodiment may be configured by appropriately combining the components of the embodiment and modifications described above.

1 Annotation device
10 Display device (display unit)
20 Operation device (operation unit)
30 Data input/output device
40 Storage circuit
40a Designated file storage area
50 Processing circuit (processing unit)
51 Display processing unit
52 Operation processing unit
53 Data input/output processing unit
54 Task creation processing unit
55 Annotation processing unit
100A, 100B, 100C Annotation screen
101 Moving image display area
101a Moving image operation area
102 Label addition operation area
103 Label selection area
103A Main list
103B, 103C, 103D, 103E Sublist
104 Task end operation area
105 Decided label display image
106 Erase operation area
107 End operation area
108 Erase decision image
109 Added label display area
110 Slide bar display area
111 File name display area
111A File list
112 File selection area
113 Work end confirmation image
114 Work end decision image
200 Management mode screen
201 Task creation operation area
300 Task creation screen
AL Machine learning algorithm
D1 Input data
D2 Training data
D3 Learning dataset
D4 Detection target data
D5 Type identification result data
M Trained model
Sv Learning data provision system
Sys1 Detection system

Claims

1. An annotation device comprising:
a display unit capable of displaying a moving image represented by moving image data;
an operation unit that accepts operations; and
a processing unit capable of executing annotation processing that adds annotation information to the moving image data in response to an operation on the operation unit to create training data used for machine learning of a trained model,
wherein, in the annotation processing, the processing unit executes, in response to operations on the operation unit: a process of specifying the position of an object included in the moving image and adding an object label indicating the type of the object to the moving image data as the annotation information; a process of adding a relationship label indicating the type of an event in which a plurality of the objects included in the moving image are correlated to the moving image data as the annotation information; and a process of designating, from among the added object labels, the object labels of the objects related to the event targeted by the relationship label and adding them to the moving image data as the annotation information.

2. The annotation device according to claim 1, wherein, in the annotation processing, the processing unit handles a plurality of moving image data files specified in advance as the moving image data representing a series of the moving images.

3. An annotation method comprising:
a step of displaying a moving image represented by moving image data;
a step of accepting an operation; and
a step of adding annotation information to the moving image data in accordance with the operation to create training data used for machine learning of a trained model,
wherein, in the step of creating the training data, the position of an object included in the moving image is specified in accordance with the operation and an object label indicating the type of the object is added to the moving image data as the annotation information; a relationship label indicating the type of an event in which a plurality of the objects included in the moving image are correlated is added to the moving image data as the annotation information in accordance with the operation; and, in accordance with the operation, the object labels of the objects related to the event targeted by the relationship label are designated from among the added object labels and added to the moving image data as the annotation information.

4. An annotation program causing a computer to execute processes of:
displaying a moving image represented by moving image data;
accepting an operation; and
adding annotation information to the moving image data in accordance with the operation to create training data used for machine learning of a trained model,
wherein, in the process of creating the training data, the position of an object included in the moving image is specified in accordance with the operation and an object label indicating the type of the object is added to the moving image data as the annotation information; a relationship label indicating the type of an event in which a plurality of the objects included in the moving image are correlated is added to the moving image data as the annotation information in accordance with the operation; and, in accordance with the operation, the object labels of the objects related to the event targeted by the relationship label are designated from among the added object labels and added to the moving image data as the annotation information.