JP7455598B2

JP7455598B2 - Image processing device, annotation processing device, image processing method, annotation processing method, image processing program

Info

Publication number: JP7455598B2
Application number: JP2020017074A
Authority: JP
Inventors: 君孝村下; 浩山田
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2020-02-04
Filing date: 2020-02-04
Publication date: 2024-03-26
Anticipated expiration: 2040-02-04
Also published as: JP2021124882A

Description

The present invention relates to an image processing device, an annotation processing device, an image processing method, an annotation processing method, and an image processing program.

Computers and similar devices are required to process images so as to identify the positions and names of objects appearing in them. This processing uses, for example, a trained estimation model generated by machine learning. Training such a model requires a large amount of training data that associates images with the positions and names of the objects in those images.

Japanese Patent Application Publication No. 2007-316839
Japanese Patent Application Publication No. 2006-234492

The process of associating an image with the position (range) and name of a predetermined object in the image is called annotation processing. The position of an object is represented, for example, by the positions of the corners of a rectangle that encloses the object in the image. To make annotation processing more efficient, there is a method in which one frame of a moving image is annotated manually and the other frames are annotated automatically by an auto-annotation tool such as CVAT (Computer Vision Annotation Tool) or VoTT (Visual Object Tagging Tool). This makes it possible to prepare a large amount of training data that associates images with the positions and names of the objects in them. However, this method has a problem: errors accumulate with distance from the manually annotated frame, so the detected position of an object gradually drifts away from its actual position in the image. If machine learning is performed with training data in which the detected position of an object deviates from its position in the image, the quality of the trained model may be degraded. To perform appropriate automatic annotation processing on each frame of a moving image, it is preferable to appropriately select a section of the moving image in which highly accurate automatic annotation processing is possible.

An object of the present invention is to provide a technique that can extract, from a moving image, a section in which objects are easy to track.

In order to solve the above problem, the following means are adopted.
That is, the first aspect is
an image processing device that performs processing to identify the position and name of an object appearing in an image, the image processing device comprising:
a storage unit that stores a moving image captured by a camera attached to a moving object and travel information of the moving object associated with the moving image; and
an image processing unit that, based on the travel information, extracts from the moving image a section of the moving image corresponding to a period in which the moving object is in a predetermined traveling state, for use in the processing to identify the position and name of an object appearing in the image.

The disclosed aspects may be realized by an information processing device executing a program. That is, the disclosed configuration can be specified as a program that causes an information processing device to execute the processing performed by each means in the above aspect, or as a computer-readable recording medium on which that program is recorded. The disclosed configuration may also be specified as a method by which an information processing device executes the processing performed by each of the above means, or as a system including an information processing device that performs that processing.

According to the present invention, it is possible to provide a technique that can extract, from a moving image, a section in which objects are easy to track.

FIG. 1 is a diagram showing a configuration example of an image processing device according to an embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of an information processing device.
FIG. 3 is a diagram showing an example of the operation flow of the image processing device according to the embodiment.
FIG. 4 is a diagram showing an example of tracking a stationary object.

Embodiments will be described below with reference to the drawings. The configuration of the embodiment is an example, and the configuration of the invention is not limited to the specific configuration of the disclosed embodiment. In implementing the invention, a specific configuration suited to the embodiment may be adopted as appropriate.

[Embodiment]
(Configuration example)
FIG. 1 is a diagram showing a configuration example of the image processing device according to this embodiment. The image processing device 100 includes an image processing unit 102, an input unit 104, an output unit 106, a communication unit 108, and a storage unit 110. The image processing device 100 stores, in the storage unit 110, a moving image captured by a camera mounted on a vehicle or the like in association with travel information indicating the traveling state of the vehicle when the moving image was captured. A vehicle is an example of a moving object. The travel information of the vehicle includes steering information indicating the steering angle of the vehicle and speed information indicating the speed of the vehicle. The moving image and the travel information are acquired by the communication unit 108 from the vehicle, another information processing device, or the like via a communication network or the like. The image processing device 100 extracts, from the moving image stored in the storage unit 110, a section of the moving image that is suitable for automatic annotation processing. The image processing device 100 extracts this section based on the travel information of the vehicle. For the frame images included in the extracted section, the image processing device 100 performs annotation processing on the objects contained in each image. Objects may include, for example, artificial objects such as signs, vehicles, and buildings, and natural objects such as plants, landforms, rocks, and animals. Annotation processing is the process of associating an image with the position (range) and name of a predetermined object in the image. The image processing device 100 stores each image in the storage unit 110 in association with the positions (ranges) and names of the objects in the image. The name of an object may be information related to the object, such as its state or properties. The annotated images and the positions and names of the objects can be used as training data in the machine learning of a trained estimation model that performs annotation processing.

The image processing unit 102 acquires the moving image stored in the storage unit 110 and the travel information of the vehicle associated with that moving image. Based on the travel information, the image processing unit 102 extracts the moving image captured while the vehicle was in a predetermined traveling state. The image processing unit 102 has a user or the like input, through the input unit 104, the position (range) and name of an object in one frame image of the extracted moving image. The position of the object is specified, for example, by a rectangle enclosing the object (the coordinates of each of its vertices). The image processing unit 102 extracts feature points contained in the range of the object in the frame image for which the input was made. For the images of the other frames, the image processing unit 102 tracks the extracted feature points and calculates a movement vector (a vector indicating the direction and magnitude of movement within the image) for the feature points of each object. Based on these movement vectors, the image processing unit 102 determines whether each object is a stationary object (an object at rest) or a rectilinear moving object (an object moving in a straight line). The image processing unit 102 tracks the objects determined to be stationary objects or rectilinear moving objects. Based on the tracking, the image processing unit 102 performs annotation processing and stores the positions and names of the objects in the images of the other frames in the storage unit 110. The image processing unit 102 displays the images of the other frames, together with the positions and names of the objects they contain, on the output unit 106 and has the user or the like input a confirmation result through the input unit 104.

The input unit 104 is an input means that accepts information input by a user or the like. The input unit 104 is an input device such as a keyboard or a pointing device. The input unit 104 accepts the user's input of the position (range) and name of an object contained in an image displayed on the output unit 106. The input unit 104 also accepts input of the result of checking the annotation processing result displayed on the output unit 106.

The output unit 106 is an output means that outputs information to a user or the like, for example by displaying it. The output unit 106 is, for example, a display device such as a display. The output unit 106 displays frame images of the moving image and annotation results, that is, images together with the positions and names of the objects they contain.

The communication unit 108 is a communication interface that communicates with other information processing devices and the like via a communication network or the like. The communication unit 108 receives moving images and travel information from other information processing devices and the like, and stores them in the storage unit 110.

The storage unit 110 stores a moving image captured by a camera mounted on a vehicle or the like in association with travel information indicating the traveling state of that vehicle. A moving image is a collection of still images (images) of multiple frames. The moving image stored in the storage unit 110 is captured by a camera mounted on a vehicle. The camera is, for example, fixed at the front of the vehicle and directed in the traveling direction (forward) of the vehicle. The travel information associated with the moving image is information such as the steering angle and speed, acquired from the vehicle's control system or the like via the vehicle's CAN (Controller Area Network) or the like. For example, by associating the time information attached to the moving image with the time information attached to the vehicle information, the traveling state at the time the moving image was captured can be recognized. The storage unit 110 also stores annotation results, that is, images associated with the positions and names of the objects they contain. The travel information of the vehicle may include information on the shooting direction of the camera. This information can be used, for example, to determine epipolar constraint lines within the image.
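The time-based association described above can be sketched as follows. This is a minimal illustration, not the method of the specification: the `TravelRecord` layout, the function name, and the "latest sample at or before the frame time" rule are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TravelRecord:
    """One travel-information sample (hypothetical layout)."""
    time_s: float              # timestamp of the sample, in seconds
    steering_angle_deg: float  # steering angle, in degrees
    speed_kmh: float           # vehicle speed, in km/h


def travel_record_for_frame(frame_time_s, records):
    """Return the latest record at or before frame_time_s, or None.

    `records` must be sorted by time_s in ascending order.
    """
    times = [r.time_s for r in records]
    # Index of the last sample whose timestamp is <= frame_time_s.
    idx = bisect_right(times, frame_time_s) - 1
    return records[idx] if idx >= 0 else None
```

A nearest-neighbor or interpolating lookup would work equally well; the choice depends on how the frame rate compares with the sampling rate of the travel information.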

FIG. 2 is a diagram showing an example of the hardware configuration of an information processing device. The information processing device shown in FIG. 2 has the configuration of a general computer. The image processing device 100 is realized by an information processing device 90 such as the one shown in FIG. 2. The information processing device 90 in FIG. 2 includes a processor 91, a memory 92, a storage unit 93, an input unit 94, an output unit 95, and a communication control unit 96. These are connected to one another by a bus. The memory 92 and the storage unit 93 are computer-readable recording media. The hardware configuration of the computer is not limited to the example shown in FIG. 2, and components may be omitted, replaced, or added as appropriate.

In the information processing device 90, the processor 91 loads a program stored in a recording medium into the work area of the memory 92 and executes it. The components are controlled through execution of the program, which realizes functions that serve a predetermined purpose.

The processor 91 is, for example, a CPU (Central Processing Unit) or a DSP (Digital Signal Processor).

The memory 92 includes, for example, RAM (Random Access Memory) and ROM (Read Only Memory). The memory 92 is also called a main storage device.

The storage unit 93 is, for example, an EPROM (Erasable Programmable ROM) or a hard disk drive (HDD). The storage unit 93 can also include removable media, that is, portable recording media. Removable media are, for example, USB (Universal Serial Bus) memory or disc recording media such as a CD (Compact Disc) or DVD (Digital Versatile Disc). The storage unit 93 is also called a secondary storage device.

The storage unit 93 stores various programs, various data, and various tables on a recording medium in a readable and writable manner. The storage unit 93 stores an operating system (OS), various programs, various tables, and the like. Information stored in the storage unit 93 may be stored in the memory 92, and information stored in the memory 92 may be stored in the storage unit 93.

The operating system is software that mediates between software and hardware and manages memory space, files, processes, tasks, and the like. The operating system includes a communication interface. The communication interface is a program that exchanges data with other external devices connected via the communication control unit 96. External devices include, for example, other computers and external storage devices.

The input unit 94 includes a keyboard, a pointing device, a wireless remote control, a touch panel, and the like. The input unit 94 can also include a video or image input device such as a camera, and an audio input device such as a microphone.

The output unit 95 includes display devices such as an LCD (Liquid Crystal Display), an EL (Electroluminescence) panel, a CRT (Cathode Ray Tube) display, and a PDP (Plasma Display Panel), as well as output devices such as a printer. The output unit 95 can also include an audio output device such as a speaker.

The communication control unit 96 connects to other devices and controls communication between the information processing device 90 and those devices. The communication control unit 96 is, for example, a LAN (Local Area Network) interface board, a wireless communication circuit for wireless communication, or a communication circuit for wired communication. The LAN interface board and the wireless communication circuit are connected to a network such as the Internet.

In the computer that realizes the image processing device 100, the processor loads a program stored in the auxiliary storage device into the main storage device and executes it, thereby realizing the functions of the image processing unit 102, the input unit 104, the output unit 106, and the communication unit 108. The storage unit 110, on the other hand, is provided in a storage area of the main storage device or the auxiliary storage device.

(Operation example)
FIG. 3 is a diagram showing an example of the operation flow of the image processing device 100 of this embodiment. Here, it is assumed that the storage unit 110 of the image processing device 100 stores, in association with each other, a moving image captured by a camera mounted on a vehicle and the travel information of the vehicle at the time the moving image was captured, both received by the communication unit 108 or the like.

In S101, the image processing unit 102 of the image processing device 100 acquires the moving image and the travel information stored in the storage unit 110. The moving image is, for example, footage of the vehicle's traveling direction captured while the vehicle is traveling.

In S102, the image processing unit 102 extracts, from the moving image acquired in S101, the section in which the vehicle is traveling in a straight line (traveling straight ahead). Specifically, the image processing unit 102 extracts the section of the moving image captured while, according to the travel information associated with the moving image, the steering angle is 0 degrees and the speed exceeds 0 km/h. A steering angle of 0 degrees together with a speed above 0 km/h indicates that the vehicle is traveling straight ahead without stopping. For example, if the travel information shows that the steering angle is 0 degrees and the speed exceeds 0 km/h from time t1 to time t2, the image processing unit 102 extracts the section of the moving image from time t1 to time t2. The extracted moving image is used as the images on which annotation processing is performed. The image processing unit 102 stores the extracted section of the moving image in the storage unit 110. If the moving image contains no section captured during straight-ahead travel, no section is extracted from it. The image processing unit 102 may instead extract, as the section of the moving image, a period in which the steering angle is 0 degrees and the speed is constant. In a moving image from such a period, the camera mounted on the vehicle moves at a constant speed, which makes stationary objects and the like easier to track. If no section is extracted from the moving image, the operation flow in FIG. 3 ends.
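The S102 condition (steering angle of 0 degrees and speed above 0 km/h) can be illustrated with a short sketch. The tuple layout of `samples`, the function name, and the `angle_tol_deg` parameter are assumptions made for this example; the default tolerance of 0.0 reproduces the exact condition stated in the text.

```python
def straight_driving_sections(samples, angle_tol_deg=0.0):
    """Return (t1, t2) sections where the steering angle is 0 and speed > 0.

    `samples` is a time-ordered list of
    (time_s, steering_angle_deg, speed_kmh) tuples (hypothetical layout).
    """
    sections = []
    start = None  # start time of the section currently being built
    last = None   # most recent sample time that satisfied the condition
    for time_s, angle, speed in samples:
        straight = abs(angle) <= angle_tol_deg and speed > 0.0
        if straight:
            if start is None:
                start = time_s
            last = time_s
        elif start is not None:
            # The condition just stopped holding: close the section.
            sections.append((start, last))
            start = None
    if start is not None:
        sections.append((start, last))
    return sections
```

An empty result corresponds to the case in which no section is extracted and the flow ends. Real steering-angle signals are rarely exactly zero, so a small nonzero tolerance may be needed in practice; that is a design choice outside what the text specifies.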

In S103, the image processing unit 102 displays the moving image extracted in S102 on the output unit 106, has the user select a range (region) enclosing an object contained in a frame image of the moving image, and has the user input the name of that object. The ranges and names of multiple objects may be input for a single image. The input unit 104 accepts the user's input of the range enclosing the object and of the object's name. In other words, the image processing unit 102 has the user perform manual annotation processing on an image included in the extracted moving image. The image processing unit 102 may display the frames on the output unit 106 so that the image used for selecting the range can be chosen from among the frame images of the moving image. Here, for example, the image processing unit 102 has the user select, for the same object appearing in multiple frames of the moving image, the frame image in which the object is largest, and then select a region (for example, a rectangle) enclosing the object in that image. The image processing unit 102 associates the selected frame image with the range and name of the object and stores them in the storage unit 110. Alternatively, the image processing unit 102 may use well-known image recognition technology or the like to extract a range enclosing a predetermined object from an image included in the moving image, and store the image, the range of the object, and the name of the object in the storage unit 110 in association with one another. Note that the selected frame need not be the one in which the object is strictly largest; any frame in which the object appears large enough for the user to select it accurately may be chosen.

In S104, the image processing unit 102 extracts feature points from the range of the object in the frame image for which the input was made in S103. These are called the first extracted feature points. When the object is a rectangular sign, for example, a feature point may be a corner or edge of the sign, or a point within a number, letter, or symbol on the sign. When the object is an automobile, a feature point may be, for example, a point on the license plate, a headlight, the emblem, a corner or edge of the windshield, or a door mirror. One or more feature points are extracted for each object. To track an object more reliably, it is desirable to extract multiple feature points for each object. Because the frame image in which the object is largest is selected in S103, it is easier for the image processing unit 102 to extract feature points.

Further, the image processing unit 102 extracts the same feature points from the frame images before and after the frame image containing the extracted feature points. These are called the later-extracted feature points. Feature points are extracted from the preceding and following frame images by, for example, local pattern matching in the neighborhood of each feature point. A well-known feature point tracking algorithm, such as the KLT method (Kanade-Lucas-Tomasi Feature Tracker), may also be used to track feature points across the preceding and following frame images. The difference between the positions of the same feature point in two frame images is called the movement vector of the feature point. If the object is stationary, the feature points are tracked toward temporally earlier frames. In other words, because the frame image in which the object is largest is selected, the feature points are tracked toward the frames in which the object becomes smaller. The image processing unit 102 stores the feature points of each object extracted from each image in the storage unit 110. Note that if, for example, the camera is attached to the rear of the vehicle rather than the front, the object in the image becomes smaller as time passes. In that case, the feature points may be tracked toward temporally later frames. In short, the feature points are tracked in the direction in which the object in the image becomes smaller.
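The movement vector defined above (the difference between the positions of the same feature point in two frames) can be computed as in the following sketch. It assumes the per-frame positions of one feature point have already been obtained by some tracker, such as a KLT implementation; the function name and the list-of-tuples layout are illustrative assumptions.

```python
def movement_vectors(track):
    """Return displacement vectors between consecutive positions of one feature point.

    `track` is a list of (x, y) pixel positions of the same feature point,
    ordered along the tracking direction (hypothetical layout).
    """
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
```

For a track of n positions this yields n - 1 vectors, so a feature point seen in only one frame has no movement vector.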

In S105, the image processing unit 102 determines, based on the movement vectors of the feature points of each object, whether the object containing the feature points extracted in S104 is a stationary object (an object at rest) or a rectilinear moving object (an object moving in a straight line).

In general, in a moving image captured by a camera moving in a straight line, a stationary object moves along a predetermined straight line within the image. This line is called the epipolar constraint line. If an object containing a feature point is stationary, the feature point moves along the epipolar constraint line in the image; thus, as the vehicle travels straight, stationary objects in the image move along their epipolar constraint lines. For example, if the vehicle's direction of travel corresponds to the center of the image (the vehicle is traveling straight toward the image center), the epipolar constraint line is the straight line passing through the image center and the feature point. In that case, the epipolar constraint line of each first-extracted feature point is the straight line passing through that feature point and the image center. The image processing unit 102 determines that an object is a stationary object when the feature points extracted later for that object lie on the epipolar constraint line of the object's first-extracted feature point. The image processing unit 102 may instead determine that the object is a stationary object when the movement vector of the feature point lies on the epipolar constraint line. The image processing unit 102 stores information indicating that the object is a stationary object in the storage unit 110 in association with the range of the object and the like. Predicting that a stationary object moves along the epipolar constraint line in the image improves the accuracy of object tracking.
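The stationary-object test described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes the direction of travel maps to a known image center, and the function names and the pixel tolerance `tol_px` are hypothetical (the text only requires that later feature points lie "on" the line).

```python
import math


def distance_to_line_through_center(first_pt, later_pt, center):
    """Perpendicular distance (pixels) from later_pt to the line through
    center and first_pt, i.e. the assumed epipolar constraint line."""
    fx, fy = first_pt[0] - center[0], first_pt[1] - center[1]
    lx, ly = later_pt[0] - center[0], later_pt[1] - center[1]
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        raise ValueError("first feature point coincides with the image center")
    # |cross product| / |direction| is the point-to-line distance.
    return abs(fx * ly - fy * lx) / norm


def is_stationary(track, center, tol_px=1.5):
    """track: [(x, y), ...] positions of one feature point, first-extracted
    position first. tol_px is an assumed tolerance, not from the source."""
    if len(track) < 2:
        raise ValueError("need the first point and at least one later point")
    first_pt = track[0]
    return all(
        distance_to_line_through_center(first_pt, p, center) <= tol_px
        for p in track[1:]
    )
```

For example, with an image center of `(320, 240)`, the track `[(420, 340), (400, 320), (380, 300)]` stays on the line through the center and is accepted, whereas `[(420, 340), (400, 300)]` is rejected. The sketch checks only the distance to the line; it does not check the direction of motion along it.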

When an object containing a feature point is a rectilinear moving object, the feature point moves along a straight line in the image. Such an object is, for example, another vehicle traveling straight. In a moving image captured by a camera moving in a straight line, a rectilinear moving object moves along a straight line in the image, and this line differs from the epipolar constraint line. The image processing unit 102 determines that an object is a rectilinear moving object when the first-extracted feature point and the later-extracted feature points of that object lie on a single straight line. The image processing unit 102 stores information indicating that the object is a rectilinear moving object in the storage unit 110 in association with the range of the object and the like. Predicting that a rectilinear moving object moves along a straight line in the image improves the accuracy of object tracking.
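The collinearity test for a rectilinear moving object can be sketched as below. The names and the tolerance `tol_px` are assumptions for illustration. The sketch fits the line through the first and last tracked positions and does not itself verify that the line differs from the epipolar constraint line; a caller would be expected to apply the stationary-object test first.

```python
import math


def max_deviation_from_line(track):
    """Largest perpendicular distance (pixels) of any tracked position from
    the line through the first and last positions of the track."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0.0:
        raise ValueError("first and last positions coincide")
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in track)


def is_rectilinear(track, tol_px=1.5):
    """True if all positions of one feature point lie on one straight line.
    tol_px is an assumed tolerance, not a value from the source."""
    if len(track) < 3:
        # Two points are always collinear, so the test would be vacuous.
        raise ValueError("need at least three positions")
    return max_deviation_from_line(track) <= tol_px
```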

If the object containing the feature points extracted in S104 is neither a stationary object nor a rectilinear moving object, the image processing unit 102 performs no further processing on that object. The image processing unit 102 may also skip further processing on an object when some of the feature points extracted later for that object do not lie on the epipolar constraint line of the object's first-extracted feature point.

In S106, for each object determined in S105 to be a stationary object or a rectilinear moving object, the image processing unit 102 tracks (calculates, detects) the position (range) of the object in each image. When the object is a stationary object, the image processing unit 102 calculates the position of the object on the assumption that the position of the object (the vertices of the rectangle containing the object) moves along epipolar constraint lines in the same way as the feature points.

FIG. 4 is a diagram showing an example of tracking a stationary object. As shown in FIG. 4, suppose that a later-extracted feature point lies at the point dividing the line segment connecting the first-extracted feature point and the image center in the ratio a:b. In this case, the image processing unit 102 calculates the position of the object containing the later-extracted feature point as follows: for each vertex of the rectangle containing the object at the time of the first-extracted feature point, it takes the point dividing the line segment connecting that vertex and the image center in the ratio a:b, and the rectangle connecting these points is the position of the object.
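The a:b construction of FIG. 4 can be written compactly: with t = a / (a + b), each vertex V of the original rectangle maps to V + t (C - V), where C is the image center. The following is a sketch under that reading; the function names are hypothetical, and t is obtained by projecting the later feature point onto the segment from the first feature point to the center.

```python
def division_ratio(first_pt, later_pt, center):
    """Return t = a / (a + b), where later_pt divides the segment from
    first_pt to center in the ratio a:b (projection onto the segment)."""
    dx, dy = center[0] - first_pt[0], center[1] - first_pt[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        raise ValueError("first feature point coincides with the image center")
    num = (later_pt[0] - first_pt[0]) * dx + (later_pt[1] - first_pt[1]) * dy
    return num / denom


def track_stationary_box(box, first_pt, later_pt, center):
    """Move every vertex of box toward center by the same ratio as the
    feature point. box is a list of (x, y) rectangle vertices."""
    t = division_ratio(first_pt, later_pt, center)
    cx, cy = center
    return [(vx + t * (cx - vx), vy + t * (cy - vy)) for vx, vy in box]


# Feature point halfway between its first position and the center (a:b = 1:1).
box = [(480, 300), (560, 300), (560, 380), (480, 380)]
new_box = track_stationary_box(box, (520, 340), (420, 290), (320, 240))
# new_box == [(400.0, 270.0), (440.0, 270.0), (440.0, 310.0), (400.0, 310.0)]
```

Because every vertex is moved by the same ratio, the result is the original rectangle scaled about the image center by b / (a + b), so it remains an axis-aligned rectangle.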

When the object is a rectilinear moving object, the image processing unit 102 calculates the position of the object on the assumption that the position of the object (each vertex of the rectangle containing the object) moves along a straight line in the same way as the feature points. For example, the position of the object is taken to have been translated by the same displacement as the feature point, up to the position of the later-extracted feature point. When one object contains a plurality of feature points, the rectangle containing the object is enlarged or reduced in accordance with the enlargement or reduction of the spacing between the feature points. The image processing unit 102 stores the tracked position of the object in each image in the storage unit 110 in association with the image and the like.
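One way to realize the translation and scaling described above is sketched below. The source does not specify how the spacing between feature points is measured or about which point the rectangle is scaled; this sketch assumes the mean distance of the feature points from their centroid as the spacing and scales about that centroid. All names are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mean_spread(points, c):
    """Mean distance of points from c; used here as the 'spacing' measure."""
    return sum(math.hypot(x - c[0], y - c[1]) for x, y in points) / len(points)


def track_moving_box(box, first_pts, later_pts):
    """Translate box with the feature-point centroid and, if several feature
    points exist, scale it by the change in their spread (assumed scheme)."""
    if not first_pts or len(first_pts) != len(later_pts):
        raise ValueError("feature point lists must be non-empty and matched")
    c0, c1 = centroid(first_pts), centroid(later_pts)
    scale = 1.0
    if len(first_pts) > 1:
        spread0 = mean_spread(first_pts, c0)
        if spread0 > 0.0:
            scale = mean_spread(later_pts, c1) / spread0
    return [(c1[0] + scale * (x - c0[0]), c1[1] + scale * (y - c0[1]))
            for x, y in box]
```

With a single feature point the rectangle is only translated; with two or more it is also enlarged or reduced as the points spread apart or draw together.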

In S107, the image processing unit 102 displays, on the output unit 106, a region (rectangle) indicating the position of the object tracked in S106, superimposed on the image containing the object. A user (inspector) or the like compares the object in the image displayed on the output unit 106 with the region (rectangle) indicating the position of the object and checks whether the two are misaligned. The image processing unit 102 accepts the user's check result (inspection result) through the input unit 104. The image processing unit 102 deletes from the storage unit 110 any image for which a check result of "misaligned" has been input through the input unit 104, and leaves in the storage unit 110 any image for which a check result of "not misaligned" has been input. For the latter images, the image processing unit 102 may also store in the storage unit 110, in association with the image, information indicating that the image has been confirmed as not misaligned. In this way, the image processing unit 102 stores in the storage unit 110 the images to be used as training data for machine learning together with the positions and names of the objects contained in those images.
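The inspection step amounts to a filter over the stored annotations. The sketch below is illustrative only: the `AnnotatedFrame` record and the `verified` flag are hypothetical stand-ins for whatever the storage unit 110 actually holds.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnnotatedFrame:
    frame_index: int
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels
    name: str           # object name entered by the user
    verified: bool = False


def apply_inspection(frames, misaligned_indices):
    """Drop frames the inspector marked as misaligned and flag the rest as
    confirmed. Returns new records; the input list is left unchanged."""
    misaligned = set(misaligned_indices)
    return [
        replace(f, verified=True)
        for f in frames
        if f.frame_index not in misaligned
    ]
```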

(Modified example)
In the above example, in S102 the image processing unit 102 extracts sections of the moving image in which the vehicle is traveling in a straight line, but it may instead extract sections of the moving image in which the vehicle is stationary. In this case, in S102, the image processing unit 102 extracts the sections of the moving image captured while the speed in the travel information associated with the moving image, which indicates the traveling state of the vehicle, is 0 km/h. A speed of 0 km/h indicates that the vehicle is stopped. The extracted moving image is used as images for annotation processing. The image processing unit 102 stores the extracted sections of the moving image in the storage unit 110. Also in this case, in S105, the image processing unit 102 determines, based on the movement vectors of the feature points of each object, whether the object containing the feature points extracted in S104 is a rectilinear moving object (an object moving in a straight line). When an object containing a feature point is a rectilinear moving object, the feature point moves along a straight line in the image; in a moving image captured by a stationary camera, a rectilinear moving object moves along a straight line in the image. The image processing unit 102 determines that an object is a rectilinear moving object when the first-extracted feature point and the later-extracted feature points of that object lie on a single straight line. In this way, the image processing unit 102 can store in the storage unit 110 images to be used as training data, together with the positions and names of the objects they contain, from a moving image captured from a stationary vehicle.
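Selecting the stopped sections in S102 can be sketched as a scan over per-frame speeds. The sketch assumes that the travel information provides one speed value in km/h per frame; the function name and the `min_len` parameter are hypothetical additions.

```python
def stopped_sections(speeds_kmh, min_len=1):
    """Return [(start, end), ...] half-open frame ranges in which the
    recorded speed is 0 km/h for at least min_len consecutive frames."""
    sections = []
    start = None
    for i, v in enumerate(speeds_kmh):
        if v == 0:
            if start is None:
                start = i            # a stopped run begins here
        elif start is not None:
            if i - start >= min_len:
                sections.append((start, i))
            start = None             # the run ended at frame i
    if start is not None and len(speeds_kmh) - start >= min_len:
        sections.append((start, len(speeds_kmh)))
    return sections


print(stopped_sections([12, 0, 0, 0, 5, 0, 0]))             # [(1, 4), (5, 7)]
print(stopped_sections([12, 0, 0, 0, 5, 0, 0], min_len=3))  # [(1, 4)]
```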

(Operation and effects of the embodiment)
The image processing device 100 stores, in the storage unit 110, a moving image captured by a camera mounted on a vehicle or the like in association with travel information indicating the traveling state of the vehicle when the moving image was captured. Based on the travel information, the image processing device 100 extracts from the stored moving image the sections captured while the vehicle was traveling straight or was stationary. These sections contain images suitable for automatic annotation processing, so the image processing device 100 can extract sections of a moving image that are suited to automatic annotation. The image processing device 100 displays the extracted moving image on the output unit 106 and has the user select a range (region) containing an object in a frame image of the moving image and input the name of the object. For the same object appearing in multiple frames, the image processing device 100 has the user select the object region in the frame in which the object appears largest. Having the region selected in the frame where the object is largest allows the error between the selected region and the object to be minimized. When the object is a stationary object, the object appears larger in temporally later frames.

The image processing device 100 extracts feature points of the object within the selected region (range). When the object is a stationary object, the image processing device 100 tracks the feature points of the object toward temporally earlier frames. When the feature points of an object move along the epipolar constraint line, the image processing device 100 determines that the object is a stationary object. Having determined that the object is a stationary object, the image processing device 100 tracks the range (region, position) of the object on the assumption that it moves along epipolar constraint lines in the same way as the feature points. Treating an object that moves along the epipolar constraint line as a stationary object makes the stationary-object determination simple. When, as a result of tracking, the actual position of the object in an image differs from the tracked (calculated) position, the image processing device 100 deletes that image so that it is not used as training data, which improves the accuracy of the trained estimation model that performs annotation processing. The image processing device 100 can thus generate, from a moving image captured by a vehicle-mounted camera, accurate training data for a trained estimation model that performs annotation processing.

<Computer-readable recording medium>
A program that causes a computer or other machine or device (hereinafter, "computer or the like") to implement any of the functions described above can be recorded on a recording medium readable by the computer or the like. The function can then be provided by having the computer or the like read and execute the program on the recording medium.

Here, a recording medium readable by a computer or the like is a recording medium that stores information such as data and programs by electrical, magnetic, optical, mechanical, or chemical action and that can be read by the computer or the like. Components of a computer, such as a CPU and memory, may be provided within such a recording medium, and the CPU may execute the program.

Among such recording media, those removable from the computer or the like include, for example, flexible disks, magneto-optical disks, CD-ROMs, CD-R/Ws, DVDs, DATs, 8 mm tapes, and memory cards.

Recording media fixed to the computer or the like include hard disks and ROMs.

(Others)
Although embodiments of the present invention have been described above, they are merely examples, and the present invention is not limited to them. Various modifications based on the knowledge of those skilled in the art, such as combinations of the configurations, are possible without departing from the spirit of the claims.

100: Image processing device
102: Image processing unit
104: Input unit
106: Output unit
108: Communication unit
110: Storage unit
90: Information processing device
91: Processor
92: Memory
93: Storage unit
94: Input unit
95: Output unit
96: Communication control unit

Claims

An image processing device that extracts, as a processing target image, an image to be subjected to annotation processing that associates the position and name of an object appearing in the image with the image, the image processing device comprising a processor that:
acquires a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selects, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
extracts a processing target image from the selected moving image.

An annotation processing device that associates the position and name of an object appearing in an image with the image, the annotation processing device comprising a processor that:
acquires a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selects, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
when an object in a first frame image included in the selected moving image can be tracked in a second frame image different from the first frame image, associates the position and name of the object that are associated with the first frame image with the second frame image.

The annotation processing device according to claim 2, wherein the processor performs the tracking of the position of the object in the first frame image for an object that moves rectilinearly in the moving image.

The annotation processing device according to claim 2, wherein, when the position of the object in the second frame image obtained by the tracking differs from the actual position of the object in the second frame image, the processor deletes the second frame image from the images stored in association with the position and name of the object.

An image processing method for extracting, as a processing target image, an image to be subjected to annotation processing that associates the position and name of an object appearing in the image with the image, the method comprising:
extracting the processing target image from a moving image captured by a camera attached to a moving object during a period in which the moving object is in a straight-traveling state or a stopped state.

An image processing method for extracting, as a processing target image, an image to be subjected to annotation processing that associates the position and name of an object appearing in the image with the image, the method comprising:
acquiring a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selecting, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
extracting a processing target image from the selected moving image.

An annotation processing method for associating the position and name of an object appearing in an image with the image, the method comprising:
acquiring a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selecting, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
when an object in a first frame image included in the selected moving image can be tracked in a second frame image different from the first frame image, associating the position and name of the object that are associated with the first frame image with the second frame image.

An image processing program for extracting, as a processing target image, an image to be subjected to annotation processing that associates the position and name of an object appearing in the image with the image, the program causing a processor to execute:
acquiring a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selecting, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
extracting a processing target image from the selected moving image.

An annotation processing program that associates the position and name of an object appearing in an image with the image, the program causing a processor to execute:
acquiring a moving image captured by a camera attached to a moving object and travel information of the moving object at the time the moving image was captured;
selecting, based on the acquired travel information, a moving image of a period in which the moving object is in a straight-traveling state or a stopped state from among the acquired moving images; and
when an object in a first frame image included in the selected moving image can be tracked in a second frame image different from the first frame image, storing the position and name of the object that are associated with the first frame image in association with the second frame image.