JP7452094B2

JP7452094B2 - Moving object extraction device, moving object extraction method and program

Info

Publication number: JP7452094B2
Application number: JP2020032245A
Authority: JP
Inventors: 孝光渡邉
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2024-03-19
Anticipated expiration: 2040-02-27
Also published as: JP2021135815A

Description

本発明は、移動物体抽出装置、移動物体抽出方法およびプログラムに関する。 The present invention relates to a moving object extraction device, a moving object extraction method, and a program.

近年、カメラによって撮像された動画像に基づいて各種の処理を行う技術が知られている。例えば、動画像のボケを低減する技術が開示されている（例えば、特許文献１参照）。また、動画像から移動物体の軌跡を抽出し、移動物体の軌跡に基づいて、動画像中に設定された計測線を跨いだ移動物体の数を計測する技術が開示されている（例えば、特許文献２参照）。 In recent years, techniques that perform various types of processing based on moving images captured by a camera have become known. For example, a technique for reducing blur in moving images has been disclosed (see, for example, Patent Document 1). In addition, a technique has been disclosed that extracts the trajectory of a moving object from a moving image and, based on that trajectory, measures the number of moving objects that straddle a measurement line set in the moving image (see, for example, Patent Document 2).

特開２０１９－５３５８８号公報 (Patent Document 1: JP 2019-53588 A)
特開２０１７－３３４０８号公報 (Patent Document 2: JP 2017-33408 A)

しかし、移動物体の検知精度の低下を抑制しつつ、移動物体の追跡精度の低下を抑制することを可能とする技術が提供されることが望まれる。 However, it is desired to provide a technique that can suppress a decrease in tracking accuracy of a moving object while suppressing a decrease in detection accuracy of a moving object.

上記問題を解決するために、本発明のある観点によれば、第１の演算量によって画像フレームにおける第１の物体の検知および前記第１の物体よりも小さい物体である第２の物体の検知を行う第１のモデルと、前記第１の演算量よりも小さい第２の演算量によって画像フレームにおける前記第１の物体の検知を行う第２のモデルと、複数の画像フレームから構成された動画像を取得し、前記第２のモデルによる物体検知の実行頻度が前記第１のモデルによる物体検知の実行頻度よりも多くなるように画像フレームごとに前記第１のモデルまたは前記第２のモデルに前記動画像を分配する画像フレーム分配部と、前記第１のモデルによる物体検知の結果と前記第２のモデルによる物体検知の結果とに基づいて、前記動画像における前記第１の物体の軌跡を抽出する第１の移動軌跡抽出部と、前記第１のモデルによる物体検知の結果に基づいて、前記動画像における前記第２の物体の軌跡を抽出する第２の移動軌跡抽出部と、を備える、移動物体抽出装置が提供される。
In order to solve the above problem, according to one aspect of the present invention, there is provided a moving object extraction device comprising: a first model that, with a first amount of calculation, detects a first object in an image frame and detects a second object that is smaller than the first object; a second model that detects the first object in an image frame with a second amount of calculation smaller than the first amount of calculation; an image frame distribution unit that acquires a moving image composed of a plurality of image frames and distributes the moving image, image frame by image frame, to the first model or the second model such that object detection by the second model is executed more frequently than object detection by the first model; a first movement trajectory extraction unit that extracts a trajectory of the first object in the moving image based on the result of object detection by the first model and the result of object detection by the second model; and a second movement trajectory extraction unit that extracts a trajectory of the second object in the moving image based on the result of object detection by the first model.

また、本発明の別の観点によれば、複数の画像フレームから構成された動画像を取得することと、第１の演算量によって画像フレームにおける第１の物体の検知および前記第１の物体よりも小さい物体である第２の物体の検知を行う第１のモデルと、前記第１の演算量よりも小さい第２の演算量によって画像フレームにおける前記第１の物体の検知を行う第２のモデルと、のうち、前記第２のモデルによる物体検知の実行頻度が前記第１のモデルによる物体検知の実行頻度以上になるように画像フレームごとに前記第１のモデルまたは前記第２のモデルに前記動画像を分配することと、前記第１のモデルによる物体検知の結果と前記第２のモデルによる物体検知の結果とに基づいて、前記動画像における前記第１の物体の軌跡を抽出することと、前記第１のモデルによる物体検知の結果に基づいて、前記動画像における前記第２の物体の軌跡を抽出することと、を含む、移動物体抽出方法が提供される。
According to another aspect of the present invention, there is provided a moving object extraction method including: acquiring a moving image composed of a plurality of image frames; distributing the moving image, image frame by image frame, to a first model or a second model, the first model detecting, with a first amount of calculation, a first object in an image frame and a second object that is smaller than the first object, and the second model detecting the first object in an image frame with a second amount of calculation smaller than the first amount of calculation, such that object detection by the second model is executed at least as frequently as object detection by the first model; extracting a trajectory of the first object in the moving image based on the result of object detection by the first model and the result of object detection by the second model; and extracting a trajectory of the second object in the moving image based on the result of object detection by the first model.

また、本発明の別の観点によれば、コンピュータを、第１の演算量によって画像フレームにおける第１の物体の検知および前記第１の物体よりも小さい物体である第２の物体の検知を行う第１のモデルと、前記第１の演算量よりも小さい第２の演算量によって画像フレームにおける前記第１の物体の検知を行う第２のモデルと、複数の画像フレームから構成された動画像を取得し、前記第２のモデルによる物体検知の実行頻度が前記第１のモデルによる物体検知の実行頻度よりも多くなるように画像フレームごとに前記第１のモデルまたは前記第２のモデルに前記動画像を分配する画像フレーム分配部と、前記第１のモデルによる物体検知の結果と前記第２のモデルによる物体検知の結果とに基づいて、前記動画像における前記第１の物体の軌跡を抽出する第１の移動軌跡抽出部と、前記第１のモデルによる物体検知の結果に基づいて、前記動画像における前記第２の物体の軌跡を抽出する第２の移動軌跡抽出部と、を備える移動物体抽出装置として機能させるためのプログラムが提供される。
According to another aspect of the present invention, there is provided a program for causing a computer to function as a moving object extraction device comprising: a first model that, with a first amount of calculation, detects a first object in an image frame and detects a second object that is smaller than the first object; a second model that detects the first object in an image frame with a second amount of calculation smaller than the first amount of calculation; an image frame distribution unit that acquires a moving image composed of a plurality of image frames and distributes the moving image, image frame by image frame, to the first model or the second model such that object detection by the second model is executed more frequently than object detection by the first model; a first movement trajectory extraction unit that extracts a trajectory of the first object in the moving image based on the result of object detection by the first model and the result of object detection by the second model; and a second movement trajectory extraction unit that extracts a trajectory of the second object in the moving image based on the result of object detection by the first model.

以上説明したように本発明によれば、移動物体の検知精度の低下を抑制しつつ、移動物体の追跡精度の低下を抑制することを可能とする技術が提供される。 As described above, according to the present invention, a technique is provided that makes it possible to suppress a decrease in tracking accuracy of a moving object while suppressing a decrease in detection accuracy of a moving object.
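The frame distribution named in the aspects above can be illustrated with a minimal sketch. It assumes a simple fixed-period rule (every `heavy_period`-th frame goes to the first model, all others to the second); the summary does not state that the distribution unit works this way, and the function name, the labels "M1"/"M2", and the parameter `heavy_period` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Detections = List[tuple]  # placeholder: one tuple per detected object


def distribute_frames(
    frames: Iterable[object],
    heavy_model: Callable[[object], Detections],
    light_model: Callable[[object], Detections],
    heavy_period: int = 4,
) -> Iterator[Tuple[int, str, Detections]]:
    """Yield (frame index, model label, detections) for each frame.

    Assumption: a fixed period. With heavy_period == 2 the two models run
    equally often (the "at least as frequently" wording of the method);
    with heavy_period >= 3 the light model runs strictly more often.
    """
    if heavy_period < 2:
        raise ValueError("heavy_period must be at least 2")
    for index, frame in enumerate(frames):
        if index % heavy_period == 0:
            # First model: larger amount of calculation, run less often.
            yield index, "M1", heavy_model(frame)
        else:
            # Second model: smaller amount of calculation, run more often.
            yield index, "M2", light_model(frame)
```

With `heavy_period=4`, three out of every four frames are handled by the second model, so the second model's execution frequency exceeds that of the first.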

本発明の実施形態に係るカメラによって撮像される画像フレームの例を示す図である。 FIG. 1 is a diagram showing an example of an image frame captured by a camera according to an embodiment of the present invention.
本発明の実施形態に係る移動物体抽出システムの機能構成例を示すブロック図である。 FIG. 2 is a block diagram showing an example of the functional configuration of a moving object extraction system according to an embodiment of the present invention.
同実施形態に係る移動物体抽出システムの動作の例を示すフローチャートである。 FIG. 3 is a flowchart showing an example of the operation of the moving object extraction system according to the embodiment.
車両検出部による車両検出の例を示す図である。 FIG. 4 is a diagram showing an example of vehicle detection by the vehicle detection unit.
１フレーム前の画像フレームでの車両検出の例を示す図である。 FIG. 5 is a diagram showing an example of vehicle detection in the image frame one frame before.
移動軌跡抽出部による車両追跡の例を示す図である。 FIG. 6 is a diagram showing an example of vehicle tracking by the movement trajectory extraction unit.
計測線を跨いだ車両台数の計測の例を示す図である。 FIG. 7 is a diagram showing an example of measuring the number of vehicles that straddle the measurement line.
同実施形態に係る移動物体抽出システムにおける改善可能な点について説明するための図である。 FIG. 8 is a diagram for explaining points that can be improved in the moving object extraction system according to the embodiment.
画像フレーム分配部および移動軌跡抽出部の具体的な機能について説明するための図である。 FIG. 9 is a diagram for explaining specific functions of the image frame distribution unit and the movement trajectory extraction unit.
分散処理の例について説明するための図である。 FIG. 10 is a diagram for explaining an example of distributed processing.
同実施形態に係る移動物体抽出装置の例としての情報処理装置のハードウェア構成を示す図である。 FIG. 11 is a diagram showing the hardware configuration of an information processing device as an example of the moving object extraction device according to the embodiment.

以下に添付図面を参照しながら、本発明の好適な実施の形態について詳細に説明する。なお、本明細書及び図面において、実質的に同一の機能構成を有する構成要素については、同一の符号を付することにより重複説明を省略する。 Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. Note that, in this specification and the drawings, components having substantially the same functional configuration are designated by the same reference numerals, and redundant explanation is omitted.

また、本明細書および図面において、実質的に同一の機能構成を有する複数の構成要素を、同一の符号の後に異なる数字を付して区別する場合がある。ただし、実質的に同一の機能構成を有する複数の構成要素等の各々を特に区別する必要がない場合、同一符号のみを付する。また、異なる実施形態の類似する構成要素については、同一の符号の後に異なるアルファベットを付して区別する場合がある。ただし、異なる実施形態の類似する構成要素等の各々を特に区別する必要がない場合、同一符号のみを付する。 Further, in this specification and the drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different numbers to the same reference numeral. However, when there is no particular need to distinguish between such components, only the same reference numeral is given. Similarly, similar components in different embodiments may be distinguished by appending different letters to the same reference numeral. However, when there is no particular need to distinguish between similar components of different embodiments, only the same reference numeral is given.

（０．概要） (0. Overview)
本発明の実施形態は、カメラによって撮像された動画像に基づいて、動画像に写る移動物体の軌跡を抽出する技術に関する。動画像は、時系列に沿って連続的に撮像された複数の画像フレームによって構成される。以下では、単位時間あたりの画像フレームの数をフレームレートと言う。なお、本発明の実施形態では、軌跡が抽出される移動物体として車両を例に挙げて説明する。しかし、移動物体は車両に限定されない。例えば、移動物体は、車両以外の乗り物であってもよいし、乗り物以外の物体（例えば、人など）であってもよい。 Embodiments of the present invention relate to a technique for extracting, from a moving image captured by a camera, the trajectory of a moving object that appears in the moving image. A moving image is composed of a plurality of image frames captured continuously in time series. Hereinafter, the number of image frames per unit time is referred to as the frame rate. Note that the embodiment of the present invention is described using a vehicle as an example of the moving object whose trajectory is extracted. However, the moving object is not limited to a vehicle. For example, the moving object may be a conveyance other than a road vehicle, or an object other than a conveyance (for example, a person).

図１は、本発明の実施形態に係るカメラによって撮像される画像フレームの例を示す図である。図１を参照すると、本発明の実施形態に係るカメラによって撮像される動画像を構成する画像フレームＧ０が示されている。本発明の実施形態に係るカメラは、図１に示されるような道路平面を走行する車両を撮像可能な位置に設けられている。カメラは、イメージセンサを含んで構成されており、イメージセンサによって撮像範囲を撮像する。カメラの種類は特に限定されない。例えば、カメラは、可視光カメラであってもよいし、赤外光カメラであってもよい。 FIG. 1 is a diagram showing an example of an image frame captured by a camera according to an embodiment of the present invention. Referring to FIG. 1, an image frame G0 constituting a moving image captured by a camera according to an embodiment of the present invention is shown. The camera according to the embodiment of the present invention is provided at a position where it can image a vehicle traveling on a road plane as shown in FIG. The camera is configured to include an image sensor, and images an imaging range using the image sensor. The type of camera is not particularly limited. For example, the camera may be a visible light camera or an infrared light camera.

画像フレームＧ０には、車両の例として車両Ａ１～Ａ４およびその他の複数の車両が写っている。画像フレームＧ０には、あらかじめ車両の通過数を計測するための線（すなわち、計測線Ｌ０）が設定されている。本発明の実施形態では、計測線Ｌ０を跨いだ車両軌跡の数を計測する例について主に説明する。しかし、車両軌跡の数の計測は、車両軌跡の利用例の一例に過ぎない。すなわち、本発明の実施形態において抽出された車両軌跡の用途は特に限定されない。 In the image frame G0, vehicles A1 to A4 and a plurality of other vehicles are shown as examples of vehicles. A line for measuring the number of passing vehicles (ie, measurement line L0) is set in advance in the image frame G0. In the embodiment of the present invention, an example will be mainly described in which the number of vehicle trajectories that straddle the measurement line L0 is measured. However, measuring the number of vehicle trajectories is only one example of the use of vehicle trajectories. That is, the use of the vehicle trajectory extracted in the embodiment of the present invention is not particularly limited.

また、本発明の実施形態では、計測線Ｌ０を跨いだ車両軌跡の数を、車両軌跡が計測線Ｌ０を跨いだ方向別に計測する例について主に説明する。すなわち、本発明の実施形態では、計測線Ｌ０を方向Ｄ１（計測線Ｌ０によって分割される二つの領域のうちの一方から他方）に跨いだ車両軌跡の数と計測線Ｌ０を方向Ｄ２（当該他方から当該一方）に跨いだ車両軌跡の数とを別々に計測する例について主に説明する。しかし、計測線Ｌ０を跨いだ車両軌跡の数は、車両軌跡が計測線Ｌ０を跨いだ方向の区別なく計測されてもよい。 Furthermore, the embodiment of the present invention is mainly described using an example in which the number of vehicle trajectories that straddle the measurement line L0 is measured separately for each direction in which the trajectory straddles the line. That is, the number of vehicle trajectories that straddle the measurement line L0 in direction D1 (from one of the two regions divided by the measurement line L0 to the other) and the number of vehicle trajectories that straddle it in direction D2 (from the other region to the first) are measured separately. However, the number of vehicle trajectories that straddle the measurement line L0 may also be measured without distinguishing the direction in which they straddle it.

以上、本発明の実施形態の概要について説明した。 The outline of the embodiment of the present invention has been described above.

（１．実施形態の詳細） (1. Details of the embodiment)
続いて、本発明の実施形態の詳細について説明する。 Next, details of the embodiment of the present invention will be described.

（１－１．システムの機能構成例） (1-1. Example of system functional configuration)
まず、図２を参照しながら、本発明の実施形態に係る移動物体抽出システムの機能構成例について説明する。 First, an example of the functional configuration of a moving object extraction system according to an embodiment of the present invention will be described with reference to FIG. 2.

図２は、本発明の実施形態に係る移動物体抽出システムの機能構成例を示すブロック図である。図２に示されるように、本発明の実施形態に係る移動物体抽出システム１は、移動物体抽出装置１０およびカメラ２０を備える。移動物体抽出装置１０とカメラ２０とは、有線または無線によって接続されており、カメラ２０によって撮像された各画像フレームは、時系列に沿って連続的に移動物体抽出装置１０に出力される。移動物体抽出装置１０は、制御部１１０および記憶部１３０を備える。 FIG. 2 is a block diagram showing an example of the functional configuration of a moving object extraction system according to an embodiment of the present invention. As shown in FIG. 2, the moving object extraction system 1 according to the embodiment of the present invention includes a moving object extraction device 10 and a camera 20. The moving object extraction device 10 and the camera 20 are connected by wire or wirelessly, and each image frame captured by the camera 20 is continuously outputted to the moving object extraction device 10 in chronological order. The moving object extraction device 10 includes a control section 110 and a storage section 130.

制御部１１０は、プロセッサを含み、記憶部１３０により記憶されているプログラムがプロセッサによりＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）に展開されて実行されることにより、その機能が実現され得る。このとき、当該プログラムを記録した、コンピュータに読み取り可能な記録媒体も提供され得る。あるいは、制御部１１０は、専用のハードウェアにより構成されていてもよいし、複数のハードウェアの組み合わせにより構成されてもよい。 The control unit 110 includes a processor, and its functions can be realized by the processor loading a program stored in the storage unit 130 into a RAM (Random Access Memory) and executing it. At this time, a computer-readable recording medium on which the program is recorded may also be provided. Alternatively, the control unit 110 may be configured by dedicated hardware or a combination of multiple pieces of hardware.

本発明の実施形態では、制御部１１０が、プロセッサの例として、２つのＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）およびＡＩ（ＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ）チップを含む場合を想定する。しかし、制御部１１０が含むプロセッサは、これらに限定されない。例えば、制御部１１０は、ＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＦＰＧＡ（ＦｉｅｌｄＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒaｙ）などのプロセッサを含んでもよい。本発明の実施形態では、このように互いに特性の異なる複数のプロセッサが並列化されており、当該複数のプロセッサによる分散処理が行われる場合を想定する。これによって高いパフォーマンスの演算が行われ得る。 In the embodiment of the present invention, it is assumed that the control unit 110 includes two CPUs (Central Processing Units) and an AI (Artificial Intelligence) chip as an example of a processor. However, the processor included in the control unit 110 is not limited to these. For example, the control unit 110 may include a processor such as a GPU (Graphics Processing Unit) or an FPGA (Field Programmable Gate Array). In the embodiment of the present invention, it is assumed that a plurality of processors having different characteristics are parallelized and distributed processing is performed by the plurality of processors. This allows high performance calculations to be performed.

制御部１１０は、車両検出部１１１と、移動軌跡抽出部１１５と、計測処理部１１６とを備える。車両検出部１１１は、画像フレーム分配部１１２と、重処理量モデルＭ１（第１のモデル）と、軽処理量モデルＭ２（第２のモデル）とを備える。 The control unit 110 includes a vehicle detection unit 111, a movement trajectory extraction unit 115, and a measurement processing unit 116. The vehicle detection unit 111 includes an image frame distribution unit 112, a heavy throughput model M1 (first model), and a light throughput model M2 (second model).

重処理量モデルＭ１は、画像フレーム分配部１１２から入力される画像フレームから所定の演算量（第１の演算量）によって車両検知を行う。また、軽処理量モデルＭ２は、画像フレーム分配部１１２から入力される画像フレームから重処理量モデルＭ１の演算量よりも小さい演算量（第２の演算量）によって車両検知を行う。重処理量モデルＭ１および軽処理量モデルＭ２それぞれの具体的な演算量は限定されない。例えば、演算量は、モデルへの入力からモデルへの出力までにモデルによって行われる演算の回数であってよい。 The heavy throughput model M1 performs vehicle detection on image frames input from the image frame distribution unit 112 using a predetermined amount of calculation (first amount of calculation). The light throughput model M2 performs vehicle detection on image frames input from the image frame distribution unit 112 using an amount of calculation (second amount of calculation) smaller than that of the heavy throughput model M1. The specific amount of calculation of each of the heavy throughput model M1 and the light throughput model M2 is not limited. For example, the amount of calculation may be the number of operations the model performs between receiving its input and producing its output.

重処理量モデルＭ１および軽処理量モデルＭ２それぞれは、画像フレーム分配部１１２から入力される画像フレームを構成する複数の矩形領域それぞれに車両が存在する確からしさを示す値を車両検出スコアとして出力する。一例として、車両検出スコアが所定のスコアよりも大きい矩形領域が、車両が存在する領域（車両領域）として検出される。 The heavy throughput model M1 and the light throughput model M2 each output, as a vehicle detection score, a value indicating the likelihood that a vehicle exists in each of a plurality of rectangular areas that constitute the image frame input from the image frame distribution unit 112. As an example, a rectangular area whose vehicle detection score is greater than a predetermined score is detected as an area where a vehicle exists (vehicle area).
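The score thresholding described above can be sketched as follows. The data layout (`ScoredRegion` with a top-left corner, width, height, and score) and the default threshold of 0.5 are assumptions made for illustration; the description only states that regions whose score exceeds a predetermined score are detected as vehicle areas.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScoredRegion:
    """A rectangular area and its vehicle detection score (hypothetical layout)."""
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float
    score: float   # likelihood that a vehicle exists in this area


def select_vehicle_regions(
    regions: Iterable[ScoredRegion], score_threshold: float = 0.5
) -> List[ScoredRegion]:
    """Keep only the areas whose score is strictly greater than the threshold."""
    return [region for region in regions if region.score > score_threshold]
```

The comparison is strict because the text says the score must be greater than the predetermined score; an area whose score equals the threshold is not treated as a vehicle area.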

ここで、画像フレームを構成する矩形領域のサイズ、位置、数などは特に限定されない。また、矩形領域の代わりに他の形状の領域が用いられてもよい。また、重処理量モデルＭ１および軽処理量モデルＭ２それぞれは、学習済みのニューラルネットワーク（例えば、学習済みのディープラーニングニューラルネットワーク）であってよい。学習済みのニューラルネットワークは、車両が写る画像フレームと当該画像フレームに写る車両位置（すなわち、車両が存在する矩形領域）との組み合わせを教師データとしてニューラルネットワークを学習させることによって生成され得る。 Here, the size, position, number, etc. of the rectangular areas constituting the image frame are not particularly limited. Moreover, other shaped areas may be used instead of the rectangular area. Furthermore, each of the heavy-throughput model M1 and the light-throughput model M2 may be a trained neural network (for example, a trained deep learning neural network). A trained neural network can be generated by training the neural network using a combination of an image frame in which a vehicle is captured and a vehicle position (that is, a rectangular area where the vehicle is present) in the image frame as training data.

具体的な構成の例として、重処理量モデルＭ１および軽処理量モデルＭ２それぞれは、畳み込み層とプーリング層との繰り返しおよび多段の全結合を含んだニューラルネットワークであってよい。すなわち、重処理量モデルＭ１および軽処理量モデルＭ２それぞれは、ＣＮＮ（ＣｏｎｖｏｌｕｔｉｏｎａｌＮｅｕｒａｌＮｅｔｗｏｒｋ）によって構成されてよい。しかし、重処理量モデルＭ１および軽処理量モデルＭ２それぞれは、畳み込み層およびプーリング層を有していないニューラルネットワークによって構成されてもよい。また、重処理量モデルＭ１および軽処理量モデルＭ２それぞれの全結合は多段に構成されていなくてもよい。 As a specific example of a configuration, each of the heavy-throughput model M1 and the light-throughput model M2 may be a neural network including repetition of convolutional layers and pooling layers and multi-stage full connections. That is, each of the heavy throughput model M1 and the light throughput model M2 may be configured by a CNN (Convolutional Neural Network). However, each of the heavy-throughput model M1 and the light-throughput model M2 may be configured by a neural network that does not have a convolution layer or a pooling layer. Further, the full connections of the heavy throughput model M1 and the light throughput model M2 do not need to be configured in multiple stages.

記憶部１３０は、制御部１１０を動作させるためのプログラムおよびデータを記憶することが可能な記憶装置である。また、記憶部１３０は、制御部１１０の動作の過程で必要となる各種データを一時的に記憶することもできる。例えば、記憶装置は、不揮発性の記憶装置であってよい。例えば、記憶部１３０は、各種データの例として、計測線Ｌ０の位置を記憶し得る。 The storage unit 130 is a storage device that can store programs and data for operating the control unit 110. Further, the storage unit 130 can also temporarily store various data required during the operation of the control unit 110. For example, the storage device may be a non-volatile storage device. For example, the storage unit 130 may store the position of the measurement line L0 as an example of various data.

以上、本発明の実施形態に係る移動物体抽出システム１の機能構成例について説明した。 The functional configuration example of the moving object extraction system 1 according to the embodiment of the present invention has been described above.

（１－２．システムの動作例） (1-2. Example of system operation)
続いて、図３～図９を参照しながら、本発明の実施形態に係る移動物体抽出システム１の動作の例について説明する。 Next, an example of the operation of the moving object extraction system 1 according to the embodiment of the present invention will be described with reference to FIGS. 3 to 9.

図３は、本発明の実施形態に係る移動物体抽出システム１の動作の例を示すフローチャートである。図３に示されるように、Ｓ１１～Ｓ１６が時系列に沿って連続的に入力される画像フレームごとに実行される。 FIG. 3 is a flowchart showing an example of the operation of the moving object extraction system 1 according to the embodiment of the present invention. As shown in FIG. 3, S11 to S16 are executed for each image frame that is input continuously in time series.

（車両検出部１１１） (Vehicle detection unit 111)
まず、移動物体抽出装置１０において、車両検出部１１１は、カメラ２０から時系列に沿って連続的に入力される画像フレームから車両位置（すなわち、車両領域）を検出する（Ｓ１１）。 First, in the moving object extraction device 10, the vehicle detection unit 111 detects vehicle positions (that is, vehicle areas) from the image frames that are continuously input in time series from the camera 20 (S11).

図４は、車両検出部１１１による車両検出の例を示す図である。図４を参照すると、カメラ２０から移動物体抽出装置１０に入力された直後の画像フレーム（現在の画像フレーム）の例として、現在の画像フレームＧ１が示されている。例えば、車両検出部１１１は、現在の画像フレームＧ１から車両Ａ１が存在する領域として車両領域Ｐ１１を検出する。同様に、車両検出部１１１は、現在の画像フレームＧ１から、車両Ａ２が存在する領域として車両領域Ｐ２１を検出し、車両Ａ３が存在する領域として車両領域Ｐ３１を検出し、車両Ａ４が存在する領域として車両領域Ｐ４１を検出する。 FIG. 4 is a diagram showing an example of vehicle detection by the vehicle detection unit 111. Referring to FIG. 4, a current image frame G1 is shown as an example of the image frame that has just been input to the moving object extraction device 10 from the camera 20 (the current image frame). For example, the vehicle detection unit 111 detects a vehicle area P11 from the current image frame G1 as the area where the vehicle A1 exists. Similarly, from the current image frame G1, the vehicle detection unit 111 detects a vehicle area P21 as the area where the vehicle A2 exists, a vehicle area P31 as the area where the vehicle A3 exists, and a vehicle area P41 as the area where the vehicle A4 exists.

（移動軌跡抽出部１１５） (Movement trajectory extraction unit 115)
図３に戻って説明を続ける。続いて、移動軌跡抽出部１１５は、車両検出部１１１によって検出された車両（すなわち、車両領域の位置）を追跡する（Ｓ１２）。ここでは、移動軌跡抽出部１１５が、同一車両が写る車両領域の中心点を追跡する場合を主に想定する。しかし、追跡される車両領域の位置は、車両領域の中心点以外の点（例えば、車両領域の左上隅など）であってもよい。 Returning to FIG. 3, the description continues. Next, the movement trajectory extraction unit 115 tracks the vehicles (that is, the positions of the vehicle areas) detected by the vehicle detection unit 111 (S12). Here, it is mainly assumed that the movement trajectory extraction unit 115 tracks the center point of the vehicle areas in which the same vehicle appears. However, the tracked position of a vehicle area may be a point other than its center point (for example, the upper left corner of the vehicle area).

移動軌跡抽出部１１５は、現在の画像フレームと現在の画像フレームよりも過去にカメラ２０から入力された画像フレーム（過去の画像フレーム）とのそれぞれから車両検出部１１１によって得られた複数の矩形領域それぞれから抽出された特徴量を算出する。また、移動軌跡抽出部１１５は、車両の特徴量同士の類似度が閾値よりも大きい場合に、その車両同士が同一車両であるとみなす。そして、移動軌跡抽出部１１５は、現在の画像フレームおよび過去の画像フレームそれぞれにおいて同一とみなした車両の位置同士を統合する（Ｓ１３）。例えば、移動軌跡抽出部１１５は、現在の画像フレームおよび過去の画像フレームそれぞれにおいて同一とみなした車両の位置同士を対応付ける。 The movement trajectory extraction unit 115 calculates a feature amount for each of the plurality of rectangular areas that the vehicle detection unit 111 obtained from the current image frame and from an image frame input from the camera 20 earlier than the current image frame (a past image frame). When the similarity between the feature amounts of two vehicles is greater than a threshold, the movement trajectory extraction unit 115 regards those vehicles as the same vehicle. The movement trajectory extraction unit 115 then integrates the positions of the vehicles regarded as the same in the current image frame and the past image frame (S13). For example, the movement trajectory extraction unit 115 associates with each other the positions of the vehicles regarded as the same in the current image frame and the past image frame.
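The association step (S13) can be sketched as below. The description does not specify the feature amount or the similarity measure, so this sketch assumes feature vectors compared by cosine similarity and a greedy one-to-one assignment; the function names and the default threshold of 0.8 are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(
    past_features: List[Sequence[float]],
    current_features: List[Sequence[float]],
    threshold: float = 0.8,
) -> Dict[int, int]:
    """Map each current detection index to the past detection index regarded
    as the same vehicle. Pairs are taken greedily in order of decreasing
    similarity, and only while the similarity is greater than the threshold."""
    candidates = sorted(
        (
            (cosine_similarity(past, current), cur_idx, past_idx)
            for past_idx, past in enumerate(past_features)
            for cur_idx, current in enumerate(current_features)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_past = set()
    for similarity, cur_idx, past_idx in candidates:
        if similarity <= threshold:
            break  # remaining pairs are even less similar
        if cur_idx in matches or past_idx in used_past:
            continue  # each detection is matched at most once
        matches[cur_idx] = past_idx
        used_past.add(past_idx)
    return matches
```

A current detection that is absent from the returned mapping has no sufficiently similar counterpart in the past frame; in this sketch it would start a new trajectory.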

ここで、過去の画像フレームは、現在の画像フレームよりも１フレーム前の画像フレーム（以下、単に「１フレーム前の画像フレーム」とも言う。）であってよい。しかし、過去の画像フレームは、現在の画像フレームよりも２フレームまたは３フレーム以上前の画像フレームであってもよい。 Here, the past image frame may be an image frame that is one frame before the current image frame (hereinafter also simply referred to as "one frame previous image frame"). However, the past image frame may be an image frame that is two or more frames before the current image frame.

図５は、１フレーム前の画像フレームでの車両検出の例を示す図である。図５を参照すると、現在の画像フレームＧ２が示されている。例えば、車両検出部１１１は、１フレーム前の画像フレームから車両Ａ１が存在する領域として車両領域Ｐ１０を検出する。同様に、車両検出部１１１は、１フレーム前の画像フレームから、車両Ａ２が存在する領域として車両領域Ｐ２０を検出し、車両Ａ３が存在する領域として車両領域Ｐ３０を検出し、車両Ａ４が存在する領域として車両領域Ｐ４０を検出する。 FIG. 5 is a diagram showing an example of vehicle detection in the image frame one frame before. Referring to FIG. 5, a current image frame G2 is shown. For example, the vehicle detection unit 111 detects a vehicle area P10 from the image frame one frame before as the area where the vehicle A1 exists. Similarly, from the image frame one frame before, the vehicle detection unit 111 detects a vehicle area P20 as the area where the vehicle A2 exists, a vehicle area P30 as the area where the vehicle A3 exists, and a vehicle area P40 as the area where the vehicle A4 exists.

また、図５には、現在の画像フレームよりも１フレーム前までに抽出された車両の軌跡の例が示されている。例えば、現在の画像フレームよりも１フレーム前までに抽出された車両Ａ１の軌跡の例として車両軌跡Ｔ１０が示されている。同様に、現在の画像フレームよりも１フレーム前までに抽出された車両Ａ２の軌跡の例として車両軌跡Ｔ２０が示されており、現在の画像フレームよりも１フレーム前までに抽出された車両Ａ３の軌跡の例として車両軌跡Ｔ３０が示されており、現在の画像フレームよりも１フレーム前までに抽出された車両Ａ４の軌跡の例として車両軌跡Ｔ４０が示されている。 FIG. 5 also shows examples of the vehicle trajectories extracted up to one frame before the current image frame. For example, a vehicle trajectory T10 is shown as an example of the trajectory of the vehicle A1 extracted up to one frame before the current image frame. Similarly, a vehicle trajectory T20 is shown as an example of the trajectory of the vehicle A2, a vehicle trajectory T30 as an example of the trajectory of the vehicle A3, and a vehicle trajectory T40 as an example of the trajectory of the vehicle A4, each extracted up to one frame before the current image frame.

図６は、移動軌跡抽出部１１５による車両追跡の例を示す図である。図６には、現在の画像フレームＧ３が示されており、現在の画像フレームＧ３から検出された車両領域Ｐ１１～Ｐ４１の他、現在の画像フレームよりも１フレーム前の画像フレームから検出された車両領域Ｐ１０～Ｐ４０が示されている。また、現在の画像フレームよりも１フレーム前までに抽出された車両軌跡Ｔ１０～Ｔ４０が示されている。 FIG. 6 is a diagram showing an example of vehicle tracking by the movement trajectory extraction unit 115. In FIG. 6, a current image frame G3 is shown, and in addition to vehicle areas P11 to P41 detected from the current image frame G3, vehicles detected from an image frame one frame before the current image frame are shown. Areas P10 to P40 are shown. Also shown are vehicle trajectories T10 to T40 extracted up to one frame before the current image frame.

例えば、移動軌跡抽出部１１５は、現在の画像フレームから車両検出部１１１によって得られた車両Ａ１の特徴量と、１フレーム前の画像フレームから車両検出部１１１によって得られた車両Ａ１の特徴量とを比較する。移動軌跡抽出部１１５は、特徴量同士の類似度が閾値よりも大きい場合に、現在の画像フレームにおける車両の位置（車両領域Ｐ１１）と、１フレーム前の画像フレームにおける車両の位置（車両領域Ｐ１０）とを対応付ける。 For example, the movement trajectory extraction unit 115 compares the feature amount of the vehicle A1 obtained by the vehicle detection unit 111 from the current image frame with the feature amount of the vehicle A1 obtained by the vehicle detection unit 111 from the image frame one frame before. When the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates the vehicle position in the current image frame (vehicle area P11) with the vehicle position in the image frame one frame before (vehicle area P10).

これによって、現在の画像フレームＧ４に示されるように、車両Ａ１の軌跡が、現在の画像フレームよりも１フレーム前までに抽出された車両軌跡Ｔ１０から、現在の画像フレームまでに抽出された車両軌跡Ｔ１１に更新される。 As a result, as shown in the current image frame G4, the trajectory of the vehicle A1 is updated from the vehicle trajectory T10 extracted up to one frame before the current image frame to the vehicle trajectory T11 extracted up to the current image frame.

なお、ここでは、現在の画像フレームおよび１フレーム前の画像フレームの双方から車両が検出される場合を主に想定する。しかし、現在の画像フレームおよび１フレーム前の画像フレームの少なくとも一方から車両の位置が検出されない場合も想定される。かかる場合であっても、車両の追跡および車両の位置同士の統合は継続されてよい（すなわち、２フレームまたは３フレーム以上離れた画像フレーム同士の対応付けが行われてもよい）。 Note that here, we mainly assume a case where a vehicle is detected from both the current image frame and the image frame one frame before. However, there may be cases where the position of the vehicle is not detected from at least one of the current image frame and the previous image frame. Even in such a case, tracking of the vehicle and integration of vehicle positions may be continued (that is, image frames separated by two or three frames or more may be correlated).

同様にして、車両Ａ２の軌跡は、現在の画像フレームよりも１フレーム前までに抽出された車両軌跡Ｔ２０から、現在の画像フレームまでに抽出された車両軌跡Ｔ２１に更新される。車両Ａ３の軌跡は、現在の画像フレームよりも１フレーム前までに抽出された車両軌跡Ｔ３０から、現在の画像フレームまでに抽出された車両軌跡Ｔ３１に更新される。車両Ａ４の軌跡は、現在の画像フレームよりも１フレーム前までに抽出された車両軌跡Ｔ４０から、現在の画像フレームまでに抽出された車両軌跡Ｔ４１に更新される。 Similarly, the trajectory of the vehicle A2 is updated from the vehicle trajectory T20 extracted up to one frame before the current image frame to the vehicle trajectory T21 extracted up to the current image frame. The trajectory of the vehicle A3 is updated from a vehicle trajectory T30 extracted up to one frame before the current image frame to a vehicle trajectory T31 extracted up to the current image frame. The trajectory of the vehicle A4 is updated from a vehicle trajectory T40 extracted up to one frame before the current image frame to a vehicle trajectory T41 extracted up to the current image frame.
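The trajectory updates described above (for example, T10 to T11) can be sketched as appending the center point of each newly associated vehicle area to that vehicle's trajectory. The box layout `(x, y, width, height)` with a top-left origin and the use of integer vehicle identifiers as dictionary keys are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def box_center(box: Box) -> Point:
    """Center point of a vehicle area, the tracked position assumed in the text."""
    x, y, width, height = box
    return (x + width / 2.0, y + height / 2.0)


def update_trajectories(
    trajectories: Dict[int, List[Point]], detections: Dict[int, Box]
) -> Dict[int, List[Point]]:
    """Append the current center point to each detected vehicle's trajectory.

    `detections` maps a vehicle identifier (already resolved by the
    association step) to its vehicle area in the current image frame.
    Vehicles missing from `detections` keep their trajectory unchanged, so
    tracking can resume when they are detected again in a later frame.
    """
    for vehicle_id, box in detections.items():
        trajectories.setdefault(vehicle_id, []).append(box_center(box))
    return trajectories
```

An identifier that has no trajectory yet simply starts a new one with a single point.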

（計測処理部１１６） (Measurement processing unit 116)
図３に戻って説明を続ける。続いて、計測処理部１１６は、移動軌跡抽出部１１５によって同一とみなされた車両の速度を計測する（Ｓ１４）。車両の速度は、どのようにして計測されてもよい。さらに、計測処理部１１６は、計測した速度に基づいて停止車両を検知する（Ｓ１５）。例えば、計測処理部１１６は、速度がゼロの車両を停止車両として検知する。続いて、計測処理部１１６は、既に説明したように、カメラ２０から入力される動画像に設定された計測線Ｌ０（図７）を跨いだ車両台数（すなわち、車両軌跡の数）を計測する（Ｓ１６）。例えば、計測処理部１１６は、計測線Ｌ０（図７）を跨いだ車両台数（すなわち、車両軌跡の数）を、車両軌跡が跨いだ方向別に計測する。 Returning to FIG. 3, the explanation continues. The measurement processing unit 116 measures the speed of each vehicle that the movement trajectory extraction unit 115 regards as the same vehicle (S14). The speed of the vehicle may be measured in any manner. Further, the measurement processing unit 116 detects stopped vehicles based on the measured speed (S15). For example, the measurement processing unit 116 detects a vehicle whose speed is zero as a stopped vehicle. Next, as described above, the measurement processing unit 116 counts the number of vehicles (that is, the number of vehicle trajectories) that straddle the measurement line L0 (FIG. 7) set in the moving image input from the camera 20 (S16). For example, the measurement processing unit 116 counts the number of vehicles (that is, the number of vehicle trajectories) that straddle the measurement line L0 (FIG. 7) separately for each direction in which the vehicle trajectory straddles the line.
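
The description leaves the speed measurement method open. The following is only a minimal sketch of steps S14 and S15, assuming that a trajectory is stored as a list of `(frame_index, x, y)` tuples and that speed is estimated in pixels per second from the last two points. The function names, the track layout, and the tolerance `eps` are hypothetical and do not appear in the description.

```python
import math

def speed_px_per_s(track, fps):
    """Estimate speed from the last two points of a track.

    track: list of (frame_index, x, y), oldest first (assumed layout).
    fps:   frame rate of the input moving image.
    Returns None when fewer than two usable observations exist.
    """
    if len(track) < 2:
        return None
    (f0, x0, y0), (f1, x1, y1) = track[-2], track[-1]
    dt = (f1 - f0) / fps  # elapsed time; frames need not be adjacent
    if dt <= 0:
        return None
    return math.hypot(x1 - x0, y1 - y0) / dt

def is_stopped(track, fps, eps=0.0):
    """S15: treat a vehicle as stopped when its speed is zero (within eps)."""
    v = speed_px_per_s(track, fps)
    return v is not None and v <= eps
```

Because the elapsed time is taken from the frame indices, the same estimate still works when detections are associated across frames that are two or more frames apart.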

図７は、計測線を跨いだ車両台数の計測の例を示す図である。図７には、現在の画像フレームＧ５が示されており、現在の画像フレームまでに抽出された車両軌跡Ｔ１１～Ｔ４１が示されている。図７を参照すると、車両軌跡Ｔ１１、Ｔ２１、Ｔ４１は、計測線Ｌ０を跨いでいない。一方、車両軌跡Ｔ３１は、計測線Ｌ０を方向Ｄ１に跨いだところである。したがって、計測処理部１１６は、計測線Ｌ０を方向Ｄ１に跨いだ車両台数に１を加算させればよい。 FIG. 7 is a diagram showing an example of measuring the number of vehicles that straddle the measurement line. FIG. 7 shows the current image frame G5, and shows vehicle trajectories T11 to T41 extracted up to the current image frame. Referring to FIG. 7, vehicle trajectories T11, T21, and T41 do not straddle measurement line L0. On the other hand, the vehicle trajectory T31 straddles the measurement line L0 in the direction D1. Therefore, the measurement processing unit 116 may add 1 to the number of vehicles that straddle the measurement line L0 in the direction D1.
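
The direction-wise count described for FIG. 7 can be sketched as follows. This is an illustrative sketch, not the method of the embodiment: it assumes that the measurement line L0 is a segment between two image points `a` and `b`, that each trajectory is a list of `(x, y)` points, and that the labels "D1" and "D2" are assigned to the two crossing directions by the sign of a cross product. All names are hypothetical.

```python
from collections import Counter

def side(p, a, b):
    """Cross product (b - a) x (p - a); its sign tells which side of a->b p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crossing_direction(p_prev, p_curr, a, b):
    """Return "D1", "D2", or None for one trajectory step against segment a-b."""
    before = side(p_prev, a, b) > 0
    after = side(p_curr, a, b) > 0
    if before == after:
        return None  # the step stays on one side of the line
    # The step must pass between the endpoints a and b, not beside them.
    if side(a, p_prev, p_curr) * side(b, p_prev, p_curr) > 0:
        return None
    return "D1" if before else "D2"

def count_crossings(tracks, a, b):
    """Count trajectory steps that straddle the line, per direction."""
    counts = Counter()
    for track in tracks:
        for p_prev, p_curr in zip(track, track[1:]):
            direction = crossing_direction(p_prev, p_curr, a, b)
            if direction is not None:
                counts[direction] += 1
    return counts
```

Checking only the newest step of each trajectory on every frame would give the incremental behavior described above, where 1 is added at the moment a trajectory such as T31 straddles the line.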

（改善可能な点） (Points that can be improved)
以上、本発明の実施形態に係る移動物体抽出システム１の動作の例について説明してきたが、以上に説明した移動物体抽出システム１には改善可能な点がある。そこで、以下では、改善可能な点について説明する。 Although an example of the operation of the moving object extraction system 1 according to the embodiment of the present invention has been described above, there are points that can be improved in the moving object extraction system 1 described above. Therefore, points that can be improved will be explained below.

図８は、本発明の実施形態に係る移動物体抽出システム１における改善可能な点について説明するための図である。上記では、各画像フレームから車両を検知する処理について説明した。具体的には、各画像フレームから車両を検知する処理は、重処理量モデルＭ１または軽処理量モデルＭ２によって行われ得る。 FIG. 8 is a diagram for explaining possible improvements in the moving object extraction system 1 according to the embodiment of the present invention. The above describes the process of detecting a vehicle from each image frame. Specifically, the process of detecting a vehicle from each image frame may be performed by the heavy-throughput model M1 or the light-throughput model M2.

ここで、重処理量モデルＭ１は、（軽処理量モデルＭ２の演算量よりも）演算量が大きいため、ある程度よりも小さいサイズで画像フレームに写る物体（以下、「小物体」とも言う。）も高精度に検知可能である。しかし、重処理量モデルＭ１では、画像フレーム１つに対する処理に多くの時間が掛かるため、単位時間あたりに処理可能な画像フレームの数が比較的少なくなってしまう。したがって、重処理量モデルＭ１は、物体軌跡のフレームレートがあまり向上しないという特徴を有する。 Here, since the heavy-throughput model M1 has a larger amount of calculation (than the light-throughput model M2), it can detect with high accuracy even an object that appears in the image frame at smaller than a certain size (hereinafter also referred to as a "small object"). However, with the heavy-throughput model M1, processing one image frame takes a long time, so the number of image frames that can be processed per unit time is relatively small. Therefore, the heavy-throughput model M1 has the characteristic that the frame rate of the object trajectory does not improve much.

一方、軽処理量モデルＭ２では、画像フレーム１つに対する処理に少ない時間しか掛からないため、単位時間あたりに処理可能な画像フレームの数は比較的多くなる。したがって、軽処理量モデルＭ２では、物体軌跡のフレームレートが高く維持され得る。しかし、軽処理量モデルＭ２は、（重処理量モデルＭ１の演算量よりも）演算量が小さいため、小物体の検知精度があまり向上しないという特徴を有する。以下では、ある程度よりも大きいサイズで画像フレームに写る物体を、「大物体」とも言う。 On the other hand, in the light processing amount model M2, since it takes only a short time to process one image frame, the number of image frames that can be processed per unit time is relatively large. Therefore, in the light processing amount model M2, the frame rate of the object trajectory can be maintained high. However, since the light processing amount model M2 has a smaller amount of calculations (than the amount of calculations of the heavy processing amount model M1), it has the characteristic that the detection accuracy of small objects does not improve much. In the following, an object that is larger than a certain size and appears in the image frame will also be referred to as a "large object."

以下では、実際に画像フレームから重処理量モデルＭ１および軽処理量モデルＭ２それぞれによって物体がどのように検知されるかを説明しながら、重処理量モデルＭ１および軽処理量モデルＭ２それぞれの特徴について、より具体的に説明する。 In the following, the characteristics of the heavy-throughput model M1 and the light-throughput model M2 are explained more specifically, while describing how objects are actually detected from an image frame by each of the two models.

図８は、重処理量モデルＭ１および軽処理量モデルＭ２それぞれの特徴について説明するための図である。図８を参照すると、図４に示された画像フレームＧ１と同様の画像フレームＧ１が示されている。 FIG. 8 is a diagram for explaining the characteristics of the heavy throughput model M1 and the light throughput model M2. Referring to FIG. 8, an image frame G1 similar to image frame G1 shown in FIG. 4 is shown.

ここで、（画像フレームＧ１を撮像する）カメラ２０との距離がある距離よりも小さい領域（近距離領域Ｒ１）に存在する物体（例えば、車両Ａ１～Ａ４）は、同一サイズの物体であっても画像フレームＧ１にはある程度よりも大きいサイズで写る。したがって、近距離領域Ｒ１に存在する物体は、精度が比較的低いモデルによっても検知され得る。さらに、近距離領域Ｒ１に写る物体は、カメラ２０との距離が比較的小さいため、画像フレームＧ１における単位時間あたりの移動量が大きくなる。したがって、近距離領域Ｒ１に存在する物体の追跡には、比較的高いフレームレートの動画像が必要となる。 Here, an object (for example, the vehicles A1 to A4) present in a region whose distance from the camera 20 (which captures the image frame G1) is smaller than a certain distance (short-distance region R1) appears in the image frame G1 at larger than a certain size, even for an object of the same actual size. Therefore, an object present in the short-distance region R1 can be detected even by a model with relatively low accuracy. Furthermore, since an object in the short-distance region R1 is relatively close to the camera 20, its amount of movement per unit time in the image frame G1 is large. Therefore, tracking an object present in the short-distance region R1 requires a moving image with a relatively high frame rate.

一方、カメラ２０との距離がある距離よりも大きい領域（遠距離領域Ｒ２）に存在する物体（例えば、車両Ａ５）は、同一サイズの物体であっても画像フレームＧ１にはある程度よりも小さいサイズで写る。したがって、遠距離領域Ｒ２に存在する物体は、比較的精度が高いモデルによって検知される必要がある。さらに、遠距離領域Ｒ２に写る物体は、カメラ２０との距離が比較的大きいため、画像フレームＧ１における単位時間あたりの移動量が小さくなる。したがって、遠距離領域Ｒ２に存在する物体は、比較的低いフレームレートの動画像からでも追跡され得る。 On the other hand, an object (for example, the vehicle A5) present in a region whose distance from the camera 20 is larger than a certain distance (long-distance region R2) appears in the image frame G1 at smaller than a certain size, even for an object of the same actual size. Therefore, an object present in the long-distance region R2 needs to be detected by a model with relatively high accuracy. Furthermore, since an object in the long-distance region R2 is relatively far from the camera 20, its amount of movement per unit time in the image frame G1 is small. Therefore, an object present in the long-distance region R2 can be tracked even from a moving image with a relatively low frame rate.
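
The relation between distance, apparent size, and per-frame movement can be illustrated with a simple pinhole camera model. This model is an assumption made here for illustration and is not stated in the description: an object of width W at distance Z appears roughly f·W/Z pixels wide, and an object moving sideways at speed v shifts roughly f·v/(Z·fps) pixels between consecutive frames, where f is the focal length in pixels. The function names and the numbers in the usage note are hypothetical.

```python
def apparent_width_px(focal_px, width_m, distance_m):
    """Approximate on-image width of an object under a pinhole model."""
    return focal_px * width_m / distance_m

def shift_per_frame_px(focal_px, speed_mps, distance_m, fps):
    """Approximate on-image shift between consecutive frames for sideways motion."""
    return focal_px * speed_mps / (distance_m * fps)
```

With illustrative values (f = 1000 px, a 2 m wide vehicle moving at 10 m/s, 10 frames per second), a vehicle 10 m away is about 200 px wide and shifts about 100 px per frame, while the same vehicle 100 m away is about 20 px wide and shifts about 10 px per frame. Under this model, near objects are easy to detect but need frequent detection, and far objects are hard to detect but tolerate sparse detection, which matches the argument above.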

（本実施形態による改善案） (Improvement proposal according to this embodiment)
以上を考慮すると、近距離領域Ｒ１に存在する物体は、精度が比較的低い軽処理量モデルＭ２によって検知されてよく、比較的多い実行頻度によって検知される（すなわち、比較的高いフレームレートの動画像から検知される）のが望ましいということになる。一方、遠距離領域Ｒ２に存在する物体は、精度が比較的高い重処理量モデルＭ１によって検知されるのが望ましく、比較的少ない実行頻度によって検知されてよい（すなわち、比較的低いフレームレートの動画像から検知されてよい）ということになる。 Considering the above, an object present in the short-distance region R1 may be detected by the light-throughput model M2, whose accuracy is relatively low, and is desirably detected with a relatively high execution frequency (that is, detected from a moving image with a relatively high frame rate). On the other hand, an object present in the long-distance region R2 is desirably detected by the heavy-throughput model M1, whose accuracy is relatively high, and may be detected with a relatively low execution frequency (that is, detected from a moving image with a relatively low frame rate).

より詳細に、画像フレーム分配部１１２は、カメラ２０から時系列に沿って連続的に入力される複数の画像フレーム（動画像）を取得する。そして、画像フレーム分配部１１２は、軽処理量モデルＭ２による物体検知の実行頻度が重処理量モデルＭ１による物体検知の実行頻度以上になるように画像フレームごとに重処理量モデルＭ１または軽処理量モデルＭ２に動画像を分配する。さらに、移動軌跡抽出部１１５は、重処理量モデルＭ１による物体検知の結果と軽処理量モデルＭ２による物体検知の結果とに基づいて、動画像中の物体の軌跡を抽出する。 More specifically, the image frame distribution unit 112 acquires a plurality of image frames (a moving image) input continuously in time series from the camera 20. The image frame distribution unit 112 then distributes the moving image, frame by frame, to the heavy-throughput model M1 or the light-throughput model M2 so that the execution frequency of object detection by the light-throughput model M2 is equal to or higher than the execution frequency of object detection by the heavy-throughput model M1. Furthermore, the movement trajectory extraction unit 115 extracts the trajectory of an object in the moving image based on the result of object detection by the heavy-throughput model M1 and the result of object detection by the light-throughput model M2.

かかる構成によれば、小物体は、精度が比較的高い重処理量モデルＭ１によって検知されるため、移動物体の検知精度の低下が抑制され得る。さらに、かかる構成によれば、大物体は、比較的多い実行頻度によって軽処理量モデルＭ２によって検知されるため、移動物体の軌跡の追跡精度の低下が抑制され得る。 According to this configuration, a small object is detected by the heavy-throughput model M1, whose accuracy is relatively high, so that a decrease in the detection accuracy of moving objects can be suppressed. Furthermore, according to this configuration, a large object is detected by the light-throughput model M2 with a relatively high execution frequency, so that a decrease in the tracking accuracy of the trajectory of a moving object can be suppressed.

図９は、画像フレーム分配部１１２および移動軌跡抽出部１１５の具体的な機能について説明するための図である。図９には、カメラ２０から画像フレーム分配部１１２に入力される動画像（複数の画像フレーム）の例として、時刻の古いほうから順に、画像フレームＧ１１～Ｇ１６が示されている。 FIG. 9 is a diagram for explaining specific functions of the image frame distribution unit 112 and the movement trajectory extraction unit 115. FIG. 9 shows image frames G11 to G16, in order from oldest to newest, as an example of a moving image (a plurality of image frames) input from the camera 20 to the image frame distribution unit 112.

画像フレーム分配部１１２は、カメラ２０から１番目の画像フレームＧ１１が入力されると、１番目の画像フレームＧ１１を軽処理量モデルＭ２に出力することによって、軽処理量モデルＭ２に大物体の検出を実行させる。続いて、画像フレーム分配部１１２は、カメラ２０から２番目の画像フレームＧ１２が入力されると、２番目の画像フレームＧ１２を軽処理量モデルＭ２に出力することによって、軽処理量モデルＭ２に大物体の検出を実行させる。 When the first image frame G11 is input from the camera 20, the image frame distribution unit 112 outputs the first image frame G11 to the light-throughput model M2, thereby causing the light-throughput model M2 to detect large objects. Subsequently, when the second image frame G12 is input from the camera 20, the image frame distribution unit 112 outputs the second image frame G12 to the light-throughput model M2, thereby causing the light-throughput model M2 to detect large objects.

このとき、移動軌跡抽出部１１５は、１番目の画像フレームＧ１１から得られた大物体の特徴量と２番目の画像フレームＧ１２から得られた大物体の特徴量との類似度と閾値とを比較する。移動軌跡抽出部１１５は、特徴量同士の類似度が閾値よりも大きい場合に、１番目の画像フレームＧ１１から得られた大物体の位置（例えば、車両領域の位置）と、２番目の画像フレームＧ１２から得られた大物体の位置（例えば、車両領域の位置）とを対応付ける。 At this time, the movement trajectory extraction unit 115 compares, with a threshold, the similarity between the feature amount of the large object obtained from the first image frame G11 and the feature amount of the large object obtained from the second image frame G12. When the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates the position of the large object obtained from the first image frame G11 (for example, the position of the vehicle region) with the position of the large object obtained from the second image frame G12 (for example, the position of the vehicle region).

続いて、画像フレーム分配部１１２は、カメラ２０から３番目の画像フレームＧ１３が入力されると、３番目の画像フレームＧ１３を重処理量モデルＭ１に出力することによって、重処理量モデルＭ１に小物体および大物体の検出を実行させる。 Subsequently, when the third image frame G13 is input from the camera 20, the image frame distribution unit 112 outputs the third image frame G13 to the heavy-throughput model M1, thereby causing the heavy-throughput model M1 to detect small objects and large objects.

このとき、移動軌跡抽出部１１５は、２番目の画像フレームＧ１２から得られた大物体の特徴量と３番目の画像フレームＧ１３から得られた大物体の特徴量との類似度と閾値とを比較する。移動軌跡抽出部１１５は、特徴量同士の類似度が閾値よりも大きい場合に、２番目の画像フレームＧ１２から得られた大物体の位置（例えば、車両領域の位置）と、３番目の画像フレームＧ１３から得られた大物体の位置（例えば、車両領域の位置）とを対応付ける。 At this time, the movement trajectory extraction unit 115 compares, with a threshold, the similarity between the feature amount of the large object obtained from the second image frame G12 and the feature amount of the large object obtained from the third image frame G13. When the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates the position of the large object obtained from the second image frame G12 (for example, the position of the vehicle region) with the position of the large object obtained from the third image frame G13 (for example, the position of the vehicle region).

４番目の画像フレームＧ１４、５番目の画像フレームＧ１５、および、６番目の画像フレームＧ１６がカメラ２０から入力された場合には、１番目の画像フレームＧ１１、２番目の画像フレームＧ１２、および、３番目の画像フレームＧ１３がカメラ２０から入力された場合に実行される処理と同様な処理が再度実行される。 When the fourth image frame G14, the fifth image frame G15, and the sixth image frame G16 are input from the camera 20, the same processing as that executed when the first image frame G11, the second image frame G12, and the third image frame G13 were input from the camera 20 is executed again.

ただし、移動軌跡抽出部１１５は、３番目の画像フレームＧ１３から得られた小物体の特徴量と６番目の画像フレームＧ１６から得られた小物体の特徴量との類似度と閾値とを比較する。そして、移動軌跡抽出部１１５は、特徴量同士の類似度が閾値よりも大きい場合に、図９に示されたように、３番目の画像フレームＧ１３から得られた小物体の位置（例えば、車両領域の位置）と、６番目の画像フレームＧ１６から得られた小物体の位置（例えば、車両領域の位置）とを対応付ける。 In addition, however, the movement trajectory extraction unit 115 compares, with a threshold, the similarity between the feature amount of the small object obtained from the third image frame G13 and the feature amount of the small object obtained from the sixth image frame G16. Then, when the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates, as shown in FIG. 9, the position of the small object obtained from the third image frame G13 (for example, the position of the vehicle region) with the position of the small object obtained from the sixth image frame G16 (for example, the position of the vehicle region).

このように、画像フレーム分配部１１２から重処理量モデルＭ１に入力された現在の画像フレーム（第１の画像フレーム）と過去の画像フレーム（第１の画像フレーム）とのそれぞれから重処理量モデルＭ１によって小物体（第２の物体）の特徴量が得られる。そして、移動軌跡抽出部１１５によって特徴量同士の類似度が閾値よりも大きい場合に、当該現在の画像フレームおよび当該過去の画像フレームそれぞれにおける小物体（第２の物体）の位置同士を対応付ける処理が実行される。これらが繰り返し実行されることにより、小物体の軌跡が抽出される。 In this way, the heavy-throughput model M1 obtains the feature amount of a small object (second object) from each of the current image frame (first image frame) and a past image frame (first image frame) input to the heavy-throughput model M1 from the image frame distribution unit 112. Then, when the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates the positions of the small object (second object) in the current image frame and the past image frame with each other. By repeating this processing, the trajectory of the small object is extracted.

また、画像フレーム分配部１１２から軽処理量モデルＭ２に入力された現在の画像フレーム（第２の画像フレーム）と過去の画像フレーム（第２の画像フレーム）とのそれぞれから軽処理量モデルＭ２によって大物体（第１の物体）の特徴量が得られる。そして、移動軌跡抽出部１１５によって特徴量同士の類似度が閾値よりも大きい場合に、当該現在の画像フレームおよび当該過去の画像フレームそれぞれにおける大物体（第１の物体）の位置同士を対応付ける処理が実行される。これらが繰り返し実行されることにより、大物体の軌跡が抽出される。 Similarly, the light-throughput model M2 obtains the feature amount of a large object (first object) from each of the current image frame (second image frame) and a past image frame (second image frame) input to the light-throughput model M2 from the image frame distribution unit 112. Then, when the similarity between the feature amounts is greater than the threshold, the movement trajectory extraction unit 115 associates the positions of the large object (first object) in the current image frame and the past image frame with each other. By repeating this processing, the trajectory of the large object is extracted.
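
The threshold-based association described above can be sketched as follows. The description does not specify how the similarity is computed or how competing candidates are resolved, so this sketch assumes cosine similarity between feature vectors and a greedy one-to-one matching in descending order of similarity. The function names and the `(position, feature)` detection layout are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def associate(prev_dets, curr_dets, threshold):
    """Pair detections from a past frame and the current frame.

    prev_dets, curr_dets: lists of (position, feature) tuples.
    Returns (prev_index, curr_index) pairs whose similarity exceeds the
    threshold; each detection is used at most once, best matches first.
    """
    candidates = []
    for i, (_, f_prev) in enumerate(prev_dets):
        for j, (_, f_curr) in enumerate(curr_dets):
            s = cosine_similarity(f_prev, f_curr)
            if s > threshold:
                candidates.append((s, i, j))
    candidates.sort(reverse=True)
    used_prev, used_curr, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_prev and j not in used_curr:
            used_prev.add(i)
            used_curr.add(j)
            pairs.append((i, j))
    return pairs
```

For large objects, `prev_dets` would come from the immediately preceding frame (for example G12 when processing G13). For small objects, it would come from the most recent frame handled by the heavy-throughput model M1 (for example G13 when processing G16).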

図９に示された例では、軽処理量モデルＭ２による物体検知の頻度が重処理量モデルＭ１による物体検知の頻度以上になるように構成された１つの繰り返し単位（所定の単位）が構成されている。そして、画像フレーム分配部１１２は、かかる繰り返し単位が繰り返し実行されるように動画像を分配している。より詳細に、図９に示された例では、１つの実行単位が、連続する二つの画像フレームから軽処理量モデルＭ２によって物体検知がされた後に、一つの画像フレームから重処理量モデルＭ１によって物体検知がなされるように構成されている。 In the example shown in FIG. 9, one repeating unit (predetermined unit) is configured such that the frequency of object detection by the light-throughput model M2 is equal to or higher than the frequency of object detection by the heavy-throughput model M1. The image frame distribution unit 112 distributes the moving image so that this repeating unit is executed repeatedly. More specifically, in the example shown in FIG. 9, one execution unit (the repeating unit) is configured such that object detection is performed by the light-throughput model M2 on two consecutive image frames, and then object detection is performed by the heavy-throughput model M1 on one image frame.

ここで、１つの繰り返し単位はどのように構成されてもよい。しかし、例えば、図９に示されたように、１つの繰り返し単位に重処理量モデルＭ１によって物体検知がなされる画像フレームが一つだけ含まれていれば、この繰り返し単位が繰り返し実行されることによって、重処理量モデルＭ１による物体検知も一定のフレーム間隔にて行われるようになる。すなわち、重処理量モデルＭ１による小物体の検知が一定のフレーム間隔にて行われるようになる。なお、大物体の検知は、重処理量モデルＭ１および軽処理量モデルＭ２の双方によって行われるため、繰り返し単位の構成によらず、一定のフレーム間隔にて行われ得る。 Here, one repeating unit may be configured in any manner. However, for example, as shown in FIG. 9, if one repeating unit includes only one image frame in which object detection is performed by the heavy-throughput model M1, this repeating unit will be repeatedly executed. Accordingly, object detection by the heavy-throughput model M1 is also performed at constant frame intervals. That is, detection of small objects by the heavy throughput model M1 is performed at regular frame intervals. Note that since detection of a large object is performed by both the heavy-throughput model M1 and the light-throughput model M2, it can be performed at constant frame intervals regardless of the configuration of the repeating unit.
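
The frame distribution in the FIG. 9 example can be sketched as a cyclic schedule. This is only an illustration under the assumption that the repeating unit is expressed as a tuple of model labels; `REPEATING_UNIT` and `distribute` are hypothetical names, and the check mirrors the stated condition that M2 runs at least as often as M1.

```python
from itertools import cycle

# Pattern from the FIG. 9 example: two frames to M2, then one frame to M1.
REPEATING_UNIT = ("M2", "M2", "M1")

def distribute(frames, unit=REPEATING_UNIT):
    """Assign each frame to a model by repeating the unit.

    Raises ValueError if the unit would run M2 less often than M1.
    """
    if unit.count("M2") < unit.count("M1"):
        raise ValueError("M2 must run at least as often as M1")
    return list(zip(frames, cycle(unit)))
```

Applied to the frames G11 to G16, this assigns G13 and G16 to M1 and the remaining frames to M2. Because the unit contains exactly one M1 slot, M1 runs on every third frame, that is, at a constant frame interval.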

（分散処理） (Distributed processing)
本発明の実施形態では、互いに特性の異なる複数の演算デバイスが並列化されており、当該複数の演算デバイスによる分散処理が行われる場合を想定する。これによって高いパフォーマンスの演算が行われ得る。 In an embodiment of the present invention, a case is assumed in which a plurality of arithmetic devices having mutually different characteristics are parallelized and distributed processing is performed by the plurality of arithmetic devices. This allows high performance calculations to be performed.

ここで、重処理量モデルＭ１は、第１の演算デバイスにおいて実行され、軽処理量モデルＭ２は、第１の演算デバイスとは物理的に異なる第２の演算デバイスにおいて実行されるものとする。重処理量モデルＭ１が実行される演算デバイス（第１の演算デバイス）は、軽処理量モデルＭ２が実行される演算デバイス（第２の演算デバイス）よりも演算能力が高いのが望ましい。これによって、必要な演算量に合った演算デバイスによって各モデルが実行されるため、処理可能な動画像のフレームレートの低下が抑制され得る。なお、ここで言う“演算能力が高い”とは、必ずしも演算速度の速さのみで決定されるものではない。例えば、演算速度がデバイスＡ＞デバイスＢであっても、デバイスＡのメモリ搭載量が重処理モデルＭ１を動かすことができる量より少ない場合、重処理モデルＭ１は動作できないため、動作に十分なメモリ搭載量のデバイスＢの方を“演算能力が高い”と見做して、デバイスＢで動作させることもできる。 Here, it is assumed that the heavy-throughput model M1 is executed on a first computing device, and the light-throughput model M2 is executed on a second computing device that is physically different from the first computing device. It is desirable that the computing device on which the heavy-throughput model M1 is executed (first computing device) has higher computing power than the computing device on which the light-throughput model M2 is executed (second computing device). As a result, each model is executed by a computing device suited to the required amount of calculation, so that a decrease in the frame rate of processable moving images can be suppressed. Note that "high computing power" here is not necessarily determined by computing speed alone. For example, even if device A is faster than device B, the heavy-throughput model M1 cannot run on device A if device A has less memory than the model needs. In that case, device B, which has enough memory, may be regarded as having "higher computing power", and the heavy-throughput model M1 may be run on device B.

以下では、演算デバイスの例としてプロセッサが用いられる場合を主に想定する。しかし、演算デバイスは、演算に用いられるデバイスであればよいため、演算デバイスの種類は特に限定されない。例えば、演算デバイスは、メモリなどであってもよい。以下では、分散処理の例について説明する。 In the following, it is mainly assumed that a processor is used as an example of a calculation device. However, the type of arithmetic device is not particularly limited, as the arithmetic device may be any device used for arithmetic operations. For example, the computing device may be a memory or the like. An example of distributed processing will be described below.

なお、以下では、複数のプロセッサが、１つ目のＣＰＵ（以下、「ＣＰＵ♯１」とも表記する。）、２つ目のＣＰＵ（以下、「ＣＰＵ♯２」とも表記する。）、および、ＡＩチップを含む場合について説明する。ＡＩチップは、ＣＰＵよりも演算能力が高い演算デバイスの例として用いられる。しかし、プロセッサの種類はこれらに限定されない。例えば、複数のプロセッサは、ＧＰＵまたはＦＰＧＡなどを含んでもよい。 Note that in the following description, the plurality of processors includes a first CPU (hereinafter also referred to as "CPU #1"), a second CPU (hereinafter also referred to as "CPU #2"), and A case in which an AI chip is included will be explained. An AI chip is used as an example of a computing device that has higher computing power than a CPU. However, the types of processors are not limited to these. For example, the multiple processors may include GPUs, FPGAs, and the like.

図１０は、分散処理の例について説明するための図である。図１０に示されるように、ＣＰＵ♯１、ＣＰＵ♯２およびＡＩチップそれぞれには、重処理量モデルＭ１を用いた推論要求（すなわち、重処理量モデルＭ１による物体検知の要求）、または、軽処理量モデルＭ２を用いた推論要求（すなわち、軽処理量モデルＭ２による物体検知の要求）が出力される。推論要求は、画像フレーム分配部１１２から出力され得る。 FIG. 10 is a diagram for explaining an example of distributed processing. As shown in FIG. 10, each of CPU #1, CPU #2, and AI chip has an inference request using the heavy throughput model M1 (that is, a request for object detection using the heavy throughput model M1), or a light An inference request using the throughput model M2 (that is, a request for object detection using the light throughput model M2) is output. The inference request may be output from the image frame distributor 112.

（軽処理量モデルＭ２を用いる）１つ目の推論要求ｒ１は、演算能力が比較的低いＣＰＵ♯１に出力される。続いて、（軽処理量モデルＭ２を用いる）２つ目の推論要求ｒ２は、演算能力が比較的低いＣＰＵ♯２に出力される。さらに、（重処理量モデルＭ１を用いる）３つ目の推論要求は、演算能力が比較的高いＡＩチップに出力される。このようにして推論要求が各プロセッサに分配され、各プロセッサによって推論要求に基づく推論が実行されると、１つの繰り返し単位（デバイス周回１回目）が終了する。 The first inference request r1 (using the light-throughput model M2) is output to CPU #1, which has relatively low computing power. Subsequently, the second inference request r2 (using the light-throughput model M2) is output to CPU #2, which has relatively low computing power. Furthermore, the third inference request (using the heavy-throughput model M1) is output to the AI chip, which has relatively high computing power. When the inference requests have been distributed to the processors in this way and each processor has executed inference based on its request, one repeating unit (the first device cycle) ends.

推論要求ｒ４～ｒ６も同様に各プロセッサに分配され、各プロセッサによって推論要求に基づく推論が実行されると、繰り返し単位（デバイス周回２回目）が終了する。推論要求ｒ７～ｒ１２も同様に各プロセッサに分配され、推論要求ｒ１２に続く推論要求も同様に各プロセッサに分配され、各プロセッサによって推論要求に基づく推論が実行される。これによって、必要な演算量に合ったプロセッサによって重処理量モデルＭ１および軽処理量モデルＭ２が実行されるため、処理可能な動画像のフレームレートの低下が抑制され得る。 Inference requests r4 to r6 are similarly distributed to each processor, and when each processor executes inference based on the inference requests, the repetition unit (second device rotation) ends. Inference requests r7 to r12 are similarly distributed to each processor, inference requests following inference request r12 are similarly distributed to each processor, and inference based on the inference request is executed by each processor. As a result, the heavy-throughput model M1 and the light-throughput model M2 are executed by a processor suitable for the required amount of calculations, so that a decrease in the frame rate of processable moving images can be suppressed.
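
The routing of inference requests in the FIG. 10 example can be sketched as a fixed rotation over the three processors. This is an illustrative sketch only: `DEVICE_CYCLE` and `route` are hypothetical names, requests are assumed to be numbered from 1 in arrival order, and actual execution on the devices is left out.

```python
# Rotation from the FIG. 10 example: each entry is (device, model to run).
DEVICE_CYCLE = (
    ("CPU#1", "M2"),
    ("CPU#2", "M2"),
    ("AI chip", "M1"),
)

def route(request_no):
    """Map a 1-based inference request number to (cycle_no, device, model)."""
    index = request_no - 1
    cycle_no = index // len(DEVICE_CYCLE) + 1  # which device cycle this is
    device, model = DEVICE_CYCLE[index % len(DEVICE_CYCLE)]
    return cycle_no, device, model
```

Under this rotation, requests r1 and r2 go to the two CPUs with the light-throughput model M2, the third request goes to the AI chip with the heavy-throughput model M1, and requests r4 to r6 repeat the same pattern in the second device cycle.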

以上、本発明の実施形態に係る移動物体抽出システム１の動作の例について説明した。 An example of the operation of the moving object extraction system 1 according to the embodiment of the present invention has been described above.

（２．ハードウェア構成例） (2. Hardware configuration example)
続いて、本発明の実施形態に係る移動物体抽出装置１０のハードウェア構成例について説明する。以下では、本発明の実施形態に係る移動物体抽出装置１０のハードウェア構成例として、情報処理装置９００のハードウェア構成例について説明する。なお、以下に説明する情報処理装置９００のハードウェア構成例は、移動物体抽出装置１０のハードウェア構成の一例に過ぎない。したがって、移動物体抽出装置１０のハードウェア構成は、以下に説明する情報処理装置９００のハードウェア構成から不要な構成が削除されてもよいし、新たな構成が追加されてもよい。 Next, an example of the hardware configuration of the moving object extraction device 10 according to the embodiment of the present invention will be described. An example of the hardware configuration of the information processing device 900 will be described below as an example of the hardware configuration of the moving object extraction device 10 according to the embodiment of the present invention. Note that the example hardware configuration of the information processing device 900 described below is only an example of the hardware configuration of the moving object extraction device 10. Therefore, in the hardware configuration of the moving object extraction device 10, unnecessary configurations may be deleted from the hardware configuration of the information processing device 900 described below, or new configurations may be added.

図１１は、本発明の実施形態に係る移動物体抽出装置１０の例としての情報処理装置９００のハードウェア構成を示す図である。情報処理装置９００は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）♯１（９０１－１）と、ＣＰＵ♯２（９０１－２）と、ＡＩチップ９０１－３と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）９０２と、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）９０３と、ホストバス９０４と、ブリッジ９０５と、外部バス９０６と、インタフェース９０７と、入力装置９０８と、出力装置９０９と、ストレージ装置９１０と、通信装置９１１と、を備える。 FIG. 11 is a diagram showing a hardware configuration of an information processing device 900 as an example of the moving object extraction device 10 according to the embodiment of the present invention. The information processing device 900 includes a CPU (Central Processing Unit) #1 (901-1), a CPU #2 (901-2), an AI chip 901-3, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, a host bus 904, a bridge 905, an external bus 906, an interface 907, an input device 908, an output device 909, a storage device 910, and a communication device 911.

ＣＰＵ♯１（９０１－１）、ＣＰＵ♯２（９０１－２）、ＡＩチップ（９０１－３）は、演算処理装置および制御装置として機能し、各種プログラムに従って情報処理装置９００内の動作全般を制御する。また、ＣＰＵ♯１（９０１－１）、ＣＰＵ♯２（９０１－２）、ＡＩチップ（９０１－３）は、マイクロプロセッサであってもよい。ＲＯＭ９０２は、ＣＰＵ９０１が使用するプログラムや演算パラメータ等を記憶する。ＲＡＭ９０３は、ＣＰＵ９０１の実行において使用するプログラムや、その実行において適宜変化するパラメータ等を一時記憶する。これらはＣＰＵバス等から構成されるホストバス９０４により相互に接続されている。 CPU #1 (901-1), CPU #2 (901-2), and AI chip (901-3) function as an arithmetic processing unit and a control device, and control overall operations within the information processing device 900 according to various programs. do. Further, CPU #1 (901-1), CPU #2 (901-2), and AI chip (901-3) may be microprocessors. The ROM 902 stores programs used by the CPU 901, calculation parameters, and the like. The RAM 903 temporarily stores programs used in the execution of the CPU 901 and parameters that change as appropriate during the execution. These are interconnected by a host bus 904 composed of a CPU bus and the like.

ホストバス９０４は、ブリッジ９０５を介して、ＰＣＩ（ＰｅｒｉｐｈｅｒａｌＣｏｍｐｏｎｅｎｔＩｎｔｅｒｃｏｎｎｅｃｔ／Ｉｎｔｅｒｆａｃｅ）バス等の外部バス９０６に接続されている。なお、必ずしもホストバス９０４、ブリッジ９０５および外部バス９０６を分離構成する必要はなく、１つのバスにこれらの機能を実装してもよい。 The host bus 904 is connected via a bridge 905 to an external bus 906 such as a PCI (Peripheral Component Interconnect/Interface) bus. Note that the host bus 904, bridge 905, and external bus 906 do not necessarily need to be configured separately, and these functions may be implemented in one bus.

入力装置９０８は、マウス、キーボード、タッチパネル、ボタン、マイクロフォン、スイッチおよびレバー等ユーザが情報を入力するための入力手段と、ユーザによる入力に基づいて入力信号を生成し、ＣＰＵ９０１に出力する入力制御回路等から構成されている。情報処理装置９００を操作するユーザは、この入力装置９０８を操作することにより、情報処理装置９００に対して各種のデータを入力したり処理動作を指示したりすることができる。 The input device 908 includes input means for the user to input information, such as a mouse, keyboard, touch panel, button, microphone, switch, and lever, and an input control circuit that generates an input signal based on the user's input and outputs it to the CPU 901. It is composed of etc. By operating the input device 908, a user operating the information processing device 900 can input various data to the information processing device 900 and instruct processing operations.

出力装置９０９は、例えば、ＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）ディスプレイ装置、液晶ディスプレイ（ＬＣＤ）装置、ＯＬＥＤ（ＯｒｇａｎｉｃＬｉｇｈｔＥｍｉｔｔｉｎｇＤｉｏｄｅ）装置、ランプ等の表示装置およびスピーカ等の音声出力装置を含む。 The output device 909 includes, for example, a CRT (Cathode Ray Tube) display device, a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, a display device such as a lamp, and an audio output device such as a speaker.

ストレージ装置９１０は、データ格納用の装置である。ストレージ装置９１０は、記憶媒体、記憶媒体にデータを記録する記録装置、記憶媒体からデータを読み出す読出し装置および記憶媒体に記録されたデータを削除する削除装置等を含んでもよい。ストレージ装置９１０は、例えば、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）で構成される。このストレージ装置９１０は、ハードディスクを駆動し、ＣＰＵ９０１が実行するプログラムや各種データを格納する。 The storage device 910 is a device for storing data. The storage device 910 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 910 is configured with, for example, an HDD (Hard Disk Drive). This storage device 910 drives a hard disk and stores programs executed by the CPU 901 and various data.

通信装置９１１は、例えば、ネットワークに接続するための通信デバイス等で構成された通信インタフェースである。また、通信装置９１１は、無線通信または有線通信のどちらに対応してもよい。 The communication device 911 is, for example, a communication interface configured with a communication device for connecting to a network. Further, the communication device 911 may support either wireless communication or wired communication.

以上、本発明の実施形態に係る移動物体抽出装置１０のハードウェア構成例について説明した。 The example hardware configuration of the moving object extraction device 10 according to the embodiment of the present invention has been described above.

（３．まとめ） (3. Summary)
以上に説明したように、本発明の実施形態によれば、第１の演算量によって画像フレーム中の物体検知を行う重処理量モデルＭ１と、第１の演算量よりも小さい第２の演算量によって画像フレーム中の物体検知を行う軽処理量モデルＭ２と、を備える、移動物体抽出装置１０が提供される。さらに、移動物体抽出装置１０は、画像フレーム分配部１１２および移動軌跡抽出部１１５を備える。 As described above, according to the embodiment of the present invention, there is provided a moving object extraction device 10 that includes a heavy-throughput model M1 that detects objects in an image frame with a first amount of calculation, and a light-throughput model M2 that detects objects in an image frame with a second amount of calculation smaller than the first amount of calculation. Furthermore, the moving object extraction device 10 includes an image frame distribution unit 112 and a movement trajectory extraction unit 115.

画像フレーム分配部１１２は、複数の画像フレームから構成された動画像を取得し、軽処理量モデルＭ２による物体検知の実行頻度が重処理量モデルＭ１による物体検知の実行頻度よりも多くなるように画像フレームごとに重処理量モデルＭ１または軽処理量モデルＭ２に動画像を分配する。そして、移動軌跡抽出部１１５は、重処理量モデルＭ１による物体検知の結果と軽処理量モデルＭ２による物体検知の結果とに基づいて、動画像中の物体の軌跡を抽出する。 The image frame distribution unit 112 acquires a moving image composed of a plurality of image frames, and sets the object detection frequency using the light throughput model M2 to be higher than the frequency of object detection using the heavy throughput model M1. The moving image is distributed to the heavy-throughput model M1 or the light-throughput model M2 for each image frame. Then, the movement trajectory extraction unit 115 extracts the trajectory of the object in the moving image based on the result of object detection by the heavy-throughput model M1 and the result of object detection by the light-throughput model M2.

以上、添付図面を参照しながら本発明の好適な実施形態について詳細に説明したが、本発明はかかる例に限定されない。本発明の属する技術の分野における通常の知識を有する者であれば、特許請求の範囲に記載された技術的思想の範疇内において、各種の変更例または修正例に想到し得ることは明らかであり、これらについても、当然に本発明の技術的範囲に属するものと了解される。 Although preferred embodiments of the present invention have been described above in detail with reference to the accompanying drawings, the present invention is not limited to such examples. It is clear that a person with ordinary knowledge in the technical field to which the present invention pertains can come up with various changes or modifications within the scope of the technical idea stated in the claims. It is understood that these also naturally fall within the technical scope of the present invention.

For example, the above description mainly covers the case where the moving object extraction device 10 and the camera 20 are configured as separate bodies. However, the moving object extraction device 10 and the camera 20 may be integrated into a single unit.

1 Moving object extraction system
10 Moving object extraction device
110 Control unit
111 Vehicle detection unit
112 Image frame distribution unit
115 Movement trajectory extraction unit
116 Measurement processing unit
130 Storage unit
20 Camera
L0 Measurement line
M1 Heavy throughput model (first model)
M2 Light throughput model (second model)

Claims

A moving object extraction device comprising:
a first model that detects, using a first amount of calculation, a first object in an image frame and a second object that is smaller than the first object;
a second model that detects the first object in an image frame using a second amount of calculation that is smaller than the first amount of calculation;
an image frame distribution unit that acquires a moving image composed of a plurality of image frames and distributes the moving image, frame by frame, to the first model or the second model so that object detection by the second model is executed more frequently than object detection by the first model;
a first movement trajectory extraction unit that extracts a trajectory of the first object in the moving image based on a result of object detection by the first model and a result of object detection by the second model; and
a second movement trajectory extraction unit that extracts a trajectory of the second object in the moving image based on a result of object detection by the first model.

The moving object extraction device according to claim 1, wherein
the image frame distribution unit distributes the moving image so that a predetermined unit, in which the frequency of object detection by the second model is equal to or higher than the frequency of object detection by the first model, is executed repeatedly.

The moving object extraction device according to claim 1 or 2, wherein
the first model is executed on a first computing device,
the second model is executed on a second computing device that is physically different from the first computing device, and
the first computing device has higher computing power than the second computing device.

The moving object extraction device according to any one of claims 1 to 3, wherein
the movement trajectory extraction unit extracts the trajectory of the second object by repeating a process of associating the positions of the second object in a current first image frame and a past first image frame, each input to the first model from the image frame distribution unit, when the degree of similarity between the feature amounts of the second object obtained by the first model from each of the current first image frame and the past first image frame is greater than a threshold.

The moving object extraction device according to any one of claims 1 to 4, wherein
the movement trajectory extraction unit extracts the trajectory of the first object by repeating:
a process of associating the positions of the first object in a second image frame, input to the second model from the image frame distribution unit, and a past image frame that is one or more frames before the second image frame, when the degree of similarity between the feature amounts of the first object obtained by the second model from each of the second image frame and the past image frame is greater than a threshold; and
a process of associating the positions of the first object in a current first image frame, input to the first model from the image frame distribution unit, and the second image frame, when the degree of similarity between the feature amount of the first object obtained by the first model from the current first image frame and the feature amount of the first object obtained by the second model from the second image frame is greater than a threshold.
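The association process recited in the two preceding claims can be illustrated with a minimal sketch. The claims only require that positions be associated when the similarity between feature amounts exceeds a threshold. The use of cosine similarity, the greedy one-to-one matching, the threshold value 0.8, and the names `cosine_similarity` and `associate` are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# A detection is (position, feature vector); this layout is assumed for the sketch.
Detection = Tuple[Tuple[float, float], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def associate(past: List[Detection], current: List[Detection],
              threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Link each current detection to the most similar unused past detection.

    A link (past_index, current_index) is created only when the similarity
    is strictly greater than `threshold`.
    """
    links = []
    used = set()
    for j, (_, feat_cur) in enumerate(current):
        best_i, best_sim = None, threshold
        for i, (_, feat_past) in enumerate(past):
            if i in used:
                continue
            sim = cosine_similarity(feat_past, feat_cur)
            if sim > best_sim:
                best_i, best_sim = i, sim
        if best_i is not None:
            used.add(best_i)
            links.append((best_i, j))
    return links


past = [((10.0, 20.0), [1.0, 0.0]), ((50.0, 60.0), [0.0, 1.0])]
current = [((52.0, 61.0), [0.1, 0.9]), ((12.0, 21.0), [0.9, 0.1]),
           ((90.0, 90.0), [-1.0, -1.0])]
links = associate(past, current)
print(links)
```

Repeating this step over successive frame pairs and chaining the linked positions yields a trajectory. In this sketch the same function would serve both cases: frame pairs handled only by the first model for the second object, and frame pairs spanning both models for the first object.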

The moving object extraction device according to any one of claims 1 to 5, further comprising
a measurement processing unit that measures the number of trajectories of the first object and of the second object that straddle a measurement line set in the moving image.

The moving object extraction device according to claim 6, wherein
the measurement processing unit measures the number of trajectories that straddle the measurement line separately for each direction in which the trajectories straddle the measurement line.
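Counting trajectories that straddle the measurement line, separately for each direction, can be sketched as below. The claims do not specify a geometric test. This example assumes that a trajectory is a list of 2D points, that the measurement line is a segment with endpoints `a` and `b`, and that direction is decided by the sign of a cross product. The names `side`, `crossing_direction`, and `count_crossings`, and the result keys, are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Cross product of (b - a) and (p - a); its sign tells which side of a->b p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_direction(a: Point, b: Point, p: Point, q: Point) -> int:
    """Return +1 or -1 if the step p->q strictly crosses segment a-b, else 0."""
    sp, sq = side(a, b, p), side(a, b, q)
    if sp * sq >= 0:  # both points on the same side, or touching the line
        return 0
    if side(p, q, a) * side(p, q, b) >= 0:  # passes beyond the ends of a-b
        return 0
    return 1 if sq > 0 else -1


def count_crossings(trajectories: List[List[Point]], a: Point, b: Point) -> Dict[str, int]:
    """Count trajectories by the net direction in which they cross segment a-b."""
    counts = {"to_positive_side": 0, "to_negative_side": 0}
    for traj in trajectories:
        net = sum(crossing_direction(a, b, p, q) for p, q in zip(traj, traj[1:]))
        if net > 0:
            counts["to_positive_side"] += 1
        elif net < 0:
            counts["to_negative_side"] += 1
    return counts


line_a, line_b = (0.0, 0.0), (10.0, 0.0)
trajectories = [
    [(5.0, -2.0), (5.0, -1.0), (5.0, 1.0)],  # crosses toward the positive side
    [(3.0, 2.0), (3.0, -2.0)],               # crosses toward the negative side
    [(20.0, -1.0), (20.0, 1.0)],             # passes beyond the end of the line
]
counts = count_crossings(trajectories, line_a, line_b)
print(counts)
```

Using the net sum per trajectory means that an object that crosses and then returns is not counted in either direction. That is a design choice of this sketch rather than something the claims state.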

A moving object extraction method including:
acquiring a moving image composed of a plurality of image frames;
distributing the moving image, frame by frame, to a first model that detects, using a first amount of calculation, a first object in an image frame and a second object that is smaller than the first object, or to a second model that detects the first object in an image frame using a second amount of calculation that is smaller than the first amount of calculation, so that object detection by the second model is executed more frequently than object detection by the first model;
extracting a trajectory of the first object in the moving image based on a result of object detection by the first model and a result of object detection by the second model; and
extracting a trajectory of the second object in the moving image based on a result of object detection by the first model.

A program for causing a computer to function as a moving object extraction device comprising:
a first model that detects, using a first amount of calculation, a first object in an image frame and a second object that is smaller than the first object;
a second model that detects the first object in an image frame using a second amount of calculation that is smaller than the first amount of calculation;
an image frame distribution unit that acquires a moving image composed of a plurality of image frames and distributes the moving image, frame by frame, to the first model or the second model so that object detection by the second model is executed more frequently than object detection by the first model;
a first movement trajectory extraction unit that extracts a trajectory of the first object in the moving image based on a result of object detection by the first model and a result of object detection by the second model; and
a second movement trajectory extraction unit that extracts a trajectory of the second object in the moving image based on a result of object detection by the first model.