JP6829412B1

JP6829412B1 - Image processing equipment, image processing system, image processing method, and image processing program

Info

Publication number: JP6829412B1
Application number: JP2020539113A
Authority: JP
Inventors: 守屋　芳美; 芳美守屋; 直大澁谷
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2019-11-11
Filing date: 2019-11-11
Publication date: 2021-02-10
Anticipated expiration: 2039-11-11
Also published as: JPWO2021095085A1; WO2021095085A1

Abstract

Provided is an image processing device that, when determining whether objects appearing in a plurality of image data are the same, reduces the possibility of erroneously determining that objects of different actual sizes are the same object. The image processing device includes: a visual feature acquisition unit that acquires a first visual feature, which is a visual feature of a first object appearing in first image data, and a second visual feature, which is a visual feature of a second object appearing in second image data; a physical feature acquisition unit that acquires a first physical feature, which is a physical feature of the first object, and a second physical feature, which is a physical feature of the second object; and a determination unit that uses a trained machine learning model to determine, from the first visual feature, the first physical feature, the second visual feature, and the second physical feature, whether the first object and the second object are the same object.

Description

The present invention relates to an image processing device, an image processing system, an image processing method, and an image processing program.

Techniques have been proposed for estimating whether objects captured by a plurality of cameras are the same object.
For example, Non-Patent Document 1 describes a technique that extracts features from person images using a neural network and uses the feature vectors generated by the neural network to estimate whether a pair of person images shows the same person.

Non-Patent Document 1: E. Ahmed, M. Jones, T. K. Marks, "An improved deep learning architecture for person re-identification," in Computer Vision and Pattern Recognition (CVPR), 2015.

In the conventional technique, objects are compared using feature vectors that capture the visual features extracted from the image but do not take the size of the object into account. As a result, people of different sizes may be determined to be the same person when, for example, their clothes are the same color and they look similar.

The present invention has been made to solve the above problem. Its purpose is to reduce the possibility of erroneously determining that objects of different actual sizes are the same object when determining whether objects shown in a plurality of images are the same.

The image processing device according to the present invention includes: a visual feature acquisition unit that acquires a first visual feature, which is a visual feature of a first object appearing in first image data, and a second visual feature, which is a visual feature of a second object appearing in second image data; a physical feature acquisition unit that acquires a first physical feature, which is a physical feature of the first object, and a second physical feature, which is a physical feature of the second object; and a determination unit that uses a trained machine learning model to determine, from the first visual feature, the first physical feature, the second visual feature, and the second physical feature, whether the first object and the second object are the same object. The determination unit includes: a feature vector acquisition unit that inputs the first visual feature and the first physical feature to the trained machine learning model and acquires, as its output, a first feature vector, which is the feature vector of the first object, and that inputs the second visual feature and the second physical feature to the trained machine learning model and acquires, as its output, a second feature vector, which is the feature vector of the second object; a similarity calculation unit that calculates the similarity between the first feature vector and the second feature vector; and a similarity determination unit that determines, based on the similarity calculated by the similarity calculation unit, whether the first object appearing in the first image data and the second object appearing in the second image data are the same object.
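The determination flow described above (one feature vector per object built from its visual and physical features, a similarity between the two vectors, then a decision based on that similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, plain concatenation stands in for the trained machine learning model, and the cosine similarity measure and the threshold of 0.9 are choices made only for this example.

```python
import math
from typing import List, Sequence


def fuse_features(visual: Sequence[float], physical: Sequence[float]) -> List[float]:
    """Build one feature vector per object.

    Assumption: simple concatenation stands in for the trained machine
    learning model that the document uses for this step.
    """
    return list(visual) + list(physical)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def is_same_object(
    visual_1: Sequence[float],
    physical_1: Sequence[float],
    visual_2: Sequence[float],
    physical_2: Sequence[float],
    threshold: float = 0.9,  # Hypothetical threshold, not taken from the source.
) -> bool:
    """Decide whether two observations show the same object."""
    vector_1 = fuse_features(visual_1, physical_1)
    vector_2 = fuse_features(visual_2, physical_2)
    return cosine_similarity(vector_1, vector_2) >= threshold
```

With identical visual features, the call `is_same_object([1.0, 0.0], [1.8], [1.0, 0.0], [0.5])` still returns `False` in this sketch, because the differing physical feature lowers the similarity below the assumed threshold.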

The image processing device according to the present invention includes a determination unit that uses a trained machine learning model to determine, from the first visual feature, the first physical feature, the second visual feature, and the second physical feature, whether the first object and the second object are the same object. Because it uses the physical features of the objects in addition to their visual features, it can reduce the possibility of erroneously determining that objects of different actual sizes are the same object when determining whether objects appearing in a plurality of image data are the same.

FIG. 1 is a configuration diagram showing the configuration of the image processing device 100 and the image processing system 1000 in Embodiment 1.
FIG. 2 is an explanatory diagram showing a specific example of the processing of the object detection unit 31 in Embodiment 1.
FIG. 3 is an explanatory diagram showing a specific example of the processing of the object detection unit 31 and the processing of the object tracking unit 34 in Embodiment 1.
FIG. 4 is an explanatory diagram showing a specific example of the process by which the visual feature extraction unit 32 in Embodiment 1 extracts a visual feature.
FIG. 5 is an explanatory diagram showing a specific example of the process by which the feature vector acquisition unit 841 in Embodiment 1 acquires a feature vector.
FIG. 6 is a configuration diagram showing an example of the hardware configuration of a computer that implements the image processing device 100 in Embodiment 1.
FIG. 7 is a flowchart showing the image storage process of the image processing device 100 in Embodiment 1.
FIG. 8 is a flowchart showing the operation of the image collation process of the image processing device 100 in Embodiment 1.

Embodiment 1.
FIG. 1 is a configuration diagram showing the configuration of the image processing device 100 and the image processing system 1000 in Embodiment 1.
As shown in FIG. 1, the image processing system 1000 consists of n network cameras NC1, NC2, ..., NCn (n is an integer of 1 or more) and an image processing device 100 that receives, via a communication network NW, the still image data or moving image streams delivered by each of these network cameras. The image processing device 100 performs image analysis on the still image data or moving image data (hereinafter collectively referred to as image data) received from the network cameras NC1, NC2, ..., NCn. The image processing device 100 stores spatial, geographic, or temporal descriptors indicating the result of the image analysis in association with the image.
Here, a spatial descriptor indicates, for example, the position and size of an object in the image; a geographic descriptor indicates, for example, the position of the network camera NC1, NC2, ..., NCn that captured the image; and a temporal descriptor indicates, for example, the time at which the image was captured.
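The three kinds of descriptors can be pictured as a small record stored alongside each image. The sketch below is only one possible layout: the class names, field names, coordinate conventions, and sample values are all assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class SpatialDescriptor:
    """Position and size of an object within the image (pixels, assumed)."""
    position: Tuple[int, int]  # (x, y) of the top-left corner
    size: Tuple[int, int]      # (width, height)


@dataclass(frozen=True)
class GeographicDescriptor:
    """Which camera captured the image and where that camera is located."""
    camera_id: str
    camera_position: Tuple[float, float]  # (latitude, longitude), assumed


@dataclass(frozen=True)
class TemporalDescriptor:
    """When the image was captured."""
    captured_at: datetime


@dataclass(frozen=True)
class ImageRecord:
    """An image identifier stored together with its descriptors."""
    image_id: str
    spatial: SpatialDescriptor
    geographic: GeographicDescriptor
    temporal: TemporalDescriptor


# Hypothetical sample values for illustration only.
record = ImageRecord(
    image_id="frame-0001",
    spatial=SpatialDescriptor(position=(120, 80), size=(60, 180)),
    geographic=GeographicDescriptor(camera_id="NC1", camera_position=(35.0, 139.0)),
    temporal=TemporalDescriptor(captured_at=datetime(2019, 11, 11, 9, 0, 0)),
)
```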

Examples of the communication network NW include a local network such as a wired LAN (Local Area Network) or a wireless LAN, a dedicated line network connecting sites, and a wide area network such as the Internet.

The network cameras NC1, NC2, ..., NCn all have the same configuration. Each network camera NC consists of an imaging unit (not shown) that images a subject and a transmission unit (not shown) that transmits the output of the imaging unit to the image processing device 100 on the communication network NW. The imaging unit has an imaging optical system that forms an optical image of the subject, a solid-state image sensor that converts the formed optical image into an electrical signal, and an encoder circuit that compression-encodes the converted electrical signal as still image data or moving image data. The solid-state image sensor may be, for example, a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) element.

When compression-encoding the output of the solid-state image sensor as moving image data, each of the network cameras NC1, NC2, ..., NCn generates a compression-encoded moving image stream in accordance with a streaming method such as MPEG-2 TS (Moving Picture Experts Group 2 Transport Stream), RTP/RTSP (Real-time Transport Protocol/Real Time Streaming Protocol), MMT (MPEG Media Transport), or DASH (Dynamic Adaptive Streaming over HTTP). The streaming method used in Embodiment 1 is not limited to MPEG-2 TS, RTP/RTSP, MMT, or DASH. Whichever streaming method is used, however, identifier information that allows the image processing device 100 to uniquely separate the moving image data contained in the moving image stream is assumed to be multiplexed into that stream.

In Embodiment 1, the network camera NC1 images a first object and outputs first image data in which the first object appears. The network camera NC2 images a second object and outputs second image data in which the second object appears. Here, the network camera NC1 is the first network camera, and the network camera NC2 is the second network camera. The first object and the second object may be the same object or different objects.

The image processing device 100 includes a receiving unit 1, a decoding unit 2, an image recognition unit 3, a descriptor generation unit 4, a data recording control unit 5, a storage unit 6, an interface unit 7, and an image collation unit 8.

The receiving unit 1 receives delivery data from the network cameras NC1, NC2, ..., NCn and separates image data from the received delivery data. The delivery data contains audio data, metadata, and the like in addition to image data, and the image data contains still image data or a moving image stream. The receiving unit 1 outputs the separated image data to the decoding unit 2.

The decoding unit 2 decodes the compression-encoded image data input from the receiving unit 1 in accordance with the compression encoding method used by the network cameras NC1, NC2, ..., NCn. The decoding unit 2 outputs the decoded image data to the image recognition unit 3. If the input image data is not compression-encoded, the decoding unit 2 can be omitted.

The image recognition unit 3 performs image recognition processing on the image data input from the decoding unit 2. The image recognition unit 3 includes an object detection unit 31, a visual feature extraction unit 32, a physical feature estimation unit 33, and an object tracking unit 34.

The object detection unit 31 analyzes the image data input from the decoding unit 2 and detects objects appearing in that image data. Objects can be detected using, for example, a linear classifier or R-CNN (Regions with CNN features). The object detection unit 31 also outputs, as detection region data, data indicating the region of the image in which an object was detected. As shown in FIG. 2, the region in which an object is detected is set to a predetermined size so as to surround the object within part of the image. FIG. 2 is an explanatory diagram showing a specific example of the processing of the object detection unit 31. The detection region data is part of the original image data and is itself a kind of image data.
In Embodiment 1, the object detection unit 31 detects the first object from the first image data in which the first object appears and outputs, as first detection region data, the region of the image indicated by the first image data in which the first object was detected. It likewise detects the second object from the second image data in which the second object appears and outputs, as second detection region data, the region of the image indicated by the second image data in which the second object was detected. The object detection unit 31 may process the first image data and the second image data in either order, or may process them simultaneously.

The object detection unit 31 acquires, as the object detection result, the number of detected objects, the position information of each object, the type of each object, the imaging time of each object, and the like.
In Embodiment 1, the object detection unit 31 detects objects as rectangular regions, as shown in FIG. 2. That is, in Embodiment 1 the region in which an object is detected is set as a rectangle. In FIG. 2, the object detection unit 31 detects object P1 and object P2 as rectangular region RP1 and rectangular region RP2, respectively, in the image G1 indicated by the image data.
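The relationship between a detection result and its detection region data can be sketched as follows. The `Detection` record and the `crop_region` helper are hypothetical names introduced for illustration, and the image is represented as a plain nested list of pixel values rather than in any particular image library format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detected object: type, rectangular region, and imaging time."""
    label: str          # object type, e.g. "person"
    x: int              # left edge of the rectangle (pixels)
    y: int              # top edge of the rectangle (pixels)
    width: int
    height: int
    captured_at: float  # imaging time in seconds (assumed representation)


def crop_region(image: List[List[int]], detection: Detection) -> List[List[int]]:
    """Cut the rectangular region out of the image.

    The result corresponds to the detection region data: it is part of the
    original image and is itself image data.
    """
    rows = image[detection.y:detection.y + detection.height]
    return [row[detection.x:detection.x + detection.width] for row in rows]


# A 4x6 toy image whose pixel value encodes (row, column) as row*10 + column.
toy_image = [[r * 10 + c for c in range(6)] for r in range(4)]
toy_detection = Detection("person", x=1, y=1, width=2, height=3, captured_at=0.0)
region = crop_region(toy_image, toy_detection)
```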

FIG. 3 is an explanatory diagram showing a specific example of the processing of the object detection unit 31 and the processing of the object tracking unit 34 in Embodiment 1. FIG. 3 shows a case in which the image processing device 100 receives delivery data from two network cameras NC1 and NC2, which image region X1 and region X2, respectively. FIG. 3 also shows the result of the object tracking unit 34, described later, tracking the movements of the person shown as object A, the person shown as object B, and the person shown as object C across the delivery data. The object detection unit 31 detects object Aa from the decoded image data of the network camera NC1. Here, object A at time a is denoted object Aa; the same notation applies to objects Ab to Ae and to objects B and C. The object detection unit 31 detects object Ab from the next image data of the network camera NC1, and then detects object Ac from the image data after that. By repeating this detection process, the object detection unit 31 detects objects Aa through Ae. Similarly, the object detection unit 31 detects objects Ba through Be and objects Ca through Ce from the decoded image data of the network camera NC2. The object detection unit 31 acquires the position information, imaging time, and the like of all detected objects (Aa to Ae, Ba to Be, Ca to Ce).

The visual feature extraction unit 32 extracts the visual features of an object from image data. Here, a visual feature of an object is a feature extracted from the pixel values of the image, such as the color, texture, or shape of the object; in other words, it represents a characteristic that a person can recognize visually. In contrast, a physical feature represents a physical characteristic of the object and cannot be extracted merely by looking at that object alone in the image. For example, as the physical feature estimation unit 33 described later does, a physical feature must be estimated using information such as the object's position in the image. Visual features can be extracted using, for example, a trained machine learning model such as a CNN (Convolutional Neural Network). In Embodiment 1, the visual feature extraction unit 32 extracts the visual feature of the object in the rectangular region in which the object detection unit 31 detected the object.
In Embodiment 1, the visual feature extraction unit 32 extracts a first visual feature, which is the visual feature of the first object, from the first image data, and a second visual feature, which is the visual feature of the second object, from the second image data. More specifically, when the first detection region data is input, the visual feature extraction unit 32 resizes it to generate first resized data, and then inputs the first resized data to the CNN to extract the visual feature of the first object. Similarly, when the second detection region data is input, the visual feature extraction unit 32 resizes it to generate second resized data, and then inputs the second resized data to the CNN to extract the visual feature of the second object. The visual feature extraction unit 32 may extract the first visual feature and the second visual feature in either order, or may extract them simultaneously.

A specific example of the processing of the visual feature extraction unit 32 is described with reference to FIG. 4. FIG. 4 is an explanatory diagram showing a specific example of the process by which the visual feature extraction unit 32 in Embodiment 1 extracts the visual feature of an object.
The visual feature extraction unit 32 inputs to the CNN resized data RRP3, obtained by resizing rectangular region RP3 to a predetermined size, and resized data RRP4, obtained by resizing rectangular region RP4 to the same predetermined size as resized data RRP3. As a result, the visual feature extraction unit 32 can extract visual feature VP3, the visual feature of object P3 in rectangular region RP3, and visual feature VP4, the visual feature of object P4 in rectangular region RP4. Resizing the image data is necessary so that the visual features extracted from different image data have the same number of dimensions. The predetermined size of the resized data is decided by the designer of the CNN at the CNN design stage.
However, because the image data has been resized, visual features VP3 and VP4 represent the visual features of the objects but no longer carry information about the size of the object or the size of the rectangular region. Consequently, if only visual feature VP3 and visual feature VP4 are compared, object P3 and object P4 may be determined to be the same object even when their physical features, such as height, differ. To reduce this possibility, the determination unit 84 of the image processing device 100 in Embodiment 1 determines whether two objects are the same by considering physical features as well as visual features, as described later.
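The loss of size information caused by resizing can be illustrated with a short sketch. The helper `resize_nearest` is a hypothetical function that uses nearest-neighbor sampling; the source does not specify the resizing method, and the region contents and the 2x2 target size are toy values chosen only for this example.

```python
from typing import List


def resize_nearest(region: List[List[int]], out_h: int, out_w: int) -> List[List[int]]:
    """Resize a 2D region to (out_h, out_w) with nearest-neighbor sampling."""
    in_h = len(region)
    in_w = len(region[0])
    return [
        [region[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Two regions with the same appearance (uniform value 7) but different sizes,
# standing in for a tall object and a short object dressed alike.
tall_region = [[7] * 4 for _ in range(8)]   # 8 rows x 4 columns
short_region = [[7] * 2 for _ in range(4)]  # 4 rows x 2 columns

# After resizing to the fixed input size, the two regions are indistinguishable,
# so any feature computed from them alone cannot reflect the size difference.
resized_tall = resize_nearest(tall_region, 2, 2)
resized_short = resize_nearest(short_region, 2, 2)
```

In this sketch `resized_tall` and `resized_short` are identical even though the original regions differ in size, which is why a separate physical feature is needed.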

The physical feature estimation unit 33 estimates the physical features of an object from image data. In Embodiment 1, the physical feature estimation unit 33 estimates the physical feature of an object based on the position in the image at which the object detection unit 31 detected the object and the size of the rectangular region. A physical feature is a feature quantity representing a physical characteristic of the object, for example its height, width, or thickness. Physical features include not only such one-dimensional quantities, that is, physical dimensions, but also higher-dimensional quantities such as area and volume.

A specific example of how the physical feature estimation unit 33 estimates the physical feature of an object is given below.
The size of an object in the image varies with the distance from the network camera NC to the object. Therefore, after the network camera NC is installed, images are collected for a certain period, and for each type of object detected by the object detection unit, the size in the image (the size of the rectangular region) and the position in the image are collected as data and stored in the storage unit 6. The physical feature of the object corresponding to each object type, size in the image, and position in the image is then set by the external device 200 or the like, and correspondence information is generated in advance that indicates the relationship between the object type, the size in the image, and the position in the image on the one hand and the physical feature of the object on the other. Once data has been collected for a sufficient period and the correspondence information has been generated, the physical feature of an object can be estimated using this correspondence information.
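One way to picture the correspondence information is as a table that is searched for the closest stored entry. The sketch below is an assumption-laden illustration: the source does not say how the correspondence information is represented or queried, so the `Correspondence` record, the nearest-entry lookup, the use of a single vertical position, and all table values are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Correspondence(NamedTuple):
    """One entry of the correspondence information (hypothetical layout)."""
    object_type: str
    foot_y: int               # vertical position of the region in the image (pixels)
    box_height: int           # height of the rectangular region (pixels)
    physical_height_m: float  # physical feature set in advance for this entry


def estimate_physical_height(
    table: List[Correspondence],
    object_type: str,
    foot_y: int,
    box_height: int,
) -> Optional[float]:
    """Return the physical height of the closest entry of the same type.

    Closeness is measured by squared distance over (position, size); this
    matching rule is an assumption made for illustration.
    """
    candidates = [entry for entry in table if entry.object_type == object_type]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda e: (e.foot_y - foot_y) ** 2 + (e.box_height - box_height) ** 2,
    )
    return best.physical_height_m


# Hypothetical table: objects lower in the image are closer to the camera,
# so the same physical height produces a larger rectangle there.
table = [
    Correspondence("person", 400, 200, 1.8),
    Correspondence("person", 400, 130, 1.2),
    Correspondence("person", 200, 100, 1.8),
    Correspondence("person", 200, 65, 1.2),
]
```

In this toy table, a 100-pixel rectangle far from the camera and a 200-pixel rectangle close to it both map to the same physical height, which is the effect the correspondence information is meant to capture.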

If the physical features of objects are not known very accurately when the correspondence information is created, or if the object detection accuracy is low and the size of the rectangular region varies, the physical feature may be estimated by classifying it into multiple levels instead of mapping the size of the rectangular region one-to-one to a physical feature. For example, with three levels, a maximum value, an average value, and a minimum value of the physical feature are set for each object type, and a first threshold and a second threshold are set for the size of the rectangular region. If the size of the rectangular region is greater than or equal to the first threshold, the physical feature of the object is estimated to be the maximum value; if it is greater than or equal to the second threshold and less than the first threshold, the physical feature is estimated to be the average value; and if it is less than the second threshold, the physical feature is estimated to be the minimum value.
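The three-level classification described above can be written directly as a pair of threshold comparisons. The function name, parameter names, and the sample thresholds and heights used below are hypothetical; only the comparison rule itself is taken from the text.

```python
def estimate_three_level(
    box_size: float,
    first_threshold: float,
    second_threshold: float,
    max_value: float,
    mean_value: float,
    min_value: float,
) -> float:
    """Estimate a physical feature from the rectangle size in three levels.

    size >= first_threshold                     -> maximum value
    second_threshold <= size < first_threshold  -> average value
    size < second_threshold                     -> minimum value
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if box_size >= first_threshold:
        return max_value
    if box_size >= second_threshold:
        return mean_value
    return min_value


# Hypothetical settings for the object type "person" (heights in meters).
PERSON_LEVELS = dict(
    first_threshold=180.0,
    second_threshold=120.0,
    max_value=1.9,
    mean_value=1.6,
    min_value=1.2,
)
```

For example, `estimate_three_level(150.0, **PERSON_LEVELS)` falls between the two assumed thresholds and therefore returns the average value.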

In Embodiment 1, the physical feature estimation unit 33 receives the first detection region data from the object detection unit 31 and estimates from it the first physical feature, which is the physical feature of the first object. It likewise receives the second detection region data from the object detection unit 31 and estimates from it the second physical feature, which is the physical feature of the second object. The physical feature estimation unit 33 may estimate the first physical feature and the second physical feature in either order, or may estimate them simultaneously.

The object tracking unit 34 tracks the objects detected by the object detection unit 31 in the time direction. In the first embodiment, the object tracking unit 34 tracks the first object and the second object. To track an object in the time direction, the object tracking unit 34 compares the detection results of the object detection unit 31 within one piece of image data and across multiple pieces of temporally consecutive image data. For example, when the object to be tracked is a person, the same person captured by a single network camera NC is tracked. These comparisons can use, for example, the physical feature amount estimated by the physical feature amount estimation unit 33 and the visual feature amount extracted by the visual feature amount extraction unit 32. Alternatively, the processing of the image collation unit 8 described later may be used to determine whether an object in the previous frame and an object in the current frame are the same, and tracking may be performed on that basis. The object tracking unit 34 also outputs the motion information (optical flow) of the object, which is the tracking result, to the descriptor generation unit 35.

FIG. 3 shows a specific example in which the object to be tracked is a person.
The object tracking unit 34 tracks object A (Aa to Ae), which has the same features, across multiple pieces of image data obtained by the network camera NC1 imaging region X1. Similarly, the object tracking unit 34 tracks object B (Ba to Be) and object C (Ca to Ce), each having the same features, across multiple pieces of image data obtained by the network camera NC2 imaging region X2. As the motion information of objects A, B, and C, the object tracking unit 34 outputs to the descriptor generation unit 35, for example, the time during which object A appeared in the image data of region X1, the time during which objects B and C appeared in the image data of region X2, and information indicating the movement trajectories of objects A, B, and C.

The descriptor generation unit 4 generates, in accordance with a predetermined format, a feature descriptor indicating the features of an object related to the image data. In the first embodiment, the descriptor generation unit 4 generates a feature descriptor that includes the object detection result acquired by the object detection unit 31, the physical feature amount of the object estimated by the physical feature amount estimation unit 33, the visual feature amount of the object extracted by the visual feature amount extraction unit 32, and the motion information of the object output by the object tracking unit 34. The feature descriptor also includes an identifier (ID) indicating that detections tracked in the time direction belong to the same object.
In the first embodiment, the descriptor generation unit 4 generates a first feature descriptor, which indicates the features of the first object in the first image data, and a second feature descriptor, which indicates the features of the second object in the second image data.
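As a rough illustration of the information a feature descriptor carries, the following sketch models it as a simple record. The specification only states that a "predetermined format" is used, so the class name `FeatureDescriptor`, the field names, and the field types are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FeatureDescriptor:
    """Hypothetical in-memory layout of one feature descriptor."""
    object_id: int                            # ID shared by the same tracked object
    bounding_box: Tuple[int, int, int, int]   # detection result: (x, y, width, height)
    physical_feature: float                   # e.g. an estimated real-world size
    visual_feature: List[float]               # e.g. a feature vector extracted by a CNN
    # Motion information: positions of the object over time (assumed form).
    trajectory: List[Tuple[float, float]] = field(default_factory=list)


# Two detections of the same tracked object share the same identifier.
first = FeatureDescriptor(1, (10, 20, 40, 120), 170.0, [0.1, 0.9])
later = FeatureDescriptor(1, (60, 22, 42, 121), 170.0, [0.2, 0.8], [(10.0, 20.0), (60.0, 22.0)])
```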

The data recording control unit 5 builds a database by storing, in the storage unit 6, the decoded image data input from the decoding unit 2 in association with the feature descriptor input from the descriptor generation unit 4. When multiple objects appear in one piece of image data, a feature descriptor may be generated for each object so that multiple feature descriptors are associated with that image data, or the feature descriptors of the multiple objects may be combined into a single feature descriptor so that one feature descriptor is associated with that image data. In the first embodiment, the image data input from the decoding unit 2 is stored in association with the feature descriptor, but the detection area data of the object generated by the object detection unit 31 may instead be stored in association with the feature descriptor.
In the first embodiment, the data recording control unit 5 stores the first image data and the first feature descriptor in the storage unit 6 in association with each other, and stores the second image data and the second feature descriptor in the storage unit 6 in association with each other. The data recording control unit 5 may store either pair first, or may store both at the same time.

The data recording control unit 5 preferably stores the image data and the feature descriptors in the storage unit 6 in a format that allows fast access in both directions. The data recording control unit 5 may also build the database by creating an index table indicating the correspondence between the image data and the feature descriptors. For example, the data recording control unit 5 adds index information so that, given the data position of a specific image frame constituting the image data, the storage position in the storage unit 6 of the feature descriptor corresponding to that data position can be identified quickly. The data recording control unit 5 may also add index information so that the data position corresponding to a given storage position in the storage unit 6 can be identified quickly.
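One simple way to obtain fast lookups in both directions is to keep two hash maps, as in the sketch below. The specification does not prescribe a data structure; the class `DescriptorIndex`, its method names, and the use of integer positions are assumptions for illustration only.

```python
class DescriptorIndex:
    """Hypothetical two-way index between frame positions and descriptor positions."""

    def __init__(self):
        self._frame_to_descriptors = {}   # frame data position -> descriptor positions
        self._descriptor_to_frame = {}    # descriptor storage position -> frame position

    def add(self, frame_position, descriptor_position):
        # One frame may be associated with several descriptors (one per object).
        self._frame_to_descriptors.setdefault(frame_position, []).append(descriptor_position)
        self._descriptor_to_frame[descriptor_position] = frame_position

    def descriptors_for_frame(self, frame_position):
        return list(self._frame_to_descriptors.get(frame_position, []))

    def frame_for_descriptor(self, descriptor_position):
        return self._descriptor_to_frame[descriptor_position]


index = DescriptorIndex()
index.add(frame_position=4096, descriptor_position=10)
index.add(frame_position=4096, descriptor_position=11)
index.add(frame_position=8192, descriptor_position=12)
```

Both lookups are average constant-time dictionary accesses, at the cost of storing each association twice.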

The storage unit 6 stores various kinds of information and, as described later, is implemented by a storage device 10001 such as a hard disk. In the first embodiment, the storage unit 6 stores image data and feature descriptors in association with each other. In the first embodiment, the storage unit 6 also stores the trained machine learning model used by the determination unit 84 described later.
In the first embodiment, the storage unit 6 stores the first image data in association with the first feature descriptor, and stores the second image data in association with the second feature descriptor. The first feature descriptor includes the first visual feature amount, which is the visual feature amount of the first object in the first image data, and the first physical feature amount, which is the physical feature amount of the first object. The second feature descriptor includes the second visual feature amount, which is the visual feature amount of the second object in the second image data, and the second physical feature amount, which is the physical feature amount of the second object. That is, the storage unit 6 in the first embodiment stores a first feature descriptor including the first visual feature amount and the first physical feature amount, and a second feature descriptor including the second visual feature amount and the second physical feature amount.
Although the first embodiment shows a configuration in which the storage unit 6 stores the image data and the feature descriptors, the configuration is not limited to this. Instead of the storage unit 6, one or more network storage devices (not shown) arranged on the communication network NW may store the image data and the feature descriptors, and the data recording control unit 5 may access those network storage devices. This allows the data recording control unit 5 to accumulate the image data and the feature descriptors in an external network storage device and to build the database outside the image processing device 100. The trained machine learning model used by the determination unit 84 may likewise be stored in an external network storage device instead of the storage unit 6.

The interface unit 7 connects the external device 200 to each part of the image processing device, enabling communication and various kinds of control by the external device 200.
The external device 200 accesses the database in the storage unit 6 and the image acquisition unit 81 via the interface unit 7. Using the external device 200, a user of the image processing device 100 can set the search conditions with which the image acquisition unit 81, described later, searches for images, and can add image data and the like to the storage unit 6.

The image collation unit 8 collates objects appearing in multiple pieces of image data. In the first embodiment, the image collation unit 8 collates the first object appearing in the first image data with the second object appearing in the second image data and determines whether they are the same object. In the first embodiment, the image collation unit 8 includes an image acquisition unit 81, a visual feature amount acquisition unit 82, a physical feature amount acquisition unit 83, and a determination unit 84.
The image collation unit 8 starts processing when a search condition is set from the external device 200 via the interface unit 7. Here, the search condition consists of area information to be searched, time information to be searched, the type and features of the object to be searched for, and the like. Specific examples of search conditions include a condition instructing a search for objects that have been tracked as the same object within a given network camera NC for longer than a certain time, and a condition instructing detection of objects whose position information falls within a preset area (for example, a no-entry area) within a network camera NC. The image collation unit 8 may also accept image data as a search condition and search for objects having the same features, such as visual feature amounts, as the object appearing in that image data.

The image acquisition unit 81 acquires the multiple pieces of image data to be collated and the feature descriptors associated with that image data. In the first embodiment, the image acquisition unit 81 acquires the first image data in which the first object appears, the second image data in which the second object appears, the first feature descriptor associated with the first image data, and the second feature descriptor associated with the second image data.

In the first embodiment, the image acquisition unit 81 searches the storage unit 6 for objects that match the search condition set by the external device 200, and acquires the image data in which those objects appear.
When acquiring image data by searching the storage unit 6, the image acquisition unit 81 may narrow down the search targets based on the position information or capture time information included in the image data or the feature descriptors. For example, suppose that in FIG. 3 an object identical to object A is to be searched for. If the time at which the network camera NC1 captured object A (Aa to Ae) is the same as the time at which the network camera NC2 captured object B (Ba to Be), objects A and B can be judged not to be the same, so object B can be excluded from the search targets. In contrast, if object C was captured by the network camera NC2 shortly after the network camera NC1 captured object A (Aa to Ae), object C may be object A having walked from region X1 to region X2. That is, objects A and C may be the same object, so object C is not excluded from the search targets. This processing reduces the amount of searching.
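The time-based narrowing above can be sketched as a simple interval test. The sketch assumes that the two cameras image non-overlapping regions and that appearance times are given as (start, end) pairs in seconds. The parameter `max_gap_seconds`, which bounds what counts as "shortly after", is an assumption introduced for this example and does not appear in the specification.

```python
def may_be_same_object(query_interval, candidate_interval, max_gap_seconds):
    """Return False when the candidate can be excluded on timing alone."""
    q_start, q_end = query_interval
    c_start, c_end = candidate_interval
    # Seen by both cameras at the same time: cannot be the same object.
    if c_start <= q_end and q_start <= c_end:
        return False
    # Otherwise keep the candidate only if it appears reasonably close in time.
    gap = c_start - q_end if c_start > q_end else q_start - c_end
    return gap <= max_gap_seconds


object_a = (0.0, 10.0)     # object A on camera NC1
object_b = (5.0, 15.0)     # object B on camera NC2, overlapping in time with A
object_c = (12.0, 20.0)    # object C on camera NC2, shortly after A
```

Here `object_b` is excluded and `object_c` is retained as a candidate, matching the FIG. 3 example.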

The visual feature amount acquisition unit 82 acquires the visual feature amount of an object appearing in the input image data. In the first embodiment, the visual feature amount acquisition unit 82 acquires the first visual feature amount, which is the visual feature amount of the first object in the first image data, and the second visual feature amount, which is the visual feature amount of the second object in the second image data.
In the first embodiment, the visual feature amount acquisition unit 82 acquires the visual feature amount from the feature descriptor associated with the image data. That is, it acquires the first visual feature amount included in the first feature descriptor associated with the first image data, and the second visual feature amount included in the second feature descriptor associated with the second image data. Because the visual feature amount included in a feature descriptor in the first embodiment is the one extracted by the visual feature amount extraction unit 32, the visual feature amount acquisition unit 82 in the first embodiment can also be said to acquire the first visual feature amount that the visual feature amount extraction unit 32 extracted from the first detection area data and the second visual feature amount that it extracted from the second detection area data.

In the above, the visual feature amount acquisition unit 82 acquires the visual feature amount from the feature descriptor, but the configuration is not limited to this as long as the visual feature amount of the object appearing in the image data can be acquired. For example, the visual feature amount acquisition unit 82 may input the image data to the image recognition unit 3 and acquire the visual feature amount of the object through the processing of the object detection unit 31 and the visual feature amount extraction unit 32. When the feature descriptor includes information on the rectangular area in which the object was detected, the processing of the object detection unit 31 can be omitted and only the processing of the visual feature amount extraction unit 32 needs to be performed. In the first embodiment, the feature descriptor that the storage unit 6 stores in association with the image data includes the visual feature amount. However, when the feature descriptor does not include the visual feature amount, for reasons such as reducing the amount of data stored in the storage unit 6, or when image data with no associated feature descriptor is input from the external device 200 as a search condition, the visual feature amount of the object can be acquired by the above method.

The physical feature amount acquisition unit 83 acquires the physical feature amount of an object appearing in the input image data. In the first embodiment, the physical feature amount acquisition unit 83 acquires the first physical feature amount, which is the physical feature amount of the first object, and the second physical feature amount, which is the physical feature amount of the second object.
In the first embodiment, the physical feature amount acquisition unit 83 acquires the physical feature amount from the feature descriptor associated with the image data. That is, it acquires the first physical feature amount included in the first feature descriptor associated with the first image data, and the second physical feature amount included in the second feature descriptor associated with the second image data. Because the physical feature amount included in a feature descriptor in the first embodiment is the one estimated by the physical feature amount estimation unit 33, the physical feature amount acquisition unit 83 in the first embodiment can also be said to acquire the first physical feature amount that the physical feature amount estimation unit 33 estimated from the first detection area data and the second physical feature amount that it estimated from the second detection area data.

In the above, the physical feature amount acquisition unit 83 acquires the physical feature amount from the feature descriptor, but, as with the visual feature amount acquisition unit 82, the configuration is not limited to this as long as the physical feature amount of the object appearing in the image data can be acquired. For example, the physical feature amount acquisition unit 83 may input the image data to the image recognition unit 3 and acquire the physical feature amount of the object through the processing of the object detection unit 31 and the physical feature amount estimation unit 33. When the feature descriptor includes information on the rectangular area in which the object was detected, the processing of the object detection unit 31 can be omitted and only the processing of the physical feature amount estimation unit 33 needs to be performed. In the first embodiment, the feature descriptor that the storage unit 6 stores in association with the image data includes the physical feature amount. However, when the feature descriptor does not include the physical feature amount, for reasons such as reducing the amount of data stored in the storage unit 6, or when image data with no associated feature descriptor is input from the external device 200 as a search condition, the physical feature amount of the object can be acquired by the above method. Furthermore, when image data is input as a search condition and the physical feature amount of the object shown in that image data is known, the user of the image processing device 100 may input the physical feature amount from the external device 200 so that it is acquired directly.
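The alternative acquisition paths described above can be pictured as a fallback chain. The sketch below is hypothetical: the function name, the dictionary keys, and in particular the priority order (user input first, then the descriptor, then estimation from the stored rectangular area) are assumptions, since the specification presents these only as alternatives. The case in which object detection itself must be rerun is omitted.

```python
def acquire_physical_feature(descriptor, user_value=None, estimate_from_box=None):
    """Return a physical feature amount from the first available source."""
    # 1. A known value entered by the user from the external device.
    if user_value is not None:
        return user_value
    # 2. The value already stored in the feature descriptor.
    if descriptor.get("physical_feature") is not None:
        return descriptor["physical_feature"]
    # 3. Estimation from the stored rectangular area (detection step skipped).
    box = descriptor.get("bounding_box")
    if box is not None and estimate_from_box is not None:
        return estimate_from_box(box)
    raise ValueError("no source available for the physical feature amount")


def height_from_box(box):
    """Toy estimator (assumption): scale the box height by a fixed factor."""
    return box[3] / 100.0
```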

The determination unit 84 uses a trained machine learning model to determine whether objects appearing in multiple pieces of input image data are the same. The trained machine learning model used by the determination unit 84 is separate from the trained machine learning model used by the visual feature amount extraction unit 32. In the first embodiment, the determination unit 84 uses the trained machine learning model to determine, from the first visual feature amount, the first physical feature amount, the second visual feature amount, and the second physical feature amount, whether the first object appearing in the first image data and the second object appearing in the second image data are the same object. Here, the determination unit 84 takes the first visual feature amount and the first physical feature amount as the input for the first object, and the second visual feature amount and the second physical feature amount as the input for the second object.
In the first embodiment, the determination unit 84 includes a feature vector acquisition unit 841, a similarity calculation unit 842, and a similarity determination unit 843.

The feature vector acquisition unit 841 inputs the visual feature amount and the physical feature amount of an object to the trained machine learning model and acquires the feature vector of the object that the model outputs. Here, a feature vector is a vector indicating the features of an object. In the first embodiment, the feature vector acquisition unit 841 inputs the first visual feature amount and the first physical feature amount to the trained machine learning model and acquires, as its output, the first feature vector, which is the feature vector of the first object. It also inputs the second visual feature amount and the second physical feature amount to the trained machine learning model and acquires, as its output, the second feature vector, which is the feature vector of the second object.

In the first embodiment, the feature vector acquisition unit 841 uses as inputs the visual feature amount of the object included in the feature descriptor, that is, the visual feature amount obtained from the image data by a CNN, and the physical feature amount of the object included in the feature descriptor, that is, the physical feature amount estimated from the rectangular area in which the object was detected. In the first embodiment, the trained machine learning model is a fully connected neural network trained by metric learning (distance learning). Metric learning defines a distance between two output vectors and trains the model so that a small distance between the two vectors corresponds to the same object and a large distance corresponds to different objects. Although a distance is said to be defined, a similarity that does not satisfy the axioms of a distance, such as cosine similarity, may also be used as the measure of closeness between two vectors. In the following, any measure of closeness between two vectors, whether a distance or another measure such as cosine similarity, is collectively called a similarity. The first embodiment uses the Euclidean distance as the similarity. That is, when the Euclidean distance between the first feature vector representing the features of the first object and the second feature vector representing the features of the second object is small, the first object and the second object are likely to be the same, and when the Euclidean distance is large, they are likely to be different.
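A minimal sketch of this step, assuming the visual feature amount is a list of numbers and the physical feature amount is a single scalar appended to it, is shown below. The way the two inputs are combined, the layer sizes, and the identity weights used for checking are assumptions; real weights would come from metric learning.

```python
import math


def fully_connected(x, weights, biases, relu):
    """One fully connected layer: y = W x + b, optionally followed by ReLU."""
    out = []
    for row, bias in zip(weights, biases):
        value = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(max(0.0, value) if relu else value)
    return out


def feature_vector(visual_feature, physical_feature, layers):
    """Concatenate both inputs and pass them through the network."""
    x = list(visual_feature) + [physical_feature]
    for i, (weights, biases) in enumerate(layers):
        last = i == len(layers) - 1
        x = fully_connected(x, weights, biases, relu=not last)
    return x


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Placeholder single-layer "model" with identity weights, for checking only.
identity_layers = [([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])]
v1 = feature_vector([1.0, 2.0], 1.7, identity_layers)
v2 = feature_vector([1.0, 2.0], 1.2, identity_layers)
```

With these placeholder weights, two objects that look identical (same visual feature amount) but differ in physical size still end up a nonzero distance apart, which is the effect the combined input is meant to achieve.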

As training data, image data showing objects whose physical feature amounts are known may be prepared. Naturally, it must also be known for the training data whether an object appearing in one piece of image data is the same as an object appearing in another. The visual feature amount is extracted from the image data using the visual feature amount extraction unit 32, and the extracted visual feature amount and the known physical feature amount are input to the machine learning model to be trained. The model is trained so that vectors of the same object lie close together and vectors of different objects lie far apart. A Siamese network, triplet loss, or the like may be used as the training method. Although image data is prepared as training data in the above, what training actually requires is the visual feature amount and the physical feature amount, so the image data itself is not needed if the visual feature amount and the physical feature amount of each object can be prepared.
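For reference, one common form of the triplet loss mentioned above is sketched below. It uses the (non-squared) Euclidean distance and a margin hyperparameter; the margin value and the choice of non-squared distance are assumptions, as the specification names the technique without giving a formula.

```python
import math


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive.

    anchor / positive: feature vectors of the same object
    negative: feature vector of a different object
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss over many triplets pulls vectors of the same object together and pushes vectors of different objects apart, which matches the training objective described above.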

Because Embodiment 1 adopts the Euclidean distance as the similarity, the weight parameters of the trained machine learning model have been learned so that the similarity value between feature vectors of the same object becomes small and the similarity value between feature vectors of different objects becomes large. In other words, the feature vectors acquired by the feature vector acquisition unit 841 express object features in such a way that vectors obtained from the same object have a small similarity value, even across different images, and vectors obtained from different objects have a large one. Although metric learning is applied to the fully connected neural network above, the fully connected neural network and the CNN that extracts the visual feature amounts may be trained together by metric learning.
The trained machine learning model used by the determination unit 84 is described above as a fully connected neural network trained by metric learning, but it is not limited to this. Any configuration that can determine whether objects are the same may be used, for example one based on logistic regression or the like.
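As a rough illustration of the logistic-regression alternative mentioned above, a pairwise classifier can map the two objects' features directly to a probability that they are the same. The source does not say how the pair is presented to the classifier; using the element-wise absolute difference of the two feature lists, and the weight and bias values shown, are assumptions.

```python
import math

def same_object_probability(features_1, features_2, weights, bias):
    """Logistic regression on the absolute feature differences of a pair.

    Pair encoding (absolute difference) and parameter values are hypothetical.
    """
    z = bias + sum(w * abs(a - b)
                   for w, a, b in zip(weights, features_1, features_2))
    return 1.0 / (1.0 + math.exp(-z))

weights = [-1.0, -1.0]   # larger differences lower the probability
bias = 2.0

p_identical = same_object_probability([0.0, 0.0], [0.0, 0.0], weights, bias)
p_distant = same_object_probability([0.0, 0.0], [3.0, 4.0], weights, bias)
```

A pair would then be judged identical when the probability exceeds a chosen cut-off such as 0.5.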

FIG. 5 is an explanatory diagram showing a specific example of the process by which the feature vector acquisition unit 841 acquires a feature vector. The feature vector acquisition unit 841 inputs the visual feature amount I1 and the physical feature amount I2 to the trained machine learning model M1, and the trained machine learning model M1 outputs the feature vector V1. FIG. 5 also shows the process by which the visual feature amount extraction unit 32 uses a CNN to extract the visual feature amount I1 from the rectangular region RR in which the object detection unit 31 detected the object.
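The flow from I1 and I2 to V1 can be sketched as a forward pass through a small fully connected network. This is an illustrative sketch only: the layer sizes, the ReLU activation, feeding the network the concatenation of I1 and I2, and the random weights standing in for trained parameters are all assumptions.

```python
import numpy as np

def embed(visual, physical, layers):
    """Map a visual feature (I1) and a physical feature (I2) to a vector (V1)."""
    x = np.concatenate([visual, physical])   # assumed input encoding
    for i, (weight, bias) in enumerate(layers):
        x = weight @ x + bias
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)           # ReLU on hidden layers only
    return x

rng = np.random.default_rng(0)
n, m, hidden, out = 8, 2, 16, 4              # hypothetical dimensions
layers = [                                   # random stand-ins for trained weights
    (rng.normal(size=(hidden, n + m)), np.zeros(hidden)),
    (rng.normal(size=(out, hidden)), np.zeros(out)),
]

visual_i1 = rng.normal(size=n)               # would come from the CNN
physical_i2 = np.array([1.7, 0.5])           # e.g. estimated height and width
feature_vector_v1 = embed(visual_i1, physical_i2, layers)
```

The same function and weights would be applied to the second object's features, so that both feature vectors lie in one space where distances are comparable.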

The similarity calculation unit 842 calculates the similarity between the first feature vector and the second feature vector. As described above, Embodiment 1 uses the Euclidean distance as the similarity.

The similarity determination unit 843 determines whether the first object and the second object are the same based on the similarity calculated by the similarity calculation unit 842. When the similarity is one that becomes smaller as the two objects become more alike, such as the Euclidean distance, the similarity determination unit 843 determines that the first object and the second object are the same if the calculated similarity is less than or equal to a predetermined threshold. Conversely, when the similarity is one that becomes larger as the two objects become more alike, such as the cosine similarity, the similarity determination unit 843 determines that the first object and the second object are the same if the calculated similarity is greater than or equal to a predetermined threshold. The threshold may be set from the external device 200, or it may be learned together with the model by machine learning.
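The two threshold rules above differ only in the direction of the comparison. A minimal sketch follows; the function name, the `metric` argument, and the threshold values are hypothetical.

```python
import numpy as np

def is_same_object(v1, v2, threshold, metric="euclidean"):
    """Apply the threshold rule in the direction that matches the metric."""
    if metric == "euclidean":
        # Distance: smaller means more alike, so compare with <=.
        return float(np.linalg.norm(v1 - v2)) <= threshold
    if metric == "cosine":
        # Cosine similarity: larger means more alike, so compare with >=.
        cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
        return cos >= threshold
    raise ValueError(f"unsupported metric: {metric}")

a = np.array([1.0, 0.0])
near = np.array([1.0, 0.1])
orthogonal = np.array([0.0, 1.0])

same_by_distance = is_same_object(a, near, threshold=0.5)
different_by_distance = is_same_object(a, orthogonal, threshold=0.5)
same_by_cosine = is_same_object(a, 2.0 * a, threshold=0.9, metric="cosine")
different_by_cosine = is_same_object(a, orthogonal, threshold=0.9, metric="cosine")
```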

Next, the hardware configuration of the image processing device 100 in Embodiment 1 is described. Each function of the image processing device 100 is realized by a computer. FIG. 6 is a configuration diagram showing an example of the hardware configuration of a computer that realizes the image processing device 100.
The hardware shown in FIG. 6 includes a processing device 10000 such as a CPU (Central Processing Unit) and a storage device 10001 such as a ROM (Read Only Memory) or a hard disk.
The receiving unit 1, decoding unit 2, image recognition unit 3, descriptor generation unit 4, data recording control unit 5, interface unit 7, and image collation unit 8 shown in FIG. 1 are realized by the processing device 10000 executing programs stored in the storage device 10001, and the storage unit 6 is realized by the storage device 10001.
The functions of the image processing device 100 need not be realized by the combination of hardware and programs described above. They may be realized by hardware alone, such as an LSI (Large Scale Integrated Circuit) in which the program is implemented in the processing device, or some functions may be realized by dedicated hardware and the rest by a combination of a processing device and programs.

The image processing device 100 and the image processing system 1000 in Embodiment 1 are configured as described above.
Next, the operation of the image processing device 100 and the image processing system 1000 is described in two parts: the image storage process and the image collation process. The operation of the image processing device 100 constitutes the image processing method, and a program that causes a computer to execute that method constitutes the image processing program.

First, the image storage process is described with reference to FIG. 7.
FIG. 7 is a flowchart showing the image storage process of the image processing device 100 in Embodiment 1.

First, in step S1, the receiving unit 1 receives distribution data from the network cameras NC1, NC2, ..., NCn, separates the image data from it, and outputs the image data to the decoding unit 2.

In step S2, the decoding unit 2 decodes the image data separated in step S1 and outputs it to the image recognition unit 3.

In step S3, the object detection unit 31 of the image recognition unit 3 attempts to detect objects appearing in the decoded image data. The objects to be detected are moving objects to be tracked, such as automobiles, bicycles, and pedestrians.
In step S4, the object detection unit 31 determines whether an object has been detected. If no object has been detected (step S4; NO), the flow returns to step S1. If an object has been detected (step S4; YES), the flow proceeds to step S5. Steps S3 and S4 are collectively referred to as the object detection step.

In step S5, the visual feature amount extraction step, the visual feature amount extraction unit 32 takes as input the image data of the rectangular region in which the object detection unit 31 detected the object and extracts the visual feature amount of the object. A CNN can be used for this extraction. The visual feature amount extraction unit 32 outputs the extracted visual feature amount of the object to the object tracking unit 34.

In step S6, the physical feature amount estimation step, the physical feature amount estimation unit 33 estimates the physical feature amount of the object based on the rectangular region in which the object detection unit 31 detected the object, and outputs the estimation result to the object tracking unit 34.
Steps S5 and S6 may be performed simultaneously, or either one may be performed first.

In step S7, the object tracking unit 34 refers to the image data of the objects and assigns a different ID to each object detected within one image frame. In step S8, the object tracking unit 34 extracts motion information for each detected object.

In step S9, the object tracking unit 34 refers to the visual feature amount acquired in step S5, the physical feature amount acquired in step S6, and the motion information extracted in step S8, and determines whether the object detected by the object detection unit 31 is the same as an object detected in a past image frame that is temporally continuous with the current one. If the objects are determined not to be the same (step S9; NO), the flow proceeds to step S11. If the objects are determined to be the same (step S9; YES), the flow proceeds to step S10, where the object tracking unit 34 overwrites the ID assigned in step S7 with the ID already assigned to the matching past object.
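The ID handling in steps S7, S9, and S10 can be sketched as follows. This is a simplified illustration under stated assumptions: objects are reduced to a single number, the matching predicate `same_fn` is a hypothetical stand-in for the comparison of visual features, physical features, and motion information, and the per-object motion extraction of step S8 is omitted.

```python
from itertools import count

def assign_ids(detections, previous, same_fn, id_source):
    """Assign IDs to the current frame's detections.

    detections: feature records for the current frame.
    previous:   {object_id: feature record} from the preceding frame.
    same_fn:    hypothetical predicate deciding whether two records match.
    """
    current = {}
    for det in detections:
        obj_id = next(id_source)              # S7: provisional, unique ID
        for past_id, past in previous.items():
            if same_fn(det, past):            # S9: same as a past object?
                obj_id = past_id              # S10: overwrite with past ID
                break
        current[obj_id] = det
    return current

previous_frame = {0: 1.0}                     # one tracked object, ID 0
detections = [1.05, 5.0]                      # toy one-number "features"
ids = assign_ids(detections, previous_frame,
                 same_fn=lambda a, b: abs(a - b) < 0.2,
                 id_source=count(1))
```

In this toy run the first detection inherits ID 0 from the previous frame, while the second keeps its newly assigned ID.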

In step S11, the object tracking unit 34 determines whether all objects input from the object detection unit 31 have been processed. If not all objects have been processed (step S11; NO), the flow returns to step S8. If all objects have been processed (step S11; YES), the object tracking unit 34 outputs the object IDs and the object motion information to the descriptor generation unit 4.

In step S12, the descriptor generation unit 4 generates a feature descriptor based on the input visual feature amount of the object, the physical feature amount of the object, the position information and imaging time of the network camera NC, the object ID, and the object motion information. The descriptor generation unit 4 outputs the generated feature descriptor to the data recording control unit 5.
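To make the contents of a feature descriptor concrete, the inputs listed for step S12 can be grouped into a single record. The sketch below is purely illustrative: the class name, field names, types, and example values are hypothetical, and the source does not specify how the descriptor is actually encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FeatureDescriptor:
    """Hypothetical record holding the items combined in step S12."""
    object_id: int
    visual_feature: List[float]            # from extraction unit 32
    physical_feature: List[float]          # from estimation unit 33
    camera_position: Tuple[float, float]   # position of network camera NC
    capture_time: float                    # imaging time
    motion: Tuple[float, float]            # motion information from step S8

example = FeatureDescriptor(
    object_id=7,
    visual_feature=[0.1, 0.2, 0.3],
    physical_feature=[1.7, 0.5],
    camera_position=(35.68, 139.77),
    capture_time=1573430400.0,
    motion=(0.4, -0.1),
)
```

Storing both feature amounts in the descriptor is what later allows the collation process to read them back instead of recomputing them from the image data.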

In step S13, the data recording control unit 5 performs control to store the feature descriptor generated in step S12 in the storage unit 6 in association with the image data decoded in step S2, and the storage unit 6 stores the input image data and feature descriptor. The image processing device 100 then ends the image storage process.

Next, the image collation process of the image processing device 100 is described with reference to FIG. 8.
FIG. 8 is a flowchart showing the image collation process of the image processing device 100 in Embodiment 1.

First, in step S21, a user of the image processing device 100 sets a search condition via the external device 200. Once the search condition is set, the image acquisition unit 81 decides to perform a search under that condition.

In step S22, the image acquisition unit 81 searches the storage unit 6 based on the search condition set from the external device 200 via the interface unit 7. That is, the image acquisition unit 81 searches for objects that match the set search condition based on the objects' feature descriptors and the like, and narrows down the objects.

In step S23, the image acquisition unit 81 determines whether it has found one or more pairs of objects that appear in different image data. If no pair of objects is found (step S23; NO), the image processing device 100 ends the process. If one or more pairs of objects are found (step S23; YES), the process proceeds to step S24.

In step S24, the image acquisition unit 81 reads from the storage unit 6 the image data and feature descriptors associated with each of the one or more object pairs found by the search. The case where the image data and feature descriptors for a single object pair are acquired is described here, but the subsequent processing is the same when two or more pairs are acquired. The image acquisition unit 81 outputs the read image data and feature descriptors to the visual feature amount acquisition unit 82 and the physical feature amount acquisition unit 83. In Embodiment 1, the image acquisition unit 81 acquires the first image data and first feature descriptor and the second image data and second feature descriptor that match the search condition, and outputs them to the visual feature amount acquisition unit 82 and the physical feature amount acquisition unit 83. Steps S21 through S24 are collectively referred to as the image acquisition step.

In step S25, the visual feature amount acquisition step, the visual feature amount acquisition unit 82 acquires, from the feature descriptor, the visual feature amount of the object in the acquired image data.
In Embodiment 1, the visual feature amount acquisition unit 82 acquires a first visual feature amount, which is the visual feature amount of the first object in the first image data, and a second visual feature amount, which is the visual feature amount of the second object in the second image data.

In step S26, the physical feature amount acquisition step, the physical feature amount acquisition unit 83 acquires the physical feature amount of the object from the feature descriptor.
In Embodiment 1, the physical feature amount acquisition unit 83 acquires a first physical feature amount, which is the physical feature amount of the first object, and a second physical feature amount, which is the physical feature amount of the second object.
Steps S25 and S26 may be performed simultaneously, or either one may be performed first.

In step S27, the feature vector acquisition step, the feature vector acquisition unit 841 inputs the visual feature amount and the physical feature amount of the object acquired in steps S25 and S26 to the trained machine learning model and acquires the feature vector of the object as the model's output.
In Embodiment 1, the feature vector acquisition unit 841 inputs the first visual feature amount and the first physical feature amount to the trained machine learning model and acquires, as its output, a first feature vector, which is the feature vector of the first object. It likewise inputs the second visual feature amount and the second physical feature amount to the trained machine learning model and acquires, as its output, a second feature vector, which is the feature vector of the second object. Either feature vector may be acquired first, or both may be acquired simultaneously.

In step S28, the similarity calculation step, the similarity calculation unit 842 calculates the similarity between the feature vectors acquired in step S27. Specifically, the similarity calculation unit 842 calculates the similarity between the first feature vector and the second feature vector. In Embodiment 1, the similarity calculation unit 842 calculates the Euclidean distance as the similarity.

In step S29, the similarity determination step, the similarity determination unit 843 determines whether the pair of objects is the same based on the similarity calculated in the similarity calculation step. Specifically, when the similarity calculated by the similarity calculation unit 842 is less than or equal to a predetermined threshold, the similarity determination unit 843 determines that the first object appearing in the first image data and the second object appearing in the second image data are the same object.

In step S30, the similarity determination unit 843 stores the determination result of step S29 in a buffer or the like and outputs it to the external device 200 via the interface unit 7, and the image processing device 100 ends the process. If the external device 200 includes a display device, the display device may display the determination result. In step S30, when the two objects are determined to be the same, a process may also be added that unifies the IDs of the two objects and stores the feature descriptors with the unified ID in the storage unit 6.

Through the operation described above, the image processing device 100 according to Embodiment 1 resizes the image data to match the dimensionality of the visual feature amount before extracting it. Even so, by using not only the visual feature amount but also the physical feature amount of the object when determining whether objects appearing in multiple pieces of image data are the same, it can reduce the possibility of erroneously determining that objects of different actual sizes are the same object.
In Embodiment 1 the visual feature amount is extracted after the image data is resized, but the benefit is more general. Even without resizing, using the physical feature amount in addition to the visual feature amount reduces the possibility of erroneously determining that objects of different actual sizes are the same object; that is, it allows a more appropriate determination of whether objects appearing in multiple pieces of image data are the same.

The image processing device 100 in Embodiment 1 also includes the feature vector acquisition unit 841, which inputs the first visual feature amount and the first physical feature amount to the trained machine learning model and acquires the first feature vector of the first object as the model's output, and which inputs the second visual feature amount and the second physical feature amount to the trained machine learning model and acquires the second feature vector of the second object as the model's output. It can therefore obtain feature vectors that take both the visual feature amount and the physical feature amount of an object into account in combination.
Because the feature vector acquisition unit 841 acquires the feature vectors using a trained machine learning model, objects can be collated accurately while reducing the burden on the designer of the image processing device 100, compared with using a vector whose components are simply the visual and physical feature amounts; that is, when the visual feature amount is n-dimensional and the physical feature amount is m-dimensional, an (n + m)-dimensional vector formed by merely concatenating them. For example, if the visual and physical feature amounts are simply concatenated into one vector, they are quantities of different physical dimensions, so it is unclear how the distance between two concatenated vectors should be measured in order to compare objects appropriately. With simple concatenation, the designer of the image processing device 100 would have to establish, through repeated experiments, a rule for measuring distance appropriately. If the feature vector is instead computed with a trained machine learning model, the model transforms the inputs to suit the distance definition chosen by the designer and outputs the feature vector, which reduces the designer's workload.
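The difficulty with simple concatenation can be seen in a small numeric sketch. The feature values and the choice of height as the physical feature are made up for illustration; the point is only that the Euclidean distance on a concatenated vector changes with an arbitrary choice of unit for the physical feature.

```python
import numpy as np

# Two objects that look almost identical but differ in real height.
visual_a = np.array([0.10, 0.20, 0.30])
visual_b = np.array([0.12, 0.18, 0.33])
height_a_m, height_b_m = 1.5, 4.0            # hypothetical heights in metres

def concat_distance(va, pa, vb, pb):
    """Euclidean distance between naively concatenated feature vectors."""
    return float(np.linalg.norm(np.concatenate([va, pa]) -
                                np.concatenate([vb, pb])))

# Same objects, same physical quantity, two different units.
d_m = concat_distance(visual_a, [height_a_m], visual_b, [height_b_m])
d_cm = concat_distance(visual_a, [height_a_m * 100],
                       visual_b, [height_b_m * 100])
```

Expressing height in centimetres instead of metres makes the distance roughly a hundred times larger and drowns out the visual components entirely, so a designer using plain concatenation would have to hand-tune the relative scaling. A learned embedding absorbs that scaling into its weights.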

The image processing device 100 in Embodiment 1 further includes the image acquisition unit 81, which acquires the first image data and the second image data; the object detection unit 31, which detects the first object in the first image data and outputs, as first detection region data, the region of the image represented by the first image data in which the first object was detected; and the visual feature amount extraction unit 32, which receives the first detection region data from the object detection unit 31 and extracts the first visual feature amount from it. Therefore, even when the first image data does not explicitly include the first visual feature amount, objects can be collated because the visual feature amount extraction unit 32 extracts the visual feature amount from the first image data.

The image processing device 100 in Embodiment 1 further includes the physical feature amount estimation unit 33, which receives the first detection region data from the object detection unit 31 and estimates the first physical feature amount from it. Therefore, even when the first image data does not explicitly include the first physical feature amount, objects can be collated because the physical feature amount estimation unit 33 estimates the physical feature amount from the first image data.

The image processing device 100 in Embodiment 1 further includes the storage unit 6, which stores the first image data in association with a first feature descriptor, a feature descriptor that indicates the features of the first object in the first image data and includes the first visual feature amount. The visual feature amount acquisition unit 82 acquires the first visual feature amount included in the first feature descriptor stored in the storage unit 6. The image collation process therefore does not need to extract the first visual feature amount from the first image data, which reduces the amount of computation.

The storage unit 6 of the image processing device 100 in Embodiment 1 stores the first feature descriptor including the first visual feature amount and the first physical feature amount, and the physical feature amount acquisition unit 83 acquires the first physical feature amount included in the first feature descriptor stored in the storage unit 6. The image collation process therefore does not need to estimate the first physical feature amount from the first image data, which reduces the amount of computation.

Modifications of the image processing device 100 in Embodiment 1 are described below.

The description above covers the case where the first image data and the second image data are captured by different network cameras (network camera NC1 and network camera NC2), but the first image data and the second image data may be captured by the same network camera NC. For example, the image collation process may be applied to temporally consecutive frames, as in the tracking process, or to frames that are separated in time.

In the image collation process, the image acquisition unit 81 searches the storage unit 6 for object pairs. When image data is input from the external device 200 as a search condition in order to find objects identical to the object appearing in that image data, the object appearing in the input image data is included as one member of each object pair. When image data is input as a search condition, it need not come from the external device 200; image data captured in real time by a network camera NC may instead be input directly from the decoding unit 2.
When the aim is simply to determine whether the objects appearing in two pieces of image data are the same, or when multiple pieces of image data have been collected in advance, the storage unit 6 need not be searched, and the objects appearing in the image data input from the external device 200 may be collated with one another.

As described above, the image acquisition unit 81 does not have to acquire the first image data and the second image data in the same way. For example, the first image data may be acquired from the external device 200 and the second image data from the storage unit 6. The same applies to the visual and physical feature amounts: the first and second visual feature amounts need not be acquired in the same way, nor do the first and second physical feature amounts. For example, the first visual feature amount may be acquired from the visual feature amount extraction unit 32 and the second visual feature amount from the storage unit 6. Similarly, the first physical feature amount may be acquired from the physical feature amount estimation unit 33 and the second physical feature amount from the storage unit 6.

The image processing device according to the present invention is suitable for use in, for example, surveillance systems and image retrieval systems.

100 image processing device, 1000 image processing system, 1 receiving unit, 2 decoding unit, 3 image recognition unit, 31 object detection unit, 32 visual feature amount extraction unit, 33 physical feature amount estimation unit, 34 object tracking unit, 4 descriptor generation unit, 5 data recording control unit, 6 storage unit, 7 interface unit, 8 image collation unit, 81 image acquisition unit, 82 visual feature amount acquisition unit, 83 physical feature amount acquisition unit, 84 determination unit, 841 feature amount vector acquisition unit, 842 similarity calculation unit, 843 similarity determination unit, 200 external device.

Claims

An image processing device comprising:
a visual feature amount acquisition unit that acquires a first visual feature amount, which is a visual feature amount of a first object appearing in first image data, and a second visual feature amount, which is a visual feature amount of a second object appearing in second image data;
a physical feature amount acquisition unit that acquires a first physical feature amount, which is a physical feature amount of the first object, and a second physical feature amount, which is a physical feature amount of the second object; and
a determination unit that determines, using a trained machine learning model, whether or not the first object and the second object are the same object from the first visual feature amount, the first physical feature amount, the second visual feature amount, and the second physical feature amount,
wherein the determination unit comprises:
a feature amount vector acquisition unit that inputs the first visual feature amount and the first physical feature amount to the trained machine learning model and acquires, as an output of the trained machine learning model, a first feature amount vector, which is a feature amount vector of the first object, and that inputs the second visual feature amount and the second physical feature amount to the trained machine learning model and acquires, as an output of the trained machine learning model, a second feature amount vector, which is a feature amount vector of the second object;
a similarity calculation unit that calculates a similarity between the first feature amount vector and the second feature amount vector; and
a similarity determination unit that determines, based on the similarity calculated by the similarity calculation unit, whether or not the first object appearing in the first image data and the second object appearing in the second image data are the same object.

The image processing device according to claim 1, further comprising a visual feature amount extraction unit that extracts the first visual feature amount from first resized data obtained by resizing first detection area data, which is data indicating the area in which the first object is detected in the image indicated by the first image data, and that extracts the second visual feature amount from second resized data obtained by resizing second detection area data, which is data indicating the area in which the second object is detected in the image indicated by the second image data, to the same size as the first resized data,
wherein the visual feature amount acquisition unit acquires the first visual feature amount extracted from the first detection area data by the visual feature amount extraction unit.

The image processing device according to claim 2, further comprising:
an image acquisition unit that acquires the first image data; and
an object detection unit that detects the first object from the first image data and outputs the first detection area data.

The image processing device according to claim 3, further comprising a physical feature amount estimation unit that receives the first detection area data from the object detection unit and estimates the first physical feature amount from the first detection area data,
wherein the physical feature amount acquisition unit acquires the first physical feature amount estimated from the first detection area data by the physical feature amount estimation unit.

The image processing device according to any one of claims 1 to 4, further comprising a storage unit that stores the first image data in association with a first feature descriptor, which is a feature descriptor indicating a feature of the first object in the first image data and which includes the first visual feature amount,
wherein the visual feature amount acquisition unit acquires the first visual feature amount included in the first feature descriptor stored in the storage unit.

The image processing device according to claim 5, wherein the storage unit stores the first feature descriptor including the first visual feature amount and the first physical feature amount, and
the physical feature amount acquisition unit acquires the first physical feature amount included in the first feature descriptor stored in the storage unit.

An image processing system comprising:
a first network camera that captures a first object and outputs first image data in which the first object appears;
a second network camera that captures a second object and outputs second image data in which the second object appears;
an image acquisition unit that acquires the first image data and the second image data;
a visual feature amount acquisition unit that acquires a first visual feature amount, which is a visual feature amount of the first object appearing in the first image data, and a second visual feature amount, which is a visual feature amount of the second object appearing in the second image data;
a physical feature amount acquisition unit that acquires a first physical feature amount, which is a physical feature amount of the first object, and a second physical feature amount, which is a physical feature amount of the second object; and
a determination unit that determines, using a trained machine learning model, whether or not the first object and the second object are the same from the first visual feature amount, the first physical feature amount, the second visual feature amount, and the second physical feature amount,
wherein the determination unit comprises:
a feature amount vector acquisition unit that inputs the first visual feature amount and the first physical feature amount to the trained machine learning model and acquires, as an output of the trained machine learning model, a first feature amount vector, which is a feature amount vector of the first object, and that inputs the second visual feature amount and the second physical feature amount to the trained machine learning model and acquires, as an output of the trained machine learning model, a second feature amount vector, which is a feature amount vector of the second object;
a similarity calculation unit that calculates a similarity between the first feature amount vector and the second feature amount vector; and
a similarity determination unit that determines, based on the similarity calculated by the similarity calculation unit, whether or not the first object appearing in the first image data and the second object appearing in the second image data are the same object.

An image processing method comprising:
a visual feature amount acquisition step of acquiring a first visual feature amount, which is a visual feature amount of a first object appearing in first image data, and a second visual feature amount, which is a visual feature amount of a second object appearing in second image data;
a physical feature amount acquisition step of acquiring a first physical feature amount, which is a physical feature amount of the first object, and a second physical feature amount, which is a physical feature amount of the second object; and
a determination step of determining, using a trained machine learning model, whether or not the first object and the second object are the same from the first visual feature amount, the first physical feature amount, the second visual feature amount, and the second physical feature amount,
wherein the determination step comprises:
a feature amount vector acquisition step of inputting the first visual feature amount and the first physical feature amount to the trained machine learning model and acquiring, as an output of the trained machine learning model, a first feature amount vector, which is a feature amount vector of the first object, and of inputting the second visual feature amount and the second physical feature amount to the trained machine learning model and acquiring, as an output of the trained machine learning model, a second feature amount vector, which is a feature amount vector of the second object;
a similarity calculation step of calculating a similarity between the first feature amount vector and the second feature amount vector; and
a similarity determination step of determining, based on the similarity calculated in the similarity calculation step, whether or not the first object appearing in the first image data and the second object appearing in the second image data are the same object.

An image processing program that causes a computer to execute:
a visual feature amount acquisition step of acquiring a first visual feature amount, which is a visual feature amount of a first object appearing in first image data, and a second visual feature amount, which is a visual feature amount of a second object appearing in second image data;
a physical feature amount acquisition step of acquiring a first physical feature amount, which is a physical feature amount of the first object, and a second physical feature amount, which is a physical feature amount of the second object; and
a determination step of determining, using a trained machine learning model, whether or not the first object and the second object are the same from the first visual feature amount, the first physical feature amount, the second visual feature amount, and the second physical feature amount,
wherein the determination step comprises:
a feature amount vector acquisition step of inputting the first visual feature amount and the first physical feature amount to the trained machine learning model and acquiring, as an output of the trained machine learning model, a first feature amount vector, which is a feature amount vector of the first object, and of inputting the second visual feature amount and the second physical feature amount to the trained machine learning model and acquiring, as an output of the trained machine learning model, a second feature amount vector, which is a feature amount vector of the second object;
a similarity calculation step of calculating a similarity between the first feature amount vector and the second feature amount vector; and
a similarity determination step of determining, based on the similarity calculated in the similarity calculation step, whether or not the first object appearing in the first image data and the second object appearing in the second image data are the same object.