JP2021196755A - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP2021196755A
Application number: JP2020101721A
Authority: JP
Inventors: 琢佐々木; Taku Sasaki; 啓太三上; Keita Mikami; 将司外山; Shoji Toyama; 哲希柴田; Tetsuki Shibata; 鮎美松本; Ayumi Matsumoto
Original assignee: Nippon Telegraph and Telephone Corp; NTT Communications Corp
Current assignee: Nippon Telegraph and Telephone Corp; NTT Communications Corp
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2021-12-27
Anticipated expiration: 2040-06-11
Also published as: JP7481171B2

Abstract

To provide an appropriate image to be used for analysis, for improving accuracy of image analysis. SOLUTION: An image processing apparatus 10 configured to process an image to be used for analysis of whether an image captured by a monitoring camera includes a desired subject or not includes a conversion section 132 which converts a background and/or a subject in the image to be used for analysis. The conversion section 132 converts, when converting a background of an image, the background of the image to a background captured by the monitoring camera or a background of the same kind as the background captured by the monitoring camera, or converts, when converting a subject of the image, a property of the subject to a property that the subject is likely to have in an area captured by the monitoring camera. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program.

Conventionally, image analysis includes a technique that cuts out a portion in which an object (for example, a person) appears from an image to be analyzed, extracts a feature amount of the cut-out portion (cut-out image), and analyzes the cut-out portion based on the extracted feature amount. It has also been proposed to perform image analysis using a model composed of a deep neural network. In training this model, a public data set containing a large number of images is often used as training data (see Non-Patent Document 1).

Shohei Hido, Yukino Baba et al., “Data Scientist Training Reader: Introduction to Machine Learning”, Gijutsu-Hyoronsha, September 2015, p. 15

However, even when the model is trained using a large number of images from a public data set, the model sometimes fails to achieve the desired image analysis accuracy.

The present invention has been made in view of the above, and an object thereof is to provide an image processing apparatus, an image processing method, and an image processing program capable of providing appropriate images to be used for analysis in order to improve the accuracy of image analysis.

In order to solve the above-mentioned problem and achieve the object, the image processing apparatus of the present invention is an image processing apparatus that processes an image used for analyzing whether or not a desired subject is captured in an image taken by a surveillance camera, and has a conversion unit that converts the background and/or the subject of the image used for analysis. When converting the background of the image, the conversion unit converts the background into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. When converting the subject of the image, the conversion unit converts a property of the subject into a property that the subject is likely to have in the area captured by the surveillance camera.

Further, the image processing method of the present invention is an image processing method executed by an image processing apparatus that processes an image used for analyzing whether or not a desired subject is captured in an image taken by a surveillance camera, and includes a conversion step of converting the background and/or the subject of the image used for analysis. When converting the background of the image, the conversion step converts the background into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. When converting the subject of the image, the conversion step converts a property of the subject into a property that the subject is likely to have in the area captured by the surveillance camera.

Further, the image processing program of the present invention causes a computer to execute a conversion step of converting the background and/or the subject of an image used for analyzing whether or not a desired subject is captured in an image taken by a surveillance camera. When converting the background of the image, the conversion step converts the background into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. When converting the subject of the image, the conversion step converts a property of the subject into a property that the subject is likely to have in the area captured by the surveillance camera.

According to the present invention, it is possible to provide appropriate images to be used for analysis in order to improve the accuracy of image analysis.

FIG. 1 is a block diagram showing an example of the configuration of the analysis system according to the embodiment.
FIG. 2 is a diagram illustrating the content of analysis processing by the analysis device.
FIG. 3 is a block diagram showing an example of the configuration of the image processing device.
FIG. 4 is a diagram illustrating the processing content of the conversion unit.
FIG. 5 is a diagram illustrating the processing content of the conversion unit.
FIG. 6 is a block diagram showing an example of the configuration of the learning device.
FIG. 7 is a block diagram showing an example of the configuration of the analysis device.
FIG. 8 is a flowchart showing a processing procedure of image processing according to the first embodiment.
FIG. 9 is a diagram illustrating an outline of the learning device of the second embodiment.
FIG. 10 is a diagram showing a configuration example of the learning device of the second embodiment.
FIG. 11 is a diagram showing an example of automatically assigning the coordinates of sub-objects in the second embodiment.
FIG. 12 is a diagram showing an example of a method for creating a removed image in the second embodiment.
FIG. 13 is a diagram showing an example of a region picked up by the local branch of the second embodiment.
FIG. 14 is a block diagram showing an example of the configuration of the analysis device according to the second embodiment.
FIG. 15 is a diagram showing an example of a processing procedure of the learning device of the second embodiment.
FIG. 16 is a diagram showing an example of a processing procedure of the analysis device of the second embodiment.
FIG. 17 is a diagram showing an example of an analysis result by a deep neural network trained by the learning device of the second embodiment.
FIG. 18 is a diagram showing an example of a computer that realizes the image processing device, the learning device, and the analysis device by executing a program.

Hereinafter, embodiments of the image processing apparatus, the image processing method, and the image processing program according to the present application will be described in detail with reference to the drawings. The present invention is not limited to the embodiments described below.

[Embodiment 1]
First, the first embodiment will be described. The present embodiment relates to an analysis system that performs image analysis using a model composed of a deep neural network. The model extracts a feature amount from an image in which a desired subject or a candidate for the desired subject is captured, and uses the extracted feature amount to estimate the attribute to which the subject or subject candidate in the image belongs, or to collate the subject or subject candidate with a detection target. In the first embodiment, the image to be analyzed is an image taken by a surveillance camera.

[Analysis system configuration]
First, the configuration of the analysis system according to the first embodiment will be described. FIG. 1 is a block diagram showing an example of the configuration of the analysis system according to the first embodiment.

As shown in FIG. 1, the analysis system 1 according to the first embodiment has an analysis device 30 that performs image analysis using a model composed of a deep neural network, and a learning device 20 that trains the model of the analysis device 30. The analysis system 1 also has, upstream of the learning device 20, an image processing device 10 that processes the images to be learned.

FIG. 2 is a diagram illustrating the content of analysis processing by the analysis device 30. The model used by the analysis device 30 has a feature extraction module that extracts the feature amount (feature vector) of the image x to be analyzed, and an analysis module that uses the feature amount extracted by the feature extraction module to estimate the attribute to which the subject in the cut-out image belongs or to collate the subject with the subject to be detected. A cut-out image is an image obtained by cutting out, from the original image, the portion that includes the subject.

Specifically, in the analysis device 30, the feature extraction module of the model performs a feature extraction step of extracting the feature amount (step S1 in FIG. 2). Subsequently, using the feature amount extracted by the feature extraction module, the model performs either an attribute estimation step of estimating the attribute to which the object in the image belongs (step S2 in FIG. 2) or a collation step of collating the object with the object to be detected (step S3 in FIG. 2), and outputs the analysis result. Attributes include a person's gender and age group, as well as skeleton and gait. The attributes are not limited to those of humans; they may be, for example, the species of a non-human animal, or may relate to objects such as vehicles and robots.
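
The flow of steps S1 to S3 can be illustrated with a small sketch. This is not the model of the embodiment, which is a deep neural network: the mean-colour feature, the prototype vectors, the cosine-similarity measure, the 0.95 threshold, and all function names below are hypothetical stand-ins chosen only to make the three steps concrete.

```python
import math


def extract_features(pixels):
    """Stand-in for step S1: mean of each colour channel of a cut-out image.

    `pixels` is a non-empty list of (r, g, b) tuples. A real system would
    run a deep neural network here instead.
    """
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(3)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_attribute(feature, prototypes):
    """Stand-in for step S2: label of the prototype most similar to `feature`.

    `prototypes` maps an attribute label to a reference feature vector.
    """
    return max(prototypes, key=lambda label: cosine_similarity(feature, prototypes[label]))


def collate(feature, target_feature, threshold=0.95):
    """Stand-in for step S3: True if the two features are similar enough."""
    return cosine_similarity(feature, target_feature) >= threshold
```

Both `estimate_attribute` and `collate` consume the same feature vector, which mirrors the point in the text that a single feature extraction step feeds either the attribute estimation step or the collation step.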

In the example of FIG. 2, the model estimates that the attribute of the person in the input image x is "male". The model also collates the person in the input image x with the person to be detected, and determines that the person in the image x and the person to be detected are different persons.

The learning device 20 trains the model using image data for learning. The learning device 20 trains a model, composed of a deep neural network, that extracts a feature amount of an image and analyzes, based on the extracted feature amount, whether or not a desired subject is captured in the image.

The image processing device 10 processes images used for analyzing whether or not a desired subject is captured. The image processing device 10 provides the learning device 20 with the image data for learning (learning data) used to train the model. The image processing device 10 executes predetermined image processing on the images used for learning, and outputs the processed images to the learning device 20. The image processing device 10 may also provide the analysis device 30 with the image data to be analyzed (images for analysis).

Specifically, the image processing device 10 generates the learning images that the learning device 20 uses to train the model. The image processing device 10 acquires cut-out images from a publicly available image data set and extracts, from these cut-out images, images having a desired attribute. For example, the image processing device 10 extracts images whose background is the background captured by the surveillance camera that takes the images to be analyzed, or a background of the same type as that background. As another example, the image processing device 10 extracts images that include a subject having a property that subjects are likely to have in the area captured by the surveillance camera that takes the images to be analyzed. In the image data set, object information including the attributes of the object and the identification information of the object is attached to each image.

When the number of extracted images does not reach a target number, the image processing device 10 generates images having the desired attribute by converting the background or the subject property of those cut-out images in the public image data set that do not have the desired attribute. The target number is set, for example, according to the analysis accuracy required of the model that performs the analysis.
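
The extract-then-convert policy described above can be sketched as follows. The function name `build_training_set`, its parameters, and the representation of images as arbitrary Python objects are assumptions made for illustration; the source does not specify how the check for the desired attribute or the conversion itself is implemented, so both are passed in as callables.

```python
def build_training_set(dataset, has_desired_attribute, convert, target_count):
    """Extract matching images, then convert others until the target is met.

    dataset               -- iterable of cut-out images (any objects)
    has_desired_attribute -- callable(image) -> bool (hypothetical predicate)
    convert               -- callable(image) -> converted image (hypothetical)
    target_count          -- number of images wanted for training
    Returns (extracted_images, converted_images).
    """
    dataset = list(dataset)

    # Extraction: keep every image that already has the desired attribute.
    extracted = [img for img in dataset if has_desired_attribute(img)]

    # Conversion: only needed when extraction alone falls short of the target.
    converted = []
    for img in dataset:
        if len(extracted) + len(converted) >= target_count:
            break
        if not has_desired_attribute(img):
            converted.append(convert(img))

    return extracted, converted
```

For instance, with three images of which one already shows an escalator background and a target of two, the sketch returns one extracted image and one converted image. If the data set is too small, the result may still fall short of the target; how that case is handled is not stated in the source.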

Specifically, when converting the background of an image, the image processing device 10 converts the background into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. When converting the subject of an image, the image processing device 10 converts a property of the subject into a property that the subject is likely to have in the area captured by the surveillance camera. The image processing device 10 outputs the extracted or converted images having the desired attribute, as learning data, to the learning device 20 or the analysis device 30.

As described above, in the first embodiment, the image processing device 10 upstream of the learning device 20 extracts or generates, from the cut-out images of the image data set, images that have the desired background or that include a subject having the desired property. The first embodiment thereby unifies the background or the subject property across the images of the learning data, reduces the number of factors the model must estimate during machine learning, lets the model appropriately learn the subject that should actually be estimated, and thus improves analysis accuracy.

[Image processing device]
Next, the configuration of the image processing device 10 will be described. FIG. 3 is a block diagram showing an example of the configuration of the image processing device 10. As shown in FIG. 3, the image processing device 10 has an input/output unit 11, a storage unit 12, and a control unit 13.

The input/output unit 11 accepts input of information and outputs information. The input/output unit 11 is, for example, a communication interface that transmits and receives various kinds of information to and from other devices connected via a network or the like. The input/output unit 11 performs communication between other devices (for example, the learning device 20 or the analysis device 30) and the control unit 13 (described later) via a telecommunication line such as a LAN (Local Area Network) or the Internet. The input/output unit 11 is also an input device, such as a mouse or a keyboard, that accepts input of various instructions to the image processing device 10 in response to input operations by the user. The input/output unit 11 is further realized by, for example, a liquid crystal display, on which a screen whose display is controlled by the image processing device 10 is output.

The storage unit 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, and stores a processing program for operating the image processing device 10, data used during execution of the processing program, and the like. The storage unit 12 holds image data 121, conversion data 122, extracted images 123, and converted images 124.

The image data 121 is, for example, a publicly available image data set. The image data 121 may comprise a plurality of image data sets.

The conversion data 122 is the data required when the conversion unit 132 (described later) converts the background of an image into the desired background or a background of the same type as the desired background, such as image data showing the desired background or a background of the same type. The conversion data 122 is also the data required when the conversion unit 132 converts a property of the subject of an image into a property that the subject is likely to have in the area captured by the surveillance camera, such as image data showing a subject that has such a property.

The extracted image 123 is an image extracted from the image data 121 by the extraction process of the extraction unit 131 (described later). The extracted image 123 is an image having the desired attribute. For example, the extracted image 123 is an image whose background is the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. Alternatively, the extracted image 123 is an image that includes a subject having a property that the subject is likely to have in the area captured by the surveillance camera.

The converted image 124 is an image generated through conversion by the conversion unit 132. The converted image 124 is an image whose background has been converted into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. Alternatively, the converted image 124 is an image in which a property of the subject has been converted into a property that the subject is likely to have in the area captured by the surveillance camera.

The control unit 13 controls the entire image processing device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs that define various processing procedures and control data, and executes each process using the internal memory. The control unit 13 functions as various processing units when various programs run. The control unit 13 has an extraction unit 131 and a conversion unit 132.

The extraction unit 131 extracts images having the desired attribute from the image data 121. The extraction unit 131 extracts, as the extracted images 123, images whose background is the background captured by the surveillance camera that takes the images to be analyzed, or a background of the same type as the background captured by the surveillance camera.

For example, the background of the image is an escalator, an elevator, or the like. When a surveillance camera captures an escalator or an elevator, the subject is in an upright posture in most cases. When the area captured by the surveillance camera is an escalator or an elevator, the extraction unit 131 uses skeleton estimation, an attention method that picks up the region showing a predetermined background, or the like to extract images in which the background is an escalator or an elevator and the subject is captured in an upright posture.

The property of the subject is a property of the subject's appearance, specifically posture, clothing, facial expression, hairstyle, or belongings. The extraction unit 131 extracts images that include a subject having a property that the subject is likely to have in the area captured by the surveillance camera that takes the images to be analyzed. For example, when the surveillance camera is installed in a ceremony hall, the subjects wear formal attire. When the area captured by the surveillance camera is a ceremony hall, the extraction unit 131 extracts images in which a subject wearing formal attire is captured.

The conversion unit 132 converts the background and/or the subject of an image used for analysis. When the number of extracted images 123 extracted by the extraction unit 131 does not reach the target number, the conversion unit 132 generates converted images 124 in which the background and/or the subject of an image has been converted.

When converting the background of an image, the conversion unit 132 converts the background into the background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera. FIG. 4 is a diagram illustrating the processing of the conversion unit 132.

For example, when the surveillance camera captures an escalator, the background of the image is the escalator, and the subject is in an upright posture. Therefore, as shown in FIG. 4, the conversion unit 132 acquires from the image data 121 an image G1 in which the background is not an escalator but a subject H1 in an upright posture is captured. The conversion unit 132 then generates an image G1' in which the background of the image G1 has been converted into a background showing the escalator B1. If the conversion data 122 contains no background showing the same escalator as the one captured by the surveillance camera, the conversion unit 132 may perform the background conversion using the background of a similar escalator.
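
One simple way to picture the background conversion from G1 to G1' is mask-based compositing, sketched below. The source does not state which technique the conversion unit 132 uses, so this is only an assumed illustration: the function `replace_background`, the per-pixel subject mask, and the nested-list image representation are all hypothetical.

```python
def replace_background(image, subject_mask, new_background):
    """Composite the subject of `image` onto `new_background`.

    image          -- 2-D list of pixels (rows of arbitrary pixel values)
    subject_mask   -- 2-D list of bools, True where the pixel shows the subject
    new_background -- 2-D list of pixels with the same shape as `image`
    Returns a new 2-D list; the inputs are not modified.
    """
    if not (len(image) == len(subject_mask) == len(new_background)):
        raise ValueError("image, mask and background must have the same height")

    result = []
    for img_row, mask_row, bg_row in zip(image, subject_mask, new_background):
        if not (len(img_row) == len(mask_row) == len(bg_row)):
            raise ValueError("image, mask and background must have the same width")
        # Keep subject pixels, take every other pixel from the new background.
        result.append([p if m else b for p, m, b in zip(img_row, mask_row, bg_row)])
    return result
```

In this sketch the mask would have to come from some segmentation step that is outside the scope of the example, and the new background would be taken from data such as the conversion data 122.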

When converting the subject of an image, the conversion unit 132 converts a property of the subject into a property that the subject is likely to have in the area captured by the surveillance camera. The property of the subject is a property of the subject's appearance, specifically posture, clothing, facial expression, hairstyle, or belongings. FIG. 5 is a diagram illustrating the processing of the conversion unit 132.

For example, when the surveillance camera is installed in a ceremony hall, the subjects wear formal attire. Therefore, as shown in FIG. 5, the conversion unit 132 acquires from the image data 121 an image G2 in which a subject H2 wearing everyday clothes is captured. The conversion unit 132 then generates an image G2' in which the clothing of the subject H2 has been converted from everyday clothes to formal attire. As another example, the conversion unit 132 may acquire from the image data 121 an image showing a subject without makeup, and generate an image in which the subject's face has been converted from a face without makeup to a face with makeup.

When the conversion changes an attribute or the identification information of the object in an image to another attribute or other identification information, the conversion unit 132 updates the object information of the converted image 124 to match the conversion. The image processing device 10 outputs the extracted images 123 and the converted images 124 to the learning device 20 as learning data.
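
Keeping the object information consistent with a conversion can be sketched as below. The `ObjectInfo` record, its field names, and the helper `update_object_info` are hypothetical; the source only states that object information contains the attributes and the identification information of the object and that it is changed to match the conversion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjectInfo:
    """Hypothetical label record attached to one image of the data set."""
    object_id: str      # identification information of the object
    attributes: dict    # e.g. {"clothing": "everyday", "posture": "upright"}


def update_object_info(info, changed_attributes):
    """Return a copy of `info` whose attributes reflect the conversion.

    Only the keys present in `changed_attributes` are overwritten; the
    original record is left untouched so the source image keeps its labels.
    """
    merged = {**info.attributes, **changed_attributes}
    return replace(info, attributes=merged)
```

For example, after converting everyday clothes to formal attire, calling `update_object_info(info, {"clothing": "formal"})` yields the label record for the converted image while the record of the original image stays as it was.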

[Learning device]
Next, the configuration of the learning device 20 will be described. FIG. 6 is a block diagram showing an example of the configuration of the learning device 20. As shown in FIG. 6, the learning device 20 has an input/output unit 21, a storage unit 22, and a control unit 23.

The input/output unit 21 has the same functions as the input/output unit 11 shown in FIG. 3, and performs input and output of information and communication with other devices (for example, the image processing device 10 and the analysis device 30).

The storage unit 22 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, and stores a processing program for operating the learning device 20, data used during execution of the processing program, and the like. The storage unit 22 stores the extracted images 123 and the converted images 124 provided by the image processing device 10 as learning data 221. The extracted images 123 are images having the desired attribute that the image processing device 10 extracted from the images of the public data set. The converted images 124 are images in which the background and/or the subject property has been converted to match the desired attribute. The storage unit 22 also holds a model 222.

モデル２２２は、特徴抽出モジュールで画像の特徴量を抽出し、抽出した特徴量を基に画像に所望の被写体が撮像されているか否かを解析モジュールで解析するモデルである。モデル２２２は、ディープニューラルネットワークで構成される。モデル２２２は、抽出した特徴量を用いて、画像内の被写体が属する属性の推定や被写体と検出対象の被写体との照合を行う。モデル２２２の各種パラメータは、後述する学習部２３１による学習用データ２２１を用いた学習によって調整される。 The model 222 is a model in which the feature amount of the image is extracted by the feature extraction module, and whether or not a desired subject is captured in the image is analyzed by the analysis module based on the extracted feature amount. Model 222 is composed of a deep neural network. The model 222 uses the extracted features to estimate the attributes to which the subject in the image belongs and to collate the subject with the subject to be detected. Various parameters of the model 222 are adjusted by learning using the learning data 221 by the learning unit 231 described later.

制御部２３は、図３に示す制御部１３と同様の機能を有し、学習装置２０全体を制御する。制御部２３は、各種のプログラムが動作することにより各種の処理部として機能する。制御部２３は、学習部２３１を有する。 The control unit 23 has the same function as the control unit 13 shown in FIG. 3 and controls the entire learning device 20. The control unit 23 functions as various processing units by operating various programs. The control unit 23 has a learning unit 231.

学習部２３１は、特徴抽出モジュールにおいて学習用データ２２１から抽出された特徴量に基づく画像の画像解析を学習する。 The learning unit 231 learns image analysis of an image based on the feature amount extracted from the learning data 221 in the feature extraction module.

このように、学習装置２０は、画像処理装置１０によって、学習対象の画像データ１２１から予め抽出された、所望の背景または所望の性質を有する被写体を含む抽出画像１２３、または、画像データ１２１の画像を所望の背景に変換された変換済画像１２４及び所望の被写体の性質となるように被写体の性質が変換された変換済画像１２４を用いて学習を行っている。抽出画像１２３及び変換済画像１２４は、各画像の背景または被写体の性質を統一されているため、学習装置２０では、機械学習時に推定させる要素が減り、本来推定すべき被写体を適切にモデルに学習させ、モデルの解析精度の向上を向上することができる。 As described above, the learning device 20 performs learning using the extracted images 123, which the image processing device 10 has extracted in advance from the image data 121 to be learned and which contain a desired background or a subject having a desired property, or using the converted images 124, in which an image of the image data 121 has been converted to the desired background or the properties of the subject have been converted to the desired subject properties. Because the background or subject properties are unified across the extracted images 123 and the converted images 124, the learning device 20 has fewer factors to estimate during machine learning, so the model can properly learn the subject it is actually meant to estimate, and the analysis accuracy of the model improves.

［解析装置］
次に、解析装置３０の構成について説明する。図７は、解析装置３０の構成の一例を示すブロック図である。図７に示すように、解析装置３０は、入出力部３１、記憶部３２及び制御部３３を有する。 [Analyzer]
Next, the configuration of the analysis device 30 will be described. FIG. 7 is a block diagram showing an example of the configuration of the analysis device 30. As shown in FIG. 7, the analysis device 30 includes an input / output unit 31, a storage unit 32, and a control unit 33.

入出力部３１は、図３に示す入出力部１１と同様の機能を有し、情報の入出力や他の装置（例えば、画像処理装置１０及び学習装置２０）との通信を行う。 The input / output unit 31 has the same function as the input / output unit 11 shown in FIG. 3, and performs input / output of information and communication with other devices (for example, the image processing device 10 and the learning device 20).

記憶部３２は、ＲＡＭ（Random Access Memory）、フラッシュメモリ（Flash Memory）等の半導体メモリ素子によって実現され、解析装置３０を動作させる処理プログラムや、処理プログラムの実行中に使用されるデータなどが記憶される。記憶部３２は、解析用画像３２１、モデル３２２、及び、画像に写ったオブジェクトの分類結果或いは画像に写ったオブジェクトの属性の推定結果を示す解析結果３２３を有する。 The storage unit 32 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, and stores a processing program for operating the analysis device 30, data used during execution of the processing program, and the like. The storage unit 32 holds an analysis image 321, a model 322, and an analysis result 323 indicating the classification result of the object captured in the image or the estimation result of the attribute of that object.

制御部３３は、図３に示す制御部１３と同様の機能を有し、解析装置３０全体を制御する。制御部３３は、各種のプログラムが動作することにより各種の処理部として機能する。制御部３３は、解析部３３１を有する。 The control unit 33 has the same function as the control unit 13 shown in FIG. 3 and controls the entire analysis device 30. The control unit 33 functions as various processing units by operating various programs. The control unit 33 has an analysis unit 331.

解析部３３１は、モデル３２２を用いて、特徴抽出モジュールにおける特徴量抽出処理と、特徴抽出モジュールが抽出した特徴量を用いて、解析用画像内の被写体が属する属性の推定や被写体と検出対象の被写体との照合を行う。 Using the model 322, the analysis unit 331 performs the feature amount extraction process in the feature extraction module and, using the feature amounts extracted by that module, estimates the attribute to which the subject in the analysis image belongs or collates the subject with the subject to be detected.

このように、解析装置３０は、解析時に、各画像の背景または被写体の性質を統一された抽出画像１２３及び変換済画像１２４を学習して精度を高めたモデル３２２を用いるため、精度の高い解析を実行することができる。 As described above, since the analysis device 30 uses the model 322 which has improved the accuracy by learning the extracted image 123 and the converted image 124 in which the properties of the background or the subject of each image are unified at the time of analysis, the analysis is highly accurate. Can be executed.

［画像処理の処理手順］
次に、画像処理装置１０が実行する画像処理の処理手順について説明する。図８は、実施の形態１に係る画像処理の処理手順を示すフローチャートである。 [Image processing procedure]
Next, a processing procedure for image processing executed by the image processing device 10 will be described. FIG. 8 is a flowchart showing a processing procedure of image processing according to the first embodiment.

図８に示すように、画像処理装置１０は、画像データ１２１から、所望の属性を有する画像を抽出する（ステップＳ１１）。画像処理装置１０は、抽出した画像が目的枚数に達した場合（ステップＳ１２：Ｙｅｓ）、抽出画像１２３を学習用データとして学習装置２０に出力する（ステップＳ１５）。 As shown in FIG. 8, the image processing apparatus 10 extracts an image having a desired attribute from the image data 121 (step S11). When the number of extracted images reaches the target number (step S12: Yes), the image processing device 10 outputs the extracted images 123 as learning data to the learning device 20 (step S15).

一方、抽出した画像が目的枚数に達していない場合（ステップＳ１２：Ｎｏ）、画像処理装置１０は、画像データから、所望の属性以外の画像を抽出し（ステップＳ１３）、抽出した画像に対し、画像の背景及び／または被写体を変換する変換処理を施す（ステップＳ１４）。 On the other hand, when the number of extracted images has not reached the target number (step S12: No), the image processing device 10 extracts images that do not have the desired attribute from the image data (step S13) and applies to each extracted image a conversion process that converts the background and/or the subject of the image (step S14).

そして、画像処理装置１０は、抽出した画像及び変換した画像が目的枚数に達した場合（ステップＳ１２：Ｙｅｓ）、抽出画像１２３及び変換済画像１２４を学習用データとして学習装置２０に出力する（ステップＳ１５）。また、画像処理装置１０は、抽出した画像及び変換した画像が目的枚数に達していない場合（ステップＳ１２：Ｎｏ）、抽出した画像及び変換した画像が目的枚数に達するまで、ステップＳ１３，Ｓ１４の処理を繰り返す。 Then, when the total number of extracted and converted images reaches the target number (step S12: Yes), the image processing device 10 outputs the extracted images 123 and the converted images 124 to the learning device 20 as learning data (step S15). When the total has not reached the target number (step S12: No), the image processing device 10 repeats steps S13 and S14 until it does.
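The flow of steps S11 to S15 can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `build_learning_data`, `has_desired_attribute`, and `convert` are hypothetical and do not appear in the specification, and stopping when the source images run out is an added safeguard that the flowchart does not describe.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def build_learning_data(
    images: Iterable[Image],
    has_desired_attribute: Callable[[Image], bool],  # hypothetical predicate
    convert: Callable[[Image], Image],               # hypothetical conversion (step S14)
    target_count: int,
) -> Tuple[List[Image], List[Image]]:
    """Return (extracted images, converted images) following steps S11 to S15."""
    pool = list(images)

    # Step S11: extract the images that already have the desired attribute.
    extracted = [img for img in pool if has_desired_attribute(img)]
    converted: List[Image] = []

    # Candidates for step S13: images without the desired attribute.
    others = iter([img for img in pool if not has_desired_attribute(img)])

    # Step S12: repeat S13 and S14 until the target number is reached.
    while len(extracted) + len(converted) < target_count:
        candidate = next(others, None)
        if candidate is None:
            break  # assumption: stop if no more source images remain
        converted.append(convert(candidate))  # step S14

    # Step S15: both lists are output as learning data.
    return extracted, converted
```

In this sketch, conversion is applied only as often as needed to reach the target count, which mirrors the branch at step S12.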

［実施の形態１の効果］
モデル２２２の学習用画像に、本実施の形態を適用した場合の解析精度と未適用の場合の解析精度を評価した。表１に、その評価結果を示す。Rank-1及びmAPは、いずれも０〜１００％の値を取り、値が高いほど照合精度が良好であることを示す。 [Effect of Embodiment 1]
The analysis accuracy when the present embodiment was applied to the learning image of the model 222 and the analysis accuracy when the present embodiment was not applied were evaluated. Table 1 shows the evaluation results. Rank-1 and mAP both take a value of 0 to 100%, and the higher the value, the better the collation accuracy.
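For readers unfamiliar with the two metrics, the sketch below computes Rank-1 and mAP as percentages. It assumes the definitions commonly used in person re-identification, which the specification does not spell out; the function name `rank1_and_map` and the input layout are hypothetical.

```python
from typing import Sequence, Tuple


def rank1_and_map(ranked_matches: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Compute Rank-1 and mAP in percent.

    ranked_matches[q][k] is True when the k-th ranked gallery image for
    query q shows the same subject as the query (assumed input layout).
    """
    rank1_hits = 0
    ap_sum = 0.0
    for matches in ranked_matches:
        # Rank-1: the top-ranked gallery image is a correct match.
        if matches and matches[0]:
            rank1_hits += 1
        # Average precision: mean of the precision at each correct match.
        hits = 0
        precision_sum = 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / k
        ap_sum += precision_sum / hits if hits else 0.0
    n = len(ranked_matches)
    return 100.0 * rank1_hits / n, 100.0 * ap_sum / n
```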

表１では、画像データ１２１の画像を全て採用して学習した場合の照合精度と、画像データ１２１のうち背景がエスカレータのみとした抽出画像１２３及び変換済画像１２４とを採用して学習した場合の照合精度とを示す。表１に示すように、画像データ１２１のうち背景がエスカレータのみとした抽出画像１２３及び変換済画像１２４とを採用して学習した方が、画像データ１２１の画像を全て採用して学習した場合と比して、Rank-1及びmAPのいずれについても最も良好であった。 Table 1 shows the collation accuracy obtained when learning used all images of the image data 121, and the collation accuracy obtained when learning used only the extracted images 123 and converted images 124 whose background is an escalator. As shown in Table 1, learning with the escalator-background extracted images 123 and converted images 124 gave better results for both Rank-1 and mAP than learning with all images of the image data 121.

本実施の形態１では、画像処理装置１０によって、学習用データとして、画像処理装置１０によって、学習対象の画像データ１２１から予め抽出された、所望の背景または所望の性質を有する被写体を含む抽出画像１２３、または、画像データ１２１の画像を所望の背景に変換された変換済画像１２４及び所望の被写体の性質となるように被写体の性質が変換された変換済画像１２４を用いて学習を行っている。このように、本実施の形態１では、学習の際に、画像の背景または被写体の性質が統一された抽出画像１２３及び変換済画像１２４を用いる。このため、本実施の形態１によれば、機械学習時に推定させる要素が減り、本来推定すべき被写体を適切にモデルに学習させ、モデルの解析精度の向上を向上することができる。 In the first embodiment, learning uses, as learning data, the extracted images 123, which the image processing device 10 has extracted in advance from the image data 121 to be learned and which contain a desired background or a subject having a desired property, or the converted images 124, in which an image of the image data 121 has been converted to the desired background or the properties of the subject have been converted to the desired subject properties. In this way, the first embodiment learns from extracted images 123 and converted images 124 whose background or subject properties are unified. Therefore, according to the first embodiment, there are fewer factors to estimate during machine learning, the model can properly learn the subject it is actually meant to estimate, and the analysis accuracy of the model improves.

［実施の形態２］
次に、実施の形態２について説明する。実施の形態２では、モデルとして、アテンションモデルを用いる場合について説明する。アテンションモデルとは、切出済画像から複数の領域をピックアップし、ピックアップした複数の領域ごとに特徴量を抽出し、抽出した各特徴量を統合して、画像内の被写体が属する属性の推定や被写体と検出対象の被写体との照合を行うモデルである。本実施の形態２では、画像処理装置１０が処理した抽出画像１２３及び変換済画像１２４を用いて、モデルの学習またはモデルを用いた解析を行う。 [Embodiment 2]
Next, the second embodiment will be described. In the second embodiment, a case where an attention model is used as a model will be described. The attention model is to pick up multiple areas from the clipped image, extract the feature amount for each of the picked up areas, integrate each extracted feature amount, and estimate the attribute to which the subject in the image belongs. This is a model that collates the subject with the subject to be detected. In the second embodiment, model learning or analysis using the model is performed using the extracted image 123 and the converted image 124 processed by the image processing device 10.

［概要］
まず、図１４を用いて、本実施の形態の学習装置２の概要を説明する。ここでの学習の対象は、画像解析を行うディープニューラルネットワークであるものとする。このディープニューラルネットワークは、解析対象の画像から、オブジェクトの映っている部分を切り出す切出モジュール（図１４において図示省略）と、切り出した部分の特徴量を抽出する特徴量抽出モジュールと、抽出した特徴量に基づき、切り出した部分の解析を行う解析モジュールとを備えるものとする。 [Overview]
First, an outline of the learning device 2 of the present embodiment will be described with reference to FIG. 14. The target of learning here is a deep neural network that performs image analysis. This deep neural network includes a cutout module (not shown in FIG. 14) that cuts out the part of the image to be analyzed in which an object appears, a feature amount extraction module that extracts the feature amount of the cut-out part, and an analysis module that analyzes the cut-out part based on the extracted feature amount.

特徴量抽出モジュールは、画像から特徴量を抽出する複数のモジュールから構成される。このモジュールは、例えば、HA-CNN等で用いられるlocal branchである。以下、特徴量抽出モジュールを構成するモジュールはlocal branchである場合を例に説明する。この特徴量抽出モジュールは、global branchを含んでいてもよい。また、解析モジュールは、画像に映ったオブジェクトの分類を行う分類モジュールと、当該オブジェクトの属性を推定する属性推定モジュールとを備える場合を例に説明する。 The feature amount extraction module is composed of a plurality of modules for extracting feature amounts from an image. This module is, for example, a local branch used in HA-CNN and the like. Hereinafter, the case where the module constituting the feature quantity extraction module is a local branch will be described as an example. This feature extraction module may include a global branch. Further, a case where the analysis module includes a classification module for classifying objects reflected in an image and an attribute estimation module for estimating the attributes of the object will be described as an example.

学習装置は、特徴量抽出モジュールのlocal branchそれぞれに、当該local branchが担当する（ピックアップすべき）サブオブジェクトを割り当てる。このサブオブジェクトは、オブジェクトを構成するオブジェクトである。 The learning device assigns to each local branch of the feature amount extraction module the sub-object that the branch is responsible for (that is, the sub-object it should pick up). A sub-object is a component that makes up the object.

例えば、オブジェクトが人物である場合、当該オブジェクトのサブオブジェクトは上半身や下半身等である。例えば、学習装置は、図１４の符号４０１に示すlocal branchが担当するサブオブジェクトとして人物の上半身を割り当て、符号４０２に示すlocal branchが担当するサブオブジェクトとして人物の下半身を割り当てる。 For example, when an object is a person, the sub-objects of the object are the upper body, the lower body, and the like. For example, the learning device allocates the upper body of the person as the sub-object in charge of the local branch shown by reference numeral 401 in FIG. 14, and the lower body of the person as the sub-object in charge of the local branch shown in reference numeral 402.

その後、学習装置は、特徴量抽出モジュールのlocal branchそれぞれがピックアップすべき領域の学習を行う。例えば、学習装置は、抽出画像１２３及び変換済画像１２４の１枚１枚に対して各local branchがピックアップすべきサブオブジェクトが存在する領域（local branchがピックアップすべき領域）を示した情報を用いて、local branchそれぞれがピックアップすべきサブオブジェクトの領域の学習を行う。 After that, the learning device learns the area to be picked up by each local branch of the feature quantity extraction module. For example, the learning device uses information indicating an area (area to be picked up by the local branch) in which a sub-object to be picked up by each local branch exists for each of the extracted image 123 and the converted image 124. Then, each local branch learns the area of the sub-object to be picked up.

例えば、学習装置は、図１４の符号４０１に示すlocal branchが担当するサブオブジェクトの領域と、当該local branchがピックアップした領域との間に誤差があれば、学習装置は、誤差を低減するよう当該local branchのパラメータ値の調整を行う。また、符号４０２に示すlocal branchが担当するサブオブジェクトの領域と、当該local branchがピックアップした領域との間に誤差があれば、学習装置は、誤差を低減するよう当該local branchのパラメータ値の調整を行う。このような調整を繰り返すことにより、local branchそれぞれは、自身に割り当てられたサブオブジェクトの領域を正確にピックアップできるようになる。このような調整（学習）を、説明の便宜上、特徴量抽出モジュールの直接的な反省と呼ぶ。また、学習装置は、分析モジュールによる分析精度をより向上させるためには、local branchそれぞれがどの領域をピックアップすればよいのかの学習（間接的な反省）も行う。 For example, if there is an error between the region of the sub-object assigned to the local branch indicated by reference numeral 401 in FIG. 14 and the region that the local branch actually picked up, the learning device adjusts the parameter values of that local branch so as to reduce the error. Likewise, if there is an error between the region of the sub-object assigned to the local branch indicated by reference numeral 402 and the region that the local branch picked up, the learning device adjusts the parameter values of that local branch so as to reduce the error. By repeating such adjustments, each local branch becomes able to pick up the region of its assigned sub-object accurately. For convenience of explanation, this adjustment (learning) is called direct reflection of the feature amount extraction module. The learning device also learns which region each local branch should pick up in order to further improve the analysis accuracy of the analysis module (indirect reflection).

このように学習装置が、特徴量抽出モジュールの学習にあたり、上記の間接的な反省に加え、直接的な反省も行うことで、上記の間接的な反省のみで学習を行うよりも、学習に必要な画像数やエポック数を大幅に低減することができる。 Because the learning device performs direct reflection in addition to the indirect reflection described above when training the feature amount extraction module, the number of images and the number of epochs required for learning can be reduced significantly compared with learning by indirect reflection alone.

［学習装置］
次に、図１５を用いて、学習装置の構成例を説明する。学習装置２２０は、入出力部２１と、記憶部２２と、制御部２２３とを備える。 [Learning device]
Next, a configuration example of the learning device will be described with reference to FIG. 15. The learning device 220 includes an input / output unit 21, a storage unit 22, and a control unit 223.

記憶部２２は、入出力部２１経由で入力された抽出画像１２３及び変換済画像１２４を含む学習用データ２２１、制御部２２３による学習により得られたディープニューラルネットワークのモデル２２２２を記憶する。モデル２２２２は、例えば、上記のディープニューラルネットワークで用いられる各種モジュール（割当モジュール、特徴量抽出モジュール、分析モジュール）のパラメータ値等を示した情報である。このモデルの情報は、制御部２２３による学習処理により適宜更新される。 The storage unit 22 stores the learning data 221 including the extracted image 123 and the converted image 124 input via the input / output unit 21, and the model 2222 of the deep neural network obtained by learning by the control unit 223. The model 2222 is information showing, for example, parameter values of various modules (allocation module, feature amount extraction module, analysis module) used in the above deep neural network. The information of this model is appropriately updated by the learning process by the control unit 223.

上記の抽出画像１２３及び変換済画像１２４は、例えば、画像ごとに、当該画像においてサブオブジェクトが存在する領域（つまり、local branchがピックアップすべき領域）の情報を付与したものである。このサブオブジェクトが存在する領域の情報（例えば、座標）は、手動で付与してもよいし、自動で付与してもよい。 In the above-mentioned extracted image 123 and converted image 124, for example, information on the area where the sub-object exists in the image (that is, the area to be picked up by the local branch) is added to each image. Information (for example, coordinates) of the area where this sub-object exists may be given manually or automatically.

例えば、学習装置２２０が特徴量抽出モジュールにおいて、人物の上半身と下半身という２つのサブオブジェクトをピックアップすると定め、画像においてこれらのサブオブジェクトが存在する領域の情報（例えば、座標）を自動で付与する場合を考える。 For example, when the learning device 220 determines that the feature quantity extraction module picks up two sub-objects, the upper body and the lower body of a person, and automatically adds information (for example, coordinates) of the area where these sub-objects exist in the image. think of.

この場合、例えば、人物の全身が映った画像（図１６の符号６０１参照）と、上半身が映った画像（図１６の符号６０２参照）と、下半身が映った画像（図１６の符号６０３参照）とを予め用意しておく。 In this case, for example, an image showing the whole body of a person (see reference numeral 601 in FIG. 16), an image showing the upper body (see reference numeral 602 in FIG. 16), and an image showing the lower body (see reference numeral 603 in FIG. 16). And be prepared in advance.

そして、学習装置２２０は、これらの画像について、人物の全身が映った画像に対しては「画像の上半分が上半身で、画像の下半分が下半身」、上半身が映った画像に対しては「画像の全体が上半身で、下半身は存在せず」、下半身が映った画像に対しては「画像の全体が下半身で、上半身は存在せず」と判断する。その後、学習装置２２０は、上記の判断結果に基づき、各画像において上半身の存在する領域と下半身の存在する領域とを、例えば、矩形領域の四辺の座標で付与する。そして、学習装置２２０は、各サブオブジェクトの存在する領域の座標を付与した画像を、部分画像として記憶部１２に格納する。なお、学習装置２２０は、上半身が映った画像と下半身が映った画像とを用意する際、図１７に示すように、全身の映った画像を上下２つに分割することにより用意してもよい。 For these images, the learning device 220 then judges that "the upper half of the image is the upper body and the lower half of the image is the lower body" for an image showing the whole body of a person, that "the whole image is the upper body and the lower body is absent" for an image showing the upper body, and that "the whole image is the lower body and the upper body is absent" for an image showing the lower body. Based on these judgments, the learning device 220 annotates each image with the region where the upper body exists and the region where the lower body exists, for example as the coordinates of the four sides of a rectangular region. The learning device 220 then stores the images annotated with the coordinates of the region of each sub-object in the storage unit 12 as partial images. When preparing the image showing the upper body and the image showing the lower body, the learning device 220 may obtain them by dividing an image showing the whole body into upper and lower halves, as shown in FIG. 17.
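The automatic annotation rule described above can be sketched as follows. This is an illustration only: the function name `label_subobject_regions`, the `view` labels, and the pixel coordinate convention (origin at the top left, y increasing downward) are assumptions, and the tuple order (x0, x1, y0, y1) is chosen to match the coordinate notation used for the local branches.

```python
from typing import Dict, Optional, Tuple

# Rectangle given by the coordinates of its four sides: (x0, x1, y0, y1).
Box = Tuple[int, int, int, int]


def label_subobject_regions(width: int, height: int, view: str) -> Dict[str, Optional[Box]]:
    """Assign upper-body and lower-body regions to an image of the given size.

    view is a hypothetical label: "full", "upper", or "lower".
    None means the sub-object is absent from the image.
    """
    whole: Box = (0, width, 0, height)
    if view == "full":
        mid = height // 2
        # Upper half of the image is the upper body, lower half the lower body.
        return {"upper_body": (0, width, 0, mid), "lower_body": (0, width, mid, height)}
    if view == "upper":
        return {"upper_body": whole, "lower_body": None}
    if view == "lower":
        return {"upper_body": None, "lower_body": whole}
    raise ValueError(f"unknown view: {view!r}")
```

Representing an absent sub-object as `None` keeps the "does not exist" judgment explicit instead of encoding it as an empty rectangle.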

図１５の説明に戻る。制御部２２３は、サブオブジェクト割当部２２３１と、学習部２２３２とを備える。 Returning to the description of FIG. 15, the control unit 223 includes a sub-object allocation unit 2231 and a learning unit 2232.

サブオブジェクト割当部２２３１は、特徴量抽出モジュールを構成するlocal branchごとに、当該local branchが担当するサブオブジェクトを割り当てる。つまり、サブオブジェクト割当部２２３１は、local branchごとに、当該local branchが、オブジェクトを構成するサブオブジェクト群のうち、どのサブオブジェクトをピックアップし、特徴量を抽出するかを割り当てる。ここで特徴量抽出モジュールにおいてピックアップするサブオブジェクトの数、種類は任意の数、種類でよい。 The sub-object allocation unit 2231 allocates the sub-object in charge of the local branch for each local branch constituting the feature amount extraction module. That is, the sub-object allocation unit 2231 assigns, for each local branch, which sub-object among the sub-object groups constituting the object is picked up and the feature amount is extracted. Here, the number and types of sub-objects to be picked up in the feature amount extraction module may be any number and type.

例えば、ディープニューラルネットワークが扱うオブジェクトが人物である場合において、特徴量抽出モジュールがピックアップするサブオブジェクトの数を２個としたとき、サブオブジェクト割当部２２３１は、１本目のlocal branchに人物の上半身を割り当て、２本目のlocal branchに人物の下半身を割り当てる。また、同様に、特徴量抽出モジュールにおいてピックアップするサブオブジェクトの数を２個とした場合、サブオブジェクト割当部２２３１は、１本目のlocal branchに人物の右半身を割り当て、２本目のlocal branchに人物の左半身を割り当ててもよい。 For example, when the object handled by the deep neural network is a person and the number of sub-objects picked up by the feature extraction module is 2, the sub-object allocation unit 2231 puts the upper body of the person in the first local branch. Assign the lower body of the person to the second local branch. Similarly, when the number of sub-objects to be picked up in the feature amount extraction module is 2, the sub-object allocation unit 2231 assigns the right half of the person to the first local branch and the person to the second local branch. You may assign the left half of the body.

さらに、特徴量抽出モジュールにおいてピックアップするサブオブジェクトの数を３個とした場合、サブオブジェクト割当部２２３１は、例えば、１本目のlocal branchに人物の顔面を割り当て、２本目のlocal branchに人物の顔面を除く上半身を割り当て、３本目のlocal branchに人物の下半身を割り当てる。 Further, when the number of sub-objects to be picked up in the feature amount extraction module is 3, the sub-object allocation unit 2231 assigns a person's face to the first local branch, for example, and assigns a person's face to the second local branch. Allocate the upper body excluding, and assign the lower body of the person to the third local branch.
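The allocation examples above amount to a fixed mapping from local branch index to sub-object. A minimal sketch, assuming a hypothetical helper `allocate_subobjects` and hypothetical sub-object names, might look like this:

```python
from typing import Dict, Sequence


def allocate_subobjects(subobjects: Sequence[str]) -> Dict[int, str]:
    """Map local branch numbers (1-based) to the sub-objects they handle."""
    if len(set(subobjects)) != len(subobjects):
        raise ValueError("each sub-object should be assigned to only one branch")
    return {branch: name for branch, name in enumerate(subobjects, start=1)}


# The examples from the text, with hypothetical names for the sub-objects.
two_vertical = allocate_subobjects(["upper_body", "lower_body"])
two_horizontal = allocate_subobjects(["right_half", "left_half"])
three_parts = allocate_subobjects(["face", "upper_body_without_face", "lower_body"])
```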

学習部２２３２は、サブオブジェクト割当部２２３１により各local branchに割り当てられたサブオブジェクトの領域について、前記した間接的反省（第２の学習）に加え、部分画像を用いた直接的反省（第１の学習）を行う。 In addition to the indirect reflection (second learning) described above, the learning unit 2232 uses a partial image to directly reflect on the area of the sub-object assigned to each local branch by the sub-object allocation unit 2231 (first). (Learning).

つまり、学習部２２３２は、画像ごとに当該画像におけるサブオブジェクトの領域を示す情報を用いて、local branchそれぞれが当該local branchに割り当てられたサブオブジェクトの領域を精度よくピックアップできるようlocal branchそれぞれの学習（第１の学習）を行い、また、local branchそれぞれによりピックアップされたサブオブジェクトの特徴量を用いた画像分析の結果を用いて、当該画像分析の分析精度をより向上させるようlocal branchそれぞれの学習（第２の学習）を行う。 That is, the learning unit 2232 learns each local branch so that each local branch can accurately pick up the area of the sub-object assigned to the local branch by using the information indicating the area of the sub-object in the image for each image. (First learning) is performed, and the learning of each local branch is performed so as to further improve the analysis accuracy of the image analysis by using the result of the image analysis using the feature amount of the sub-object picked up by each local branch. (Second learning) is performed.
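To make the relationship between the first and second learning concrete, the sketch below combines a region error per local branch with an analysis error. It is not the loss function of the specification: the squared coordinate error, the weighting factor `weight`, the zero contribution for an absent sub-object, and the function names are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Rectangle as (x0, x1, y0, y1).
Box = Tuple[float, float, float, float]


def region_loss(picked: Box, target: Optional[Box]) -> float:
    """Assumed direct-reflection term: squared error between rectangles."""
    if target is None:
        return 0.0  # assumption: an absent sub-object contributes no error
    return sum((p - t) ** 2 for p, t in zip(picked, target))


def total_loss(
    analysis_loss: float,
    picked_boxes: Sequence[Box],
    target_boxes: Sequence[Optional[Box]],
    weight: float = 1.0,  # hypothetical balance between the two terms
) -> float:
    """Indirect term (analysis error) plus weighted direct terms per branch."""
    direct = sum(region_loss(p, t) for p, t in zip(picked_boxes, target_boxes))
    return analysis_loss + weight * direct
```

Minimizing only `analysis_loss` corresponds to indirect reflection alone; adding the region terms gives each local branch an explicit signal about where its sub-object is.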

なお、学習部２２３２が、各local branchの直接的反省（第１の学習）を行う場合の損失関数は、例えば、以下のようなものが考えられる。 The loss function when the learning unit 2232 directly reflects on each local branch (first learning) is, for example, as follows.

例えば、各local branchがピックアップする領域の形状が矩形であり、ｉ本目のlocal branchが実際にピックアップした矩形領域の座標が（x₀,x₁,y₀,y₁）であり（図１８参照）、ｉ本目のlocal branchがピックアップすべき矩形領域の座標が以下のように与えられた場合を考える。 For example, consider the case where the region picked up by each local branch is rectangular, the coordinates of the rectangular region actually picked up by the i-th local branch are (x₀, x₁, y₀, y₁) (see FIG. 18), and the coordinates of the rectangular region that the i-th local branch should pick up are given as follows.

この場合、学習部２２３２は、ｉ本目のlocal branchに直接伝播する損失関数として、例えば以下の式（１）を用いる。 In this case, the learning unit 2232 uses, for example, the following equation (1) as a loss function that propagates directly to the i-th local branch.

学習部２２３２は、直接的反省および間接的反省により得られた特徴量抽出モジュールのパラメータ値を用いて、記憶部１２内のモデルを更新する。 The learning unit 2232 updates the model in the storage unit 12 by using the parameter values of the feature amount extraction module obtained by the direct reflection and the indirect reflection.

［解析装置］
次に、実施の形態２における解析装置の構成について説明する。図１９は、実施の形態２における解析装置の構成の一例を示すブロック図である。解析装置３２０は、入出力部３１と、記憶部３２と、制御部２３３とを備える。 [Analyzer]
Next, the configuration of the analysis device according to the second embodiment will be described. FIG. 19 is a block diagram showing an example of the configuration of the analysis device according to the second embodiment. The analysis device 320 includes an input / output unit 31, a storage unit 32, and a control unit 233.

記憶部３２は、解析用画像３２１、学習装置２２０による学習によってパラメータが最適化されたアテンションモデルであるモデル２３２２（アテンションモデル）、及び、画像に写ったオブジェクトの分類結果或いは画像に写ったオブジェクトの属性の推定結果を示す解析結果２３２３を有する。 The storage unit 32 includes an image 321 for analysis, a model 2322 (attention model) which is an attention model whose parameters are optimized by learning by a learning device 220, and a classification result of an object captured in the image or an object captured in the image. It has an analysis result 2323 showing the estimation result of the attribute.

制御部２３３は、図８に示す制御部３３と同様の機能を有し、解析装置２３０全体を制御する。制御部２３３は、各種のプログラムが動作することにより各種の処理部として機能する。制御部２３３は、サブオブジェクト割当部２３３１及び解析部２３３２を有する。 The control unit 233 has the same function as the control unit 33 shown in FIG. 8 and controls the entire analysis device 230. The control unit 233 functions as various processing units by operating various programs. The control unit 233 has a sub-object allocation unit 2331 and an analysis unit 2332.

サブオブジェクト割当部２３３１は、解析用画像３２１から、モデル２３２２の各モジュールに、対応する領域を割り当てる。ピックアップする領域、及び、ピックアップした領域の各モジュールへの割り当ては、学習装置２２０における学習によってそれぞれ最適化されている。 The sub-object allocation unit 2331 allocates a corresponding area to each module of the model 2322 from the analysis image 321. The area to be picked up and the allocation of the area to be picked up to each module are optimized by learning in the learning device 220, respectively.

解析部２３３２は、モデル２３２２を用いて、各モジュールにおける領域ごとの特徴量を抽出し、各モジュールが抽出した特徴量を用いて、解析用画像３２１内の被写体が属する属性の推定や被写体と検出対象の被写体との照合を行う。 The analysis unit 2332 uses the model 2322 to extract the feature amount for each region in each module, and uses the feature amount extracted by each module to estimate the attribute to which the subject belongs in the analysis image 321 and detect the subject. Match with the target subject.

［学習処理の処理手順］
図１５を用いて、上記の学習装置２２０の処理手順の例を説明する。まず、学習装置２２０のサブオブジェクト割当部２２３１は、学習対象のディープニューラルネットワークの特徴量抽出モジュールにおける各local branchへのサブオブジェクトの割り当てを行う（ステップＳ２０１）。その後、学習部２２３２は、上記の特徴量抽出モジュールの各local branchの学習を行う（ステップＳ２０２）。すなわち、学習部２２３２は、分析モジュールから逆伝搬されてきた誤差を用いた各local branchの間接的反省に加え、記憶部２２の学習用データの画像を用いた各local branchの直接的反省を行う。 [Processing procedure of learning process]
An example of the processing procedure of the learning device 220 will be described with reference to FIG. 15. First, the sub-object allocation unit 2231 of the learning device 220 assigns a sub-object to each local branch in the feature amount extraction module of the deep neural network to be learned (step S201). After that, the learning unit 2232 trains each local branch of the feature amount extraction module (step S202). That is, in addition to the indirect reflection of each local branch using the error back-propagated from the analysis module, the learning unit 2232 performs the direct reflection of each local branch using the images of the learning data in the storage unit 22.

［解析処理の処理手順］
次に、図１６を用いて、上記の解析装置３２０の処理手順の例を説明する。まず、サブオブジェクト割当部２３３１は、解析用画像３２１から、モデル２３２２の各モジュールに対応する領域に割り当てる（ステップＳ２１１）。ピックアップする領域、及び、ピックアップした領域の各モジュールへの割り当ては、学習装置２２０における学習によってそれぞれ最適化されている。 [Processing procedure for analysis processing]
Next, an example of the processing procedure of the analysis device 320 will be described with reference to FIG. 16. First, the sub-object allocation unit 2331 allocates, from the analysis image 321, the corresponding region to each module of the model 2322 (step S211). The regions to be picked up and the allocation of the picked-up regions to the modules have each been optimized by the learning in the learning device 220.

解析部２３３２は、解析処理として、モデル２３２２を用いて、各モジュールにおける領域ごとの特徴量を抽出し、各モジュールが抽出した特徴量を用いて、解析用画像３２１内の被写体が属する属性の推定や被写体と検出対象の被写体との照合を行う（ステップＳ２１２）。 As an analysis process, the analysis unit 2332 uses the model 2322 to extract the feature amount for each region in each module, and uses the feature amount extracted by each module to estimate the attribute to which the subject in the analysis image 321 belongs. Or the subject and the subject to be detected are collated (step S212).
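The collation in step S212 can be pictured as ranking gallery images by the similarity of their feature amounts to those of the subject to be detected. The sketch below uses cosine similarity, which is an assumption; the specification does not state which similarity measure is used, and the names `cosine_similarity` and `top_k_matches` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query: Sequence[float], gallery: Dict[str, Sequence[float]], k: int = 5) -> List[str]:
    """Return the names of the k gallery entries most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:k]
```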

［実施の形態２の効果］
上記の学習装置２２０は、特徴量抽出モジュールのlocal branchそれぞれがピックアップすべき領域を所与のものとし、さらにその誤差を損失関数として計上して、直接的な反省も行う。これにより、特徴量抽出モジュールのlocal branchそれぞれは、オブジェクトの映り方が不完全な画像に対しても、当該オブジェクトの狙った部位（サブオブジェクト）を正確にピックアップすることができる。その結果、特徴量抽出モジュールは各サブオブジェクトの特徴量を精度よく抽出できるので、解析モジュールが当該オブジェクトの分析（例えば、分類、属性推定、照合等）を行う際の精度を向上させることができる。 [Effect of Embodiment 2]
The learning device 220 described above takes the region that each local branch of the feature amount extraction module should pick up as given, includes the error from that region in the loss function, and thereby also performs direct reflection. As a result, each local branch of the feature amount extraction module can accurately pick up the targeted part (sub-object) of the object even from an image in which the object appears only incompletely. Because the feature amount extraction module can then extract the feature amount of each sub-object accurately, the accuracy with which the analysis module analyzes the object (for example, classification, attribute estimation, or collation) can be improved.

例えば、監視カメラで撮影された映像に対し、映像に映った人物の自動解析を行うディープニューラルネットワークの学習に、本実施の形態２の学習装置２２０による学習を適用すれば、「迷子になった赤い服を着た5歳の女の子を探したい」、または、「この写真の犯人を捜したい」という要求があった場合に、従来は目視で扱うしかなかった「身体の一部しか映っていない画像」に対しても自動解析を行うことができる。 For example, suppose the learning by the learning device 220 of the second embodiment is applied to train a deep neural network that automatically analyzes persons appearing in video captured by a surveillance camera. Then, when there is a request such as "I want to find a lost 5-year-old girl in red clothes" or "I want to find the criminal in this photo", automatic analysis can be performed even on "images showing only part of the body", which previously had to be handled by visual inspection.

監視カメラで撮影された映像に対し、人物自動解析を行うディープニューラルネットワークの学習に、本実施の形態２の学習装置２２０による学習を適用した場合と、従来技術（HA-CNN）による学習を適用した場合との比較結果を図１７に示す。 FIG. 17 shows a comparison between applying the learning by the learning device 220 of the second embodiment and applying the learning by the conventional technique (HA-CNN) to train a deep neural network that automatically analyzes persons in video captured by a surveillance camera.

ここでは、それぞれのディープニューラルネットワークに対し、画像の中から、「ボーダーのズボンの人物」（図１７の左側の「この人物を探せ」に示す画像の人物）に似ている上位５枚の画像を探すよう指示した。 Here, each deep neural network was instructed to find, from among the images, the top five images resembling the "person in striped pants" (the person in the image labeled "Find this person" on the left side of FIG. 17).

この場合、比較例である従来技術（HA-CNN)により学習したディープニューラルネットワークは、本来「ボーダーのズボンの人物」を探すべきところ、上記の上位５枚の画像の中には「ボーダーのＴシャツの人物」や「ボーダーのワンピースの人物」が含まれている。これは、比較元の画像（図２２の「この人物を探せ」に示す画像）に、人物の下半身しか映っておらず、ディープニューラルネットワークにおいて画像上の領域と部位の紐づけに失敗したためと考えられる。 In this case, the deep neural network trained by the conventional technique (HA-CNN), which is the comparative example, should have found the "person in striped pants", but its top five images include a "person in a striped T-shirt" and a "person in a striped dress". This is considered to be because the comparison source image (the image labeled "Find this person" in FIG. 22) shows only the lower body of the person, and the deep neural network failed to associate the regions of the image with body parts.

By contrast, the top five images retrieved by the deep neural network trained by the learning device 220 of Embodiment 2 include only the "person wearing striped trousers", and include neither the "person wearing a striped T-shirt" nor the "person wearing a striped dress". This shows that the deep neural network trained by the learning device 220 of Embodiment 2 can search accurately even from an incomplete image.
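The top-five search used in this comparison can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a feature vector by the feature extraction module and that similarity is measured by cosine similarity; the similarity measure, the function names, and the gallery structure are assumptions made for illustration and are not specified in this description.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float],
                  gallery: Dict[str, Sequence[float]],
                  k: int = 5) -> List[str]:
    """Return the ids of the k gallery images most similar to the query.

    `gallery` maps a hypothetical image id to its feature vector.
    """
    scored = [(cosine_similarity(query, feat), image_id)
              for image_id, feat in gallery.items()]
    # Highest similarity first; only the score is used as the sort key.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]
```

Under this sketch, the quality of the ranking depends entirely on the feature vectors, which is why a feature extraction module that handles partial-body images poorly returns visually similar but incorrect matches.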

In other words, the conventional technique did not decide in advance which sub-object each local branch in the feature extraction module of the deep neural network should handle. Consequently, determining which sub-object each local branch should handle had to rely on indirect feedback from the subsequent analysis module. As a result, for each local branch to learn to extract features accurately even from incomplete images, it was necessary to prepare a large amount of training data and a long training time.

By contrast, the learning device 220 of Embodiment 2 decides in advance which sub-object each local branch in the feature extraction module should handle. This allows the learning device 220 to give each local branch direct feedback in addition to the indirect feedback described above. As a result, with an amount of training data and training time that can realistically be secured, each local branch of the feature extraction module can learn to extract features accurately even from incomplete images.
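The combination of indirect and direct feedback can be sketched as a single training loss. The following is a minimal illustration only: the branch-to-sub-object assignment, the rectangular region representation, the squared-distance direct term, the skipping of sub-objects absent from the image, and the `weight` parameter are all assumptions introduced here, not details taken from this description.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Region:
    """Rectangular image region in normalized coordinates (assumed format)."""
    x: float
    y: float
    w: float
    h: float


# Hypothetical fixed assignment decided before training:
# local branch index -> sub-object it is responsible for.
BRANCH_TO_SUBOBJECT = {0: "head", 1: "upper_body", 2: "lower_body"}


def direct_loss(predicted: Dict[int, Region],
                annotated: Dict[str, Region]) -> float:
    """Mean squared distance between the region each branch attends to and
    the annotated region of its pre-assigned sub-object.

    Sub-objects missing from `annotated` (e.g. cropped out of the image)
    are skipped, which is an assumption of this sketch.
    """
    total, count = 0.0, 0
    for branch, name in BRANCH_TO_SUBOBJECT.items():
        target = annotated.get(name)
        if target is None:
            continue
        p = predicted[branch]
        total += ((p.x - target.x) ** 2 + (p.y - target.y) ** 2
                  + (p.w - target.w) ** 2 + (p.h - target.h) ** 2)
        count += 1
    return total / count if count else 0.0


def total_loss(indirect: float,
               predicted: Dict[int, Region],
               annotated: Dict[str, Region],
               weight: float = 1.0) -> float:
    """Indirect feedback (loss from the subsequent analysis module) plus
    weighted direct feedback on each local branch."""
    return indirect + weight * direct_loss(predicted, annotated)
```

Without the fixed assignment, only the `indirect` term would be available, and each branch would have to discover its role from the analysis error alone, which is the situation attributed above to the conventional technique.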

[System Configuration of the Embodiments]
The components of the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 described above are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form in which the functions of the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 are distributed and integrated is not limited to the illustrated one; all or part of them may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

All or any part of each process performed in the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 may be realized by a CPU and a program that is analyzed and executed by the CPU. Each process performed in the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 may also be realized as hardware using wired logic.

Among the processes described in Embodiment 1, all or part of the processes described as being performed automatically may be performed manually. Conversely, all or part of the processes described as being performed manually may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and shown in the drawings may be changed as appropriate unless otherwise specified.

[Program]
FIG. 18 is a diagram showing an example of a computer that realizes the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 by executing a program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

The memory 1010 includes a ROM 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS (Operating System) 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 is implemented as the program module 1093, in which code executable by the computer 1000 is written. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing processing equivalent to the functional configuration of the image processing device 10, the learning devices 20 and 220, and the analysis devices 30 and 230 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the embodiments described above is stored as the program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as needed and executes them.

The program module 1093 and the program data 1094 need not be stored in the hard disk drive 1090; they may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)), and may be read from that computer by the CPU 1020 via the network interface 1070.

Embodiments to which the invention made by the present inventors is applied have been described above, but the present invention is not limited by the description and drawings that form part of the disclosure of the present invention according to these embodiments. That is, all other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of these embodiments fall within the scope of the present invention.

1 Analysis system
10 Image processing device
11, 21, 31 Input/output unit
12, 22, 32 Storage unit
13, 23, 33, 223, 233 Control unit
20, 220 Learning device
30, 230 Analysis device
121 Image data
122 Conversion data
123 Extracted image
124 Converted image
131 Extraction unit
132 Conversion unit
222, 322, 2222, 2322 Model
231, 2232 Learning unit
321 Image for analysis
323, 2323 Analysis result
331, 2332 Analysis unit
2231, 2331 Sub-object allocation unit

Claims

An image processing apparatus that processes an image used for analyzing whether or not a desired subject appears in an image captured by a surveillance camera, the image processing apparatus comprising:
a conversion unit that converts a background and/or a subject of the image used for the analysis,
wherein, when converting the background of the image, the conversion unit converts the background of the image into a background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera, and, when converting the subject of the image, the conversion unit converts a property of the subject into a property that the subject is likely to have in an area captured by the surveillance camera.

The image processing apparatus according to claim 1, wherein the property of the subject is an appearance property.

The image processing apparatus according to claim 2, wherein the appearance property is posture, clothing, facial expression, hairstyle, or belongings.

An image processing method executed by an image processing apparatus that processes an image used for analyzing whether or not a desired subject appears in an image captured by a surveillance camera, the image processing method comprising:
a conversion step of converting a background and/or a subject of the image used for the analysis,
wherein, in the conversion step, when converting the background of the image, the background of the image is converted into a background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera, and, when converting the subject of the image, a property of the subject is converted into a property that the subject is likely to have in an area captured by the surveillance camera.

An image processing program that causes a computer to execute a conversion step of converting a background and/or a subject of an image used for analyzing whether or not a desired subject appears in an image captured by a surveillance camera, wherein, in the conversion step, when converting the background of the image, the background of the image is converted into a background captured by the surveillance camera or a background of the same type as the background captured by the surveillance camera, and, when converting the subject of the image, a property of the subject is converted into a property that the subject is likely to have in an area captured by the surveillance camera.