JP2022082493A

JP2022082493A - Pedestrian re-identification method for random shielding recovery based on noise channel

Info

Publication number: JP2022082493A
Application number: JP2021087114A
Authority: JP
Inventors: 黄徳双; De Shuang Huang; 張焜; Kun Zhang
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-11-23
Filing date: 2021-05-24
Publication date: 2022-06-02
Anticipated expiration: 2041-05-24
Also published as: CN112434599B; JP7136500B2; CN112434599A

Abstract

PROBLEM TO BE SOLVED: To provide a pedestrian re-identification method for random occlusion recovery based on a noise channel. SOLUTION: The method comprises the steps of: partitioning and preprocessing a reference data set, constructing a GAN network structure, using it to augment the resulting training set, and training a backbone feature extraction structure on the augmented training set to obtain a trained backbone feature extraction structure; constructing a noise channel structure to handle label errors; combining the trained backbone feature extraction structure, the noise channel structure, and the GAN network structure into a pedestrian re-identification network for random occlusion recovery based on a noise channel; and identifying an actual original image to be examined using the pedestrian re-identification network. SELECTED DRAWING: Figure 1

Description

The present invention relates to the field of computer vision, and in particular to a pedestrian re-identification method for random occlusion recovery based on a noise channel.

The basic task of a distributed multi-camera surveillance system is to associate people with camera views at different locations and times. This is called the pedestrian re-identification problem. More specifically, pedestrian re-identification mainly addresses the questions "where has the target pedestrian been?" and "where did the target pedestrian go after being captured in the surveillance network?". It supports many key applications, such as long-term multi-camera tracking and forensic search. In practice, each camera may shoot from a different angle and distance, under different lighting conditions and degrees of occlusion, and against different static and dynamic backgrounds. This poses major challenges for the pedestrian re-identification task. In addition, pedestrians observed by cameras at unknown distances may be subject to cluttered backgrounds, low resolution, and similar limitations, so pedestrian re-identification techniques that rely on conventional biometrics such as face recognition are neither feasible nor reliable.

Conventional pedestrian re-identification techniques fall into two main areas: feature discovery and similarity measurement. Common features include color features, texture features, shape features, higher-level attribute features, and behavioral semantic features. For similarity measurement, the Euclidean distance was used first, and several supervised similarity discrimination methods were proposed later.

With the development of deep learning, methods based on deep learning models have come to dominate the field of pedestrian re-identification. Deep models for pedestrian re-identification currently fall into three main types: the identification model, the verification model, and the triplet model. The identification model is similar to classification models for other tasks: given a single image, it outputs that image's label, so it can make full use of the label information of individual images. The verification model takes two images as input and outputs whether they show the same pedestrian. It uses a weak label (the relationship between the two pedestrian images) rather than the label information of individual images. Similarly, the triplet model takes three images as input, pulling intra-class distances closer and pushing inter-class distances apart, but it also does not use the label information of individual images.

In terms of feature extraction, deep models abandon the traditional approach of hand-designing features and instead learn features automatically by designing network models and structural modules based on convolutional neural networks. Typical network structures include GoogLeNet, ResNet, and DenseNet. Common feature extraction structures include the inception structure, feature pyramids, and attention structures.

Against this background, the present invention designs a network model for random occlusion recovery based on a noise channel. Multi-scale representation learning extracts discriminative features (both global and local) and strengthens the learning of spatial relationships. A random batch mask strategy, which employs random occlusion and an attention mechanism, mitigates the suppression of local detail features.

An object of the present invention is to provide a pedestrian re-identification method for random occlusion recovery based on a noise channel, in order to overcome the defects of the prior art described above.

The object of the present invention can be achieved by the following technical solution.

A pedestrian re-identification method for random occlusion recovery based on a noise channel, the method comprising:
Step 1: after partitioning and preprocessing a reference data set, constructing a GAN network structure for occlusion recovery and using it to augment the training set obtained from the partitioned and preprocessed reference data set, then training a backbone feature extraction structure on the augmented training set to obtain a trained backbone feature extraction structure;
Step 2: constructing a noise channel structure for reducing the label errors introduced by data augmentation;
Step 3: combining the trained backbone feature extraction structure, the noise channel structure, and the GAN network structure for occlusion recovery to establish a pedestrian re-identification network for random occlusion recovery based on a noise channel; and
Step 4: identifying an actual original image to be examined using the pedestrian re-identification network for random occlusion recovery based on a noise channel.

Further, Step 1 comprises:
Step 101: dividing the reference data set into a training set and a test set, then randomly extracting image data from the training set and performing a preprocessing operation;
Step 102: constructing the GAN network structure for occlusion recovery and using it to further augment the training set;
Step 103: setting the parameters and corresponding formulas required for training the network model; and
Step 104: after the settings are complete, inputting the preprocessed and augmented image data into the backbone feature extraction structure to obtain the trained backbone feature extraction structure.

Further, the reference data set in Step 101 is the Market-1501 data set, the preprocessing operation in Step 101 includes horizontal flipping, additive noise, or random erasing, and the backbone feature extraction structure in Step 104 is the ResNet50 network structure.

Further, in Step 104, during the training in which the preprocessed and augmented image data is input into the backbone feature extraction structure, the Adam optimization method is used to adjust parameters automatically, Dropout is used to avoid overfitting, and Batch Normalization is used to speed up network convergence.

Further, Step 103 specifically comprises setting the total number of training epochs (epoch) to 150, the weight decay parameter (weight decay) to 0.0005, and the batch size (batch size) to 180, and setting the learning rate update scheme, which is given by Formula 1 (the formula and the symbol it uses for the learning rate are not reproduced in this text).

Further, the GAN network structure for occlusion recovery in Step 1 consists of a generator network, which learns the original data set and generates images, and a discriminator, which determines whether an input image is real, that is, whether the input data belongs to the original data or was produced by the generator. The corresponding mathematical expression is Formula 2 (not reproduced in this text), in which x is the occluded image, y is the target image, and D and G denote the discriminator network and the generator network, respectively.

Further, the process in Step 2 of using the noise channel structure to reduce the label errors introduced by data augmentation specifically comprises:
Step 201: specifying a distribution for the transition probability between the original label corresponding to the generated image data and the noisy label observed through the noise channel structure; and
Step 202: using the EM algorithm to estimate the latent parameters of the distribution, and using them to reduce the label errors introduced by data augmentation.

Further, the distribution in Step 201 is given by Formula 3 (the formula and its symbol definitions are not reproduced in this text).

Further, the process in Step 202 of estimating the latent parameters of the distribution with the EM algorithm includes fixing the latent parameters θ and ω in the E step to predict the transition probability, and updating the parameter θ in the M step (the accompanying formula and its symbol definitions are not reproduced in this text).

The updated parameter is given by Formula 5 (the formula and its symbol definitions are not reproduced in this text).

Further, the objective function used in the EM algorithm is given by Formula 6 (the formula and the symbol it uses for the objective function are not reproduced in this text).

Compared with the prior art, the present invention has the following advantages.
(1) The present invention uses deep learning: it first applies preprocessing operations such as flipping and cropping to the training set images, then extracts features with a base network model (ResNet50). Random batch mask training and multi-scale representation learning are applied to the high-dimensional features extracted by ResNet50, yielding feature information that is more discriminative, more detailed, and captures the spatial relationships of pedestrians. The network is then trained jointly using multiple loss functions.
(2) The present invention augments the data set with recovered occlusion images and introduces a label noise channel, which mitigates the errors caused by the augmented data and improves the robustness of the network.

FIG. 1 is a framework diagram of the overall network of the pedestrian re-identification technique for random occlusion recovery based on a noise channel according to an embodiment of the present invention.
FIG. 2 is a flowchart of the network training of the pedestrian re-identification technique for random occlusion recovery based on a noise channel according to an embodiment of the present invention.
FIG. 3 is a flowchart of the result evaluation of the pedestrian re-identification technique for random occlusion recovery based on a noise channel according to an embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The present invention is a pedestrian re-identification technique for random occlusion recovery based on a noise channel, which achieves more accurate and efficient pedestrian re-identification on several reference data sets. Pedestrian re-identification is the process of associating pedestrian images or video samples collected by different cameras with non-overlapping fields of view, that is, determining whether pedestrians captured at different times by cameras at different locations are the same person. Conventional pedestrian re-identification mainly involves two steps: pedestrian feature discovery and pedestrian similarity discrimination.

Compared with existing deep-learning-based pedestrian re-identification algorithms, the present invention proposes a pedestrian re-identification method for random occlusion recovery based on a noise channel. Occlusion blocks are randomly added to the original images, the images are restored with a GAN model, and the restored images are then used to extend the original training set. The augmented data set is used to train the baseline model, and a noise channel mitigates the label errors of the augmented images.

1. Basic technical solution
The present invention relates to a pedestrian re-identification technique for random occlusion recovery based on a noise channel. As shown in FIG. 1, its implementation rests mainly on the following parts:
1) division of the original data set into a training set and a test set;
2) the backbone feature extraction structure of the base network;
3) the noise channel structure;
4) the GAN network structure for occlusion recovery;
5) tuning of the network hyperparameters, including the iteration step size adjustment method, the initial iteration step size, and the choice of learning function;
6) selection of loss functions, with different loss functions used for different structures; and
7) implementation of the whole technique in PyTorch and Python with some auxiliary libraries.

Step 1) of the seven steps above specifically comprises dividing the reference data set into a training set and a test set. Taking the Market-1501 data set as an example, 751 pedestrian IDs with a total of 12,936 images form the training set, and the other 750 pedestrian IDs plus some background images, 19,732 images in total, form the test set.

On this basis, the data set is processed further: a portion of the training set is held out for testing during training, which makes it possible to monitor the training process and reach the best state efficiently. The test set is divided into two parts, query and gallery.

The trained network is used to extract features from the images in the query set and the candidate (gallery) set. Euclidean distances are computed pairwise between the extracted query and gallery features and then sorted. For each query target, the gallery images at the smallest distance are returned.
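The distance-based ranking described above can be sketched as follows. This is a minimal illustration in plain Python, assuming features are already extracted as lists of floats; the function names `euclidean` and `rank_gallery` and the toy feature values are hypothetical and do not come from the patent.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from nearest to farthest from the query."""
    dists = [euclidean(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: dists[i])


# Toy 2-D features; real features would come from the trained network.
query = [0.0, 0.0]
gallery = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
ranking = rank_gallery(query, gallery)  # nearest gallery item first
```

In practice the same computation would normally be done in batch on tensors; the loop form is used here only to keep the sketch dependency-free.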

Step 2) of the seven steps above specifically comprises selecting a mature, relatively high-performance network on which to run experiments and compare results. The ResNet50 network structure is used: ResNet learns residuals through shortcut connections, which solves the degradation problem that arises as network depth increases.
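The shortcut connection mentioned above amounts to adding a block's input back onto its output, so the block only has to learn the residual. A minimal sketch, assuming a block is any function from a vector to a vector of the same length (`residual_block` and `zero_transform` are hypothetical names; this is not the ResNet50 layer layout):

```python
def residual_block(x, transform):
    """Return transform(x) + x element-wise (the shortcut connection)."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_transform(v):
    """A block that has learned nothing: its residual is all zeros."""
    return [0.0] * len(v)


# With a zero residual the block reduces to the identity mapping, which is
# why adding more such blocks need not degrade a deeper network.
out = residual_block([1.0, 2.0], zero_transform)
```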

Step 3) of the seven steps above rests on the following points: for generated images, the original label cannot be directly assumed to be the true label; for the observed noisy labels, the transition probability between the noisy label and the true label must be learned; across all training images, the labels of the original data are considered clean while the labels of the generated data are considered noisy; and a distribution is specified for the observed labels, whose latent parameters are estimated with the EM algorithm.

Step 4) of the seven steps above is as follows. A generative adversarial network (GAN) adopts the idea of a two-player zero-sum game and consists of two parts, a generator network and a discriminator network. The generator network learns the original data set and generates images, while the discriminator network determines whether an input image is real (from the original data set) or fake (produced by the generator network). The two networks are trained simultaneously, with the aim that the discriminator can no longer tell whether a generated image is real. In the technical solution of the present invention, a conditional GAN [15] is used, and the optimization objective is Formula 7 (not reproduced in this text), in which x is the occluded image, y is the target image, and D and G denote the discriminator network and the generator network, respectively.
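Formula 7 itself did not survive in this text. For orientation only, the textbook conditional GAN objective, written with the symbols defined above (x the occluded image, y the target image, D the discriminator, G the generator), has the form below. It is assumed, not confirmed, that the patent's formula matches this standard form.

```latex
\min_{G}\,\max_{D}\;
\mathcal{L}_{\mathrm{cGAN}}(G, D)
  = \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr]
  + \mathbb{E}_{x}\bigl[\log\bigl(1 - D\bigl(x, G(x)\bigr)\bigr)\bigr]
```

The first term rewards the discriminator for accepting real (occluded, target) pairs; the second rewards it for rejecting pairs whose second element was produced by the generator, which the generator in turn tries to prevent.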

In the technical solution of the present invention, the Adam optimization method is used to adjust parameters automatically for the ResNet50 network structure, which avoids the difficulty of choosing parameters for SGD. Dropout is used to avoid overfitting, and Batch Normalization is used to speed up network convergence.

The network hyperparameters are tuned and initialized on the basis of extensive experimental experience: the total number of training epochs (epoch) is set to 150, the weight decay parameter (weight decay) to 0.0005, and the batch size (batch size) to 180, and the learning rate is updated according to Formula 8 (the formula and the symbol it uses for the learning rate are not reproduced in this text).
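The stated hyperparameters can be collected in a small configuration object, as sketched below. The class name `TrainConfig` and the helper `steps_per_epoch` are hypothetical. The learning rate schedule is deliberately left out because its formula is not available in this text, and the image count assumes the un-augmented Market-1501 training set, whereas the method trains on an augmented set.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 150             # total training epochs (from the text)
    weight_decay: float = 0.0005  # weight decay (from the text)
    batch_size: int = 180         # batch size (from the text)


def steps_per_epoch(num_images, cfg):
    """Number of mini-batches needed to cover num_images once."""
    return math.ceil(num_images / cfg.batch_size)


cfg = TrainConfig()
# Assumption: 12,936 images, the un-augmented Market-1501 training set.
batches = steps_per_epoch(12936, cfg)
total_steps = batches * cfg.epochs
```

With these assumptions one epoch is 72 mini-batches, so the full run is 10,800 optimizer steps; the augmented training set would increase both numbers.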

Step 7) of the seven steps above is as follows: PyTorch uses dynamic computation graphs, which makes it easy to implement one's own network design ideas.

2. Actual implementation
An embodiment of the present invention, a pedestrian re-identification technique for random occlusion recovery based on a noise channel, is implemented as follows.
The reference data set must be preprocessed and augmented. The following data processing methods are used.
1) Several images are randomly sampled from the data set and additive Gaussian noise is applied.
2) Several images are randomly sampled from the data set, and one rectangular occlusion block is randomly added to each, with the length and width of the region chosen at random between 2 cm and 5 cm. So that the rectangle occludes the person in the image as much as possible, the image is divided from left to right into three columns and the center of the rectangle is chosen at random within the middle column. The pixel values of the R, G, and B channels of the occlusion block lie in the range 0 to 255 and are set to the data set's mean values. In the Market-1501 data set, the mean pixel values are 89.3, 102.5, and 98.7. The occluded images are then recovered with CycleGAN.
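The occlusion procedure in item 2) can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the image is a plain list of rows of (r, g, b) tuples, block sizes are given in pixels (the text specifies 2 cm to 5 cm without a pixel conversion), the vertical position is unrestricted, and the names `add_occlusion` and `MEAN_RGB` are hypothetical.

```python
import random

MEAN_RGB = (89.3, 102.5, 98.7)  # Market-1501 channel means quoted in the text


def add_occlusion(image, min_size, max_size, rng):
    """Return (occluded copy of image, (top, left, height, width)).

    image is a list of rows, each row a list of (r, g, b) tuples.
    min_size/max_size are in pixels (an assumption; the text gives 2-5 cm).
    """
    height, width = len(image), len(image[0])
    block_h = rng.randint(min_size, max_size)
    block_w = rng.randint(min_size, max_size)
    # Pick the block centre inside the middle of three equal-width columns.
    centre_x = rng.randint(width // 3, 2 * width // 3 - 1)
    centre_y = rng.randint(0, height - 1)
    top = max(0, centre_y - block_h // 2)
    left = max(0, centre_x - block_w // 2)
    bottom = min(height, top + block_h)
    right = min(width, left + block_w)
    occluded = [list(row) for row in image]  # copy; the input is untouched
    for y in range(top, bottom):
        for x in range(left, right):
            occluded[y][x] = MEAN_RGB
    return occluded, (top, left, bottom - top, right - left)


rng = random.Random(0)
blank = [[(0, 0, 0)] * 64 for _ in range(128)]  # 128x64 black test image
occluded, box = add_occlusion(blank, 8, 20, rng)
```

The occluded image and its clean original would then form an (x, y) training pair for the recovery network; that step is outside this sketch.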

Several images are randomly sampled from the training data and processed with horizontal flipping, additive noise, random erasing, and similar operations. In addition, for the six cameras in the Market-1501 data set, CycleGAN is used to perform camera style transfer between images from different cameras, which multiplies the size of the data set.
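Two of the augmentations named above, horizontal flipping and additive noise, can be sketched in a few lines. The sketch assumes the same list-of-rows image layout with (r, g, b) tuples; the noise standard deviation `sigma` and the function names are hypothetical, since the text gives no noise level.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every channel, clamped to [0, 255]."""
    def noisy(value):
        return min(255.0, max(0.0, value + rng.gauss(0.0, sigma)))
    return [[tuple(noisy(c) for c in px) for px in row] for row in image]


image = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (250, 5, 128)]]
flipped = horizontal_flip(image)
noisy_image = add_gaussian_noise(image, sigma=10.0, rng=random.Random(0))
```

Random erasing follows the same pattern as the occlusion block: a rectangle is chosen at random and its pixels are overwritten.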

After the data set has been organized and processed as described above, ResNet50 is used as the baseline network model, chosen with parameter count and training time in mind, and images are input into this convolutional neural network (ResNet50) for feature extraction. Because Market-1501 is a relatively large pedestrian data set, a network model pre-trained on ImageNet is used for extraction.

The whole network is trained jointly by combining the identification loss and the ranked list loss. The overall model contains a feature learning structure with three branches. Each branch extracts a feature map of the image, and the network is then trained and its weights updated using the joint loss.
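The idea of a joint loss can be sketched as follows. The identification loss is ordinary softmax cross-entropy. The text does not give the ranked list loss formula, so the pairwise term below is a simplified stand-in in the spirit of ranked list loss, and the threshold `alpha`, the `margin`, the mixing `weight`, and all function names are illustrative assumptions.

```python
import math


def identification_loss(logits, label):
    """Softmax cross-entropy for one sample (the identification loss)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def pairwise_margin_loss(dist, same_id, alpha=1.2, margin=0.4):
    """Simplified ranked-list-style term for one pair of images.

    Positives are pulled inside distance alpha - margin, negatives are
    pushed beyond alpha. alpha and margin are illustrative, not from the text.
    """
    if same_id:
        return max(0.0, dist - (alpha - margin))
    return max(0.0, alpha - dist)


def joint_loss(logits, label, pairs, weight=1.0):
    """Identification loss plus the mean pairwise term scaled by weight."""
    rank_term = sum(pairwise_margin_loss(d, s) for d, s in pairs) / len(pairs)
    return identification_loss(logits, label) + weight * rank_term


# One sample with two candidate identities and two (distance, same_id) pairs.
loss = joint_loss([0.0, 0.0], 0, [(1.0, True), (2.0, False)])
```

In the three-branch model described above, a loss of this kind would be computed per branch and summed; how the branches are weighted is not stated in this text.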

For the label noise channel: for generated images, the original label cannot be directly assumed to be the true label. For the observed noisy labels, the transition probability between the noisy label and the true label must be learned; the labels of the original data are considered clean, while the labels of the generated data are considered noisy. A distribution is defined for the observed labels (Formula 9; the formula and its symbol definitions are not reproduced in this text).

Given the distribution, the latent parameters are computed with the EM algorithm. In the E step, the parameters are fixed and the transition probability is estimated (the corresponding formula and its symbol definitions are not reproduced in this text).

In the M step, the parameters are updated.

Finally, the objective function used in the EM algorithm can be expressed as Formula 12 (the formula and the symbol it uses for the objective function are not reproduced in this text).
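Because the E-step, M-step, and objective formulas are missing from this text, the sketch below shows only a generic noise-channel EM of the kind the passage describes; it should not be read as the patent's exact update rules. It assumes a fixed classifier that supplies `probs[i][y]`, the probability that sample i has true label y, and a transition matrix `theta[y][z]` giving the probability of observing noisy label z when the true label is y. All names and the toy numbers are hypothetical.

```python
def e_step(probs, noisy_labels, theta):
    """Posterior over the true label of each sample, given its noisy label.

    probs[i][y] : classifier probability that sample i has true label y
    theta[y][z] : transition probability P(noisy label z | true label y)
    """
    posteriors = []
    for p, z in zip(probs, noisy_labels):
        joint = [p[y] * theta[y][z] for y in range(len(p))]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors


def m_step(posteriors, noisy_labels, num_classes):
    """Re-estimate theta; every row that received any mass sums to 1."""
    theta = [[0.0] * num_classes for _ in range(num_classes)]
    for post, z in zip(posteriors, noisy_labels):
        for y in range(num_classes):
            theta[y][z] += post[y]
    for row in theta:
        row_sum = sum(row)
        if row_sum > 0:
            for z in range(num_classes):
                row[z] /= row_sum
    return theta


# Toy problem: three generated samples, two identities.
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
noisy = [0, 1, 1]
theta = [[0.8, 0.2], [0.2, 0.8]]  # initial guess, assumed
for _ in range(5):
    post = e_step(probs, noisy, theta)
    theta = m_step(post, noisy, 2)
```

In a full training loop the classifier would also be updated between EM rounds, using the posteriors as soft targets for the generated images; here it is held fixed to keep the two steps visible.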

The present invention achieves the best identification results to date on the Market-1501 data set; the results are shown in Table 1.

As shown in FIG. 3, evaluation shows that, on the Market-1501 data set (without re-ranking), the pedestrian re-identification technique for random occlusion recovery based on a noise channel proposed by the present invention achieves an mAP of 70.1, a rank-1 accuracy of 86.6, and a rank-5 accuracy of 94.6. Good experimental results were also obtained on other data sets.
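For readers unfamiliar with the metrics, rank-k accuracy and mAP can be computed from ranked gallery identities as sketched below. This is a simplified illustration: the standard Market-1501 protocol additionally discards same-camera and junk gallery images, which is omitted here, and the function names and toy rankings are hypothetical.

```python
def rank_k_hit(ranked_ids, query_id, k):
    """True if any of the top-k ranked gallery identities matches the query."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids, query_id):
    """Average precision for one query over its ranked gallery identities."""
    hits, precisions = 0, []
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / hits if hits else 0.0


def evaluate(rankings, query_ids, k=1):
    """Return (rank-k accuracy, mAP) averaged over all queries."""
    n = len(query_ids)
    acc = sum(rank_k_hit(r, q, k) for r, q in zip(rankings, query_ids)) / n
    mean_ap = sum(average_precision(r, q)
                  for r, q in zip(rankings, query_ids)) / n
    return acc, mean_ap


# Two queries; each ranking lists gallery identities from nearest to farthest.
rankings = [[7, 3, 7], [5, 2, 2]]
query_ids = [7, 2]
acc, mean_ap = evaluate(rankings, query_ids, k=1)
```

Rank-1 rewards only the single nearest match, while mAP also reflects how high the remaining correct matches appear, which is why the two figures reported above differ.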

The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. A person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and all such modifications or substitutions shall fall within the scope of protection of the present invention. Accordingly, the scope of protection of the present invention shall be defined by the claims.

Claims

A pedestrian re-identification method for random occlusion recovery based on a noise channel, the method comprising:
Step 1: after partitioning and preprocessing a reference data set, constructing a GAN network structure for occlusion recovery and using it to augment the training set obtained from the partitioned and preprocessed reference data set, then training a backbone feature extraction structure on the augmented training set to obtain a trained backbone feature extraction structure;
Step 2: constructing a noise channel structure for reducing the label errors introduced by data augmentation;
Step 3: combining the trained backbone feature extraction structure, the noise channel structure, and the GAN network structure for occlusion recovery to establish a pedestrian re-identification network for random occlusion recovery based on a noise channel; and
Step 4: identifying an actual original image to be examined using the pedestrian re-identification network for random occlusion recovery based on a noise channel.

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 1, wherein Step 1 comprises:
Step 101: dividing the reference data set into a training set and a test set, then randomly extracting image data from the training set and performing a preprocessing operation;
Step 102: constructing the GAN network structure for occlusion recovery and using it to further augment the training set;
Step 103: setting the parameters and corresponding formulas required for training the network model; and
Step 104: after the settings are complete, inputting the preprocessed and augmented image data into the backbone feature extraction structure to obtain the trained backbone feature extraction structure.

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 2, wherein the reference data set in Step 101 is the Market-1501 data set, the preprocessing operation in Step 101 includes horizontal flipping, additive noise, or random erasing, and the backbone feature extraction structure in Step 104 is the ResNet50 network structure.

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 2, wherein in Step 104, during the training in which the preprocessed and augmented image data is input into the backbone feature extraction structure, the Adam optimization method is used to adjust parameters automatically, Dropout is used to avoid overfitting, and Batch Normalization is used to speed up network convergence.

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 2, wherein Step 103 specifically comprises setting the total number of training epochs (epoch) to 150, the weight decay parameter (weight decay) to 0.0005, and the batch size (batch size) to 180, and setting the learning rate update scheme, which is given by Formula 1 (the formula and the symbol it uses for the learning rate are not reproduced in this text).

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 1, wherein the GAN network structure for occlusion recovery in Step 1 consists of a generator network, which learns the original data set and generates images, and a discriminator, which determines whether an input image is real, that is, whether the input data belongs to the original data or was produced by the generator, and wherein the corresponding mathematical expression is Formula 2 (not reproduced in this text), in which x is the occluded image, y is the target image, and D and G denote the discriminator network and the generator network, respectively.

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 1, wherein the process in Step 2 of using the noise channel structure to reduce the label errors introduced by data augmentation specifically comprises:
Step 201: specifying a distribution for the transition probability between the original label corresponding to the generated image data and the noisy label observed through the noise channel structure; and
Step 202: using the EM algorithm to estimate the latent parameters of the distribution, and using them to reduce the label errors introduced by data augmentation.

The distribution in Step 201 is given by Formula 3 (the formula and its symbol definitions are not reproduced in this text).

The process in Step 202 of estimating the latent parameters of the distribution with the EM algorithm includes fixing the latent parameters θ and ω in the E step to predict the transition probability, and updating the parameter θ in the M step (the accompanying formula and its symbol definitions are not reproduced in this text).

The updated parameter is given by Formula 5 (the formula and its symbol definitions are not reproduced in this text).

The pedestrian re-identification method for random occlusion recovery based on a noise channel according to claim 9, wherein the objective function used in the EM algorithm is given by Formula 6 (the formula and the symbol it uses for the objective function are not reproduced in this text).