JP6830707B1

JP6830707B1 - Person re-identification method that combines random batch mask and multi-scale representation learning

Info

Publication number: JP6830707B1
Application number: JP2020138754A
Authority: JP
Inventors: 徳双黄; 永伍
Original assignee: Tongji University (同済大学)
Priority date: 2020-01-23
Filing date: 2020-08-19
Publication date: 2021-02-17
Anticipated expiration: 2040-08-19
Also published as: CN111259850B; JP2021117969A; CN111259850A

Abstract

PROBLEM TO BE SOLVED: To provide a person re-identification method that fuses a random batch mask with multi-scale representation learning. SOLUTION: The person re-identification method comprises: constructing a person re-identification training network; tuning the network hyperparameters with preset training parameters to obtain a learning network; blocking the multi-scale representation learning branch and the random batch mask branch to obtain a test network, and inputting a test set into the test network to obtain the corresponding test identification result; judging whether the accuracy rate of the test identification result is greater than or equal to a predetermined value, and if YES, inputting the actual dataset into the learning network, otherwise retraining the network; and blocking the multi-scale representation learning branch and the random batch mask branch to obtain an application network, and inputting a query image into the application network to obtain the corresponding identification result. [Selected drawing] Fig. 1

Description

The present invention relates to the technical fields of computer pattern recognition and image processing, and in particular to a person re-identification method that fuses a random batch mask with multi-scale representation learning.

Person re-identification (PReID) uses computer vision to determine whether a specific person is present in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given one image of a monitored person, the same person can be retrieved automatically across devices. Cities now deploy many public-security cameras, roughly one every few tens to a few hundred meters, yet there are still areas that no camera covers. The goal of person re-identification is to determine where a target captured by one camera went after leaving that camera's field of view. This is similar to video retrieval: the target must be found in video collected by other cameras. Person re-identification is thus the process of establishing correspondences between person images or video samples collected by different cameras with non-overlapping fields of view, that is, of determining whether people captured by cameras at different locations and at different times are the same person.

Conventional person re-identification research is dataset-based: multiple cameras are installed, person images are collected, and the images are then labeled manually or automatically. Some of the images are used for training and the rest for identification. To improve identification accuracy, re-identification algorithms mainly consist of two parts: one that extracts better image features, and one that effectively computes and compares distances between features.

For image feature extraction, conventional approaches often use deep learning models that learn features automatically with a convolutional neural network, combined with an attention mechanism. Such approaches usually concentrate on the face or other salient features in the image and do not extract suppressed local features such as the hands or feet. Because these suppressed but important local detail features are not extracted effectively, the accuracy of the subsequent identification cannot be guaranteed.

Chinese Patent Application Publication No. 108647588

That publication relates to an article category identification method, device, computer equipment and storage medium. The method acquires an image of the article to be identified and extracts its edge mask information, crops the image according to the edge mask information, identifies the category of the article from the cropped image with a preset article category identification model, and outputs the identification result. The preset model is obtained by training and transfer learning on sample images; identifying the article image with this pre-trained, transfer-learned model is said to meet stricter accuracy requirements and to yield accurate identification results.

An object of the present invention is to overcome the above-described defects of the prior art by providing a person re-identification method that fuses a random batch mask with multi-scale representation learning.

The object of the present invention can be achieved by the following technical solution: a person re-identification method fusing a random batch mask with multi-scale representation learning, comprising:
Step S1: acquiring a reference dataset and performing data augmentation on it;
Step S2: dividing the augmented reference dataset into a training set and a test set;
Step S3: constructing, on the basis of the ResNet50 convolutional neural network, a person re-identification training network comprising an attention learning module, a feature extraction module and an identification output module connected in sequence, wherein the feature extraction module comprises a feature processing branch, a multi-scale representation learning branch and a random batch mask branch, and the feature processing branch comprises global average pooling and batch normalization;
Step S4: inputting the training set into the person re-identification training network and tuning the network hyperparameters with preset training parameters to obtain a person re-identification learning network;
Step S5: blocking the multi-scale representation learning branch and the random batch mask branch of the feature extraction module in the learning network to obtain a person re-identification test network, inputting the test set into the test network, and outputting the corresponding test identification result;
Step S6: calculating the accuracy rate of the test identification result and judging whether it is greater than or equal to a predetermined value; if YES, executing step S7, otherwise returning to step S4;
Step S7: acquiring an actual dataset, inputting it into the person re-identification learning network, and learning the image features corresponding to the actual dataset; and
Step S8: blocking the multi-scale representation learning branch and the random batch mask branch of the feature extraction module in the learning network to obtain a person re-identification application network, inputting a query image into the application network, and outputting the identification result corresponding to the query target.
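The control flow of steps S4 to S8 can be sketched as below. This is a minimal illustration, not the patent's implementation: `ReIDNet`, `train_until_accurate`, the `train_epoch` and `evaluate` callables, and the `max_rounds` guard (the patent simply loops back to S4) are hypothetical, and the network is reduced to a flag recording whether the two auxiliary branches are active.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReIDNet:
    """Toy stand-in for the network: it only tracks which branches are active."""
    aux_branches_enabled: bool = True  # multi-scale + random batch mask branches

    def active_branches(self) -> List[str]:
        branches = ["feature_processing"]
        if self.aux_branches_enabled:
            branches += ["multi_scale", "random_batch_mask"]
        return branches


def train_until_accurate(net: ReIDNet,
                         train_epoch: Callable[[ReIDNet], None],
                         evaluate: Callable[[ReIDNet], float],
                         threshold: float,
                         max_rounds: int = 10) -> int:
    """Steps S4-S6: train with all branches, test with the aux branches blocked."""
    for round_idx in range(1, max_rounds + 1):
        net.aux_branches_enabled = True    # S4: all three branches are trained
        train_epoch(net)
        net.aux_branches_enabled = False   # S5: block the aux branches for testing
        if evaluate(net) >= threshold:     # S6: accept, or go back to S4
            return round_idx               # S7/S8 then run with the aux branches blocked
    raise RuntimeError("accuracy threshold not reached")
```

Keeping the blocking as a flag mirrors the point of S5 and S8: the same learned weights serve training and inference, and only the lightweight feature processing branch runs at test time.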

Further, in step S1, the data augmentation specifically comprises:
Step S11: randomly selecting a plurality of images from the reference dataset and flipping them horizontally; and
Step S12: randomly selecting a plurality of images from the reference dataset and adding Gaussian noise and salt-and-pepper noise.

Further, in step S3, the attention learning module is deployed at three stages to strengthen the feature representation of the target;
the feature processing branch is trained jointly with a label loss and a ranked list loss, and captures the global information of the image;
the multi-scale representation learning branch is trained with two label losses, and captures the local detail features of the image and the correlation of spatial information; and
the random batch mask branch is trained with a label loss, and captures suppressed local features in the image.

Further, the random batch mask branch specifically sets an occluding block of random size at a random position and uses it to occlude part of the image, thereby capturing the local information in the unoccluded regions.

Further, the attention learning module comprises a channel attention module and a spatial attention module. The channel attention module consists of one average pooling operation, one multilayer perceptron, one linear layer and one batch normalization layer, and is used to increase the weights of effective channels and decrease the weights of ineffective channels.
The spatial attention module comprises two 1×1 convolution layers and two 3×3 convolution layers; the 1×1 convolution layers reduce the dimensionality of the convolutional feature map, and the 3×3 convolution layers extract features effectively.

Further, the attention learning module is specifically expressed as:

Further, in the multi-scale representation learning branch, the two label-loss training terms are small-scale feature training and large-scale feature training, respectively.

Further, the label loss training uses a cross-entropy loss function.

Further, the ranked list loss training uses a ranked list triplet loss function.

Further, the fused loss function of the feature extraction module is specifically:

Compared with the prior art, the present invention has the following advantages.

1. The invention adopts a random batch mask to learn suppressed local detail features in an image, and adopts multi-scale representation learning to train the small-scale and large-scale features of the multi-scale feature vector separately. This effectively strengthens the correlation of spatial information, makes feature extraction more comprehensive and reliable, and helps improve the accuracy of subsequent identification.

2. The invention combines two loss functions in the feature extraction module, an identification loss and a ranked list triplet loss, to measure distances between features. During training or learning, this increases inter-class distances while reducing intra-class distances, which improves the effectiveness of the image features.

3. During training or learning, the invention uses three branches (feature processing, random batch mask, and multi-scale representation learning) to extract image features effectively and comprehensively. During testing or practical application, the random batch mask and multi-scale representation learning branches are blocked, which saves network overhead and increases identification speed while maintaining identification accuracy.

Fig. 1 is a schematic flowchart of the method of the present invention.
Fig. 2 is an overall block diagram of the network of the present invention.
Fig. 3 is a schematic flowchart of training or learning of the person re-identification network.
Fig. 4 is a schematic diagram of the random batch mask design algorithm.
Fig. 5 is a schematic flowchart of testing or application of the person re-identification network.

The present invention is described in detail below with reference to the drawings and specific embodiments.

Embodiment
As shown in Fig. 1, the person re-identification method fusing a random batch mask with multi-scale representation learning comprises:
Step S1: acquiring a reference dataset and performing data augmentation on it;
Step S2: dividing the augmented reference dataset into a training set and a test set;
Step S3: constructing, on the basis of the ResNet50 convolutional neural network, a person re-identification training network comprising an attention learning module, a feature extraction module and an identification output module connected in sequence, wherein the feature extraction module comprises a feature processing branch, a multi-scale representation learning branch and a random batch mask branch, and the feature processing branch comprises global average pooling and batch normalization;
Step S4: inputting the training set into the person re-identification training network and tuning the network hyperparameters with preset training parameters to obtain a person re-identification learning network;
Step S5: blocking the multi-scale representation learning branch and the random batch mask branch of the feature extraction module in the learning network to obtain a person re-identification test network, inputting the test set into the test network, and outputting the corresponding test identification result;
Step S6: calculating the accuracy rate of the test identification result and judging whether it is greater than or equal to a predetermined value; if YES, executing step S7, otherwise returning to step S4;
Step S7: acquiring an actual dataset, inputting it into the person re-identification learning network, and learning the image features corresponding to the actual dataset; and
Step S8: blocking the multi-scale representation learning branch and the random batch mask branch of the feature extraction module in the learning network to obtain a person re-identification application network, inputting a query image into the application network, and outputting the identification result corresponding to the query target.

The present invention adopts a Random Batch Feature Mask (RBFM) training strategy and a Multi-scale Feature Representations Learning method to extract more salient and detailed feature information from person images, including the spatial correlations of the person. The random batch mask learning branch and the multi-scale representation learning branch are used only in the training and learning stages; in the testing and practical-use stages they are blocked and not used.

As shown in Fig. 2, the invention adopts ResNet-50 as the feature extraction network. In Stage 1, Stage 2 and Stage 3 of the ResNet50 feature extraction process, an Attention Learning Module is first introduced to strengthen the feature representation of the target. The Stage 4 feature vector of ResNet50 is then passed through the feature processing branch, the random batch mask training branch and the multi-scale representation learning branch. The feature processing branch, which mainly comprises GAP (global average pooling) and BN (batch normalization), is trained jointly with an Identification Loss and a Ranked List Loss to capture the global information of the person. The random batch mask learning branch is trained with a label loss to capture suppressed local features, thereby improving feature extraction capability. The multi-scale representation learning branch is trained with label losses to capture the local detail features of the person image and the correlation of spatial information. This learning strategy further improves feature extraction and identification performance. In total, the three branches use four identification losses and one ranked list loss to measure the distances between features.

In a concrete application, network training and testing are first performed in sequence on the reference dataset. Training yields the learning network, and blocking its random batch mask branch and multi-scale representation learning branch yields the test network. Once the test network reaches the predetermined identification accuracy, the actual dataset is input into the learning network for feature learning. The random batch mask branch and multi-scale representation learning branch of the learning network are then blocked to obtain the application network, which finally performs person re-identification on the query image. The training process of the person re-identification network is shown in Fig. 3; the reference dataset must first be augmented with the following data preprocessing methods.

1) Randomly select a plurality of images from the dataset and flip them horizontally.
2) Randomly select a plurality of images from the dataset and add Gaussian noise and salt-and-pepper noise.
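The two preprocessing operations can be sketched with NumPy as follows. The function name `augment`, the selection probabilities `p_flip` and `p_noise`, the Gaussian `sigma`, and the salt-and-pepper fraction `sp_amount` are all assumptions for illustration; the patent does not give these values. Images are assumed to be uint8 arrays laid out as (N, H, W, C).

```python
import numpy as np


def augment(images, rng, p_flip=0.5, p_noise=0.5, sigma=10.0, sp_amount=0.02):
    """Return an augmented copy of a uint8 batch shaped (N, H, W, C)."""
    out = images.astype(np.float64)
    n, h, w, _ = out.shape
    # 1) Horizontally flip a random subset of images (reverse the width axis).
    flip = rng.random(n) < p_flip
    out[flip] = out[flip][:, :, ::-1, :]
    # 2) Add Gaussian noise plus salt-and-pepper noise to another random subset.
    for i in np.flatnonzero(rng.random(n) < p_noise):
        out[i] += rng.normal(0.0, sigma, size=out[i].shape)
        u = rng.random((h, w))
        out[i][u < sp_amount / 2] = 0.0           # pepper: black pixels
        out[i][u > 1.0 - sp_amount / 2] = 255.0   # salt: white pixels
    return np.clip(out, 0, 255).astype(np.uint8)
```

Passing an explicit `np.random.default_rng(seed)` keeps the augmentation reproducible, which helps when comparing runs in the S4 to S6 loop.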

After the reference dataset has been organized accordingly and processed as described above, the images are input into the convolutional neural network (ResNet50) for feature extraction.

Throughout network training, the feature extraction part is trained jointly by fusing an identification loss and a ranked list loss. It contains a three-branch feature learning structure: each branch extracts a feature map of the image, and the combined loss function is then used to train the network and update its weights.

The Attention Learning Module comprises a channel attention module and a spatial attention module. The main idea of channel attention is to increase the weights of effective channels and decrease those of ineffective channels. The channel attention module consists of one average pooling operation, one multilayer perceptron, one linear layer and one batch normalization layer; the average pooling operation is given by the following equation.

The average pooling layer, multilayer perceptron and linear layer are used to estimate the attention of each channel and to adjust the dimensions of the channel attention. The channel attention is given by the following equation.
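Because the channel attention equations are not reproduced in this text, the following NumPy sketch shows one plausible reading of the described pipeline (average pooling, MLP, linear layer, batch normalization). The ReLU hidden activation, the sigmoid gate, the reduction ratio implied by the weight shapes, and the names `channel_attention`, `w1`, `w2` are assumptions, not the patent's formula.

```python
import numpy as np


def channel_attention(x, w1, w2, eps=1e-5):
    """x: (N, C, H, W); w1: (C, C // r) MLP weights; w2: (C // r, C) linear layer.

    Returns the reweighted features and the per-channel gates of shape (N, C).
    """
    # Average pooling over the spatial dims: z_c = (1 / (H * W)) * sum_{i,j} x_c(i, j)
    z = x.mean(axis=(2, 3))                                  # (N, C)
    h = np.maximum(z @ w1, 0.0)                              # MLP hidden layer (ReLU assumed)
    a = h @ w2                                               # linear layer back to C channels
    a = (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)  # batch norm, no affine params
    g = 1.0 / (1.0 + np.exp(-a))                             # sigmoid gate in (0, 1) (assumed)
    # Scale each channel: effective channels get larger weights, ineffective ones smaller.
    return x * g[:, :, None, None], g
```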

The spatial attention module, by contrast, focuses on positional information in the image and lets the network learn which parts of the feature map are likely to have a higher spatial response. It contains four convolution layers: two 1×1 convolutions reduce the dimensionality of the convolutional feature map, and two 3×3 convolutions then extract features effectively from the reduced map. The spatial attention is given by the following equation.
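A NumPy sketch of a spatial attention map built from the four described convolutions follows. The layer order (1×1 reduce, 3×3, 3×3, 1×1 down to a single map), the ReLU activations and the final sigmoid are assumptions, since the patent's equation is not reproduced here; `conv2d_same` and `spatial_attention` are hypothetical helper names.

```python
import numpy as np


def conv2d_same(x, k):
    """Zero-padded 'same' cross-correlation. x: (N, Cin, H, W); k: (Cout, Cin, kh, kw), odd sizes."""
    kh, kw = k.shape[2:]
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    h, w = x.shape[2:]
    out = np.zeros((x.shape[0], k.shape[0], h, w))
    for i in range(kh):
        for j in range(kw):
            # Mix input channels for this kernel tap, then shift-and-add.
            out += np.einsum("oc,nchw->nohw", k[:, :, i, j], xp[:, :, i:i + h, j:j + w])
    return out


def spatial_attention(x, k1, k2, k3, k4):
    """Return an (N, 1, H, W) map of per-position weights in [0, 1]."""
    a = np.maximum(conv2d_same(x, k1), 0.0)  # 1x1: reduce C -> C/r
    a = np.maximum(conv2d_same(a, k2), 0.0)  # 3x3 feature extraction
    a = np.maximum(conv2d_same(a, k3), 0.0)  # 3x3 feature extraction
    a = conv2d_same(a, k4)                   # 1x1: C/r -> one spatial map
    return 1.0 / (1.0 + np.exp(-a))
```

Reducing channels with the 1×1 layer before the 3×3 layers keeps the 3×3 convolutions cheap, which matches the stated purpose of the 1×1 layers.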

The final attention learning module combines channel attention and spatial attention:

The random batch mask branch mainly learns suppressed local detail features, improving the feature extraction capability of the model. During training on the feature vectors, the random batch mask training strategy places an occluding block of random size at a random position in order to capture local detail information. The design algorithm is shown in Fig. 4.

In the network training and learning stages, the algorithm proceeds as follows:
1) For the N feature maps of size C×H×W output by Stage 4 of the network (C is the number of channels, and H and W are the height and width of the feature map), randomly generate a height mask ratio R_h and a width mask ratio R_w.
2) Multiply R_h by the height H of the input feature map to obtain the mask height H_m = R_h·H, and multiply R_w by the width W to obtain the mask width W_n = R_w·W.
3) Randomly generate an integer X_a between 0 and H−H_m, and an integer Y_b between 0 and W−W_n.
4) Generate an H×W matrix P whose entries are all 1, and set to 0 every entry in rows X_a to X_a+H_m and columns Y_b to Y_b+W_n, giving the mask matrix P'.
5) Finally, multiply each of the N input feature maps by the mask matrix P', that is, apply the same mask to all N input feature maps.
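Steps 1) to 5) can be sketched in NumPy as below. The sampling range for R_h and R_w (`ratio_range`) is an assumption, since the patent does not state it, and `random_batch_feature_mask` is a hypothetical name. The zeroed block spans exactly H_m rows and W_n columns, and one mask P' is shared by every map in the batch.

```python
import numpy as np


def random_batch_feature_mask(feats, rng, ratio_range=(0.1, 0.5)):
    """feats: (N, C, H, W). Zero one random rectangle, identical for the whole batch."""
    n, c, h, w = feats.shape
    r_h, r_w = rng.uniform(*ratio_range, size=2)  # R_h, R_w (range is assumed)
    h_m = int(round(r_h * h))                     # mask height H_m = R_h * H
    w_n = int(round(r_w * w))                     # mask width  W_n = R_w * W
    x_a = int(rng.integers(0, h - h_m + 1))       # X_a in [0, H - H_m]
    y_b = int(rng.integers(0, w - w_n + 1))       # Y_b in [0, W - W_n]
    p = np.ones((h, w), dtype=feats.dtype)        # H x W matrix P of ones
    p[x_a:x_a + h_m, y_b:y_b + w_n] = 0           # zero the block -> mask matrix P'
    return feats * p, p                           # P' broadcasts over N and C
```

Sharing one mask across the batch, rather than drawing one per image, is what distinguishes a batch-level mask from per-sample random erasing; it forces the whole batch to rely on the same unmasked regions in a given iteration.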

In the network testing and practical-use stages, the random batch mask learning branch is blocked.

The multi-scale representation learning branch adopts a multi-scale grouped convolution strategy: the Stage-4 feature vector of ResNet50 is split into groups, features are extracted from each group with 3×3 convolution kernels, and the small-scale and large-scale features of the resulting multi-scale feature vector are trained separately. This extracts more salient and detailed features from the person image and strengthens the correlation of spatial information.
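One way to obtain a small-scale and a large-scale feature from grouped 3×3 convolutions is sketched below. The specific design is an assumption: channels are split into two halves, the first half passes through one 3×3 convolution (3×3 receptive field) and the second through two stacked 3×3 convolutions (5×5 receptive field), and depthwise kernels are used for brevity. `depthwise3x3` and `multi_scale_split` are hypothetical names.

```python
import numpy as np


def depthwise3x3(x, k):
    """Per-channel 3x3 cross-correlation with zero padding. x: (N, C, H, W); k: (C, 3, 3)."""
    h, w = x.shape[2:]
    xp = np.pad(x, ((0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[None, :, i, j, None, None] * xp[:, :, i:i + h, j:j + w]
    return out


def multi_scale_split(x, k_small, k_large1, k_large2):
    """Split channels into two groups and return (small-scale, large-scale) features."""
    c = x.shape[1]
    g1, g2 = x[:, : c // 2], x[:, c // 2:]
    small = depthwise3x3(g1, k_small)                           # 3x3 receptive field
    large = depthwise3x3(depthwise3x3(g2, k_large1), k_large2)  # stacked: 5x5 field
    return small, large
```

Each output would then feed its own classifier and label loss, which is how the two scales can be trained separately.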

The identification loss is the same loss function used in general classification tasks, typically the cross-entropy loss, given by the following equation.
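A NumPy sketch of a cross-entropy identification loss follows. The optional `eps` argument adds standard uniform label smoothing (target (1 − eps)·one-hot + eps/K). Whether the patent's equation includes smoothing, and with what factor, is an assumption here; `eps=0` gives plain cross-entropy.

```python
import numpy as np


def identification_loss(logits, labels, eps=0.0):
    """Mean cross-entropy over a batch.

    logits: (N, K) class scores; labels: (N,) integer identity labels;
    eps > 0 applies uniform label smoothing.
    """
    n, k = logits.shape
    z = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    q = np.full((n, k), eps / k)                              # smoothed target distribution
    q[np.arange(n), labels] += 1.0 - eps
    return float(-(q * log_p).sum(axis=1).mean())
```

Smoothing penalizes over-confident predictions: a logit vector that puts almost all mass on the true identity gets a higher loss when `eps > 0` than with plain cross-entropy.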

Because the training set and the test set of a person dataset share no identities, person re-identification can be regarded as a one-shot learning task, which tends to cause the model to overfit during training. Label smoothing is a common way to avoid overfitting in classification tasks.
For the ranked list loss, positive and negative samples are separated by requiring the distance of negative pairs to exceed a threshold α and the distance of positive pairs to be smaller than α−d, so that positive and negative samples are separated by a margin of at least d.
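The threshold rule above can be sketched as a simplified ranked list loss in NumPy. This is an assumption-laden illustration: the default α = 1.2 and margin d = 0.4 are placeholders, and negatives are averaged without the distance-based weighting used in published ranked list loss formulations. `ranked_list_loss` is a hypothetical name.

```python
import numpy as np


def ranked_list_loss(emb, labels, alpha=1.2, margin=0.4):
    """Pull positive pairs within alpha - margin, push negative pairs beyond alpha.

    emb: (N, D) embeddings; labels: (N,) identity labels; margin is the interval d.
    """
    d = np.sqrt(((emb[:, None, :] - emb[None, :, :]) ** 2).sum(-1))  # pairwise distances
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                                   # exclude the anchor itself
    diff = labels[:, None] != labels[None, :]
    pos = np.maximum(d - (alpha - margin), 0.0) * same              # positives too far away
    neg = np.maximum(alpha - d, 0.0) * diff                         # negatives too close
    per_anchor = (pos.sum(1) / np.maximum(same.sum(1), 1)
                  + neg.sum(1) / np.maximum(diff.sum(1), 1))
    return float(per_anchor.mean())
```

When every positive pair lies within α − d and every negative pair lies beyond α, the loss is exactly zero, which is the gap of at least d described above.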

The ranked list triplet loss and the cross-entropy loss are used to jointly train the feature extraction capability of the network; the fused loss function is expressed as follows.
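Since the fused loss equation is not reproduced here, one consistent way to write it, given that the three branches use four identification losses and one ranked list loss, is shown below. The unweighted sum of the identification terms and the single balancing weight λ (λ = 1 for a plain sum) are assumptions; the superscripts denote the feature processing branch, the random batch mask branch, and the small-scale and large-scale terms of the multi-scale branch.

```latex
L_{\mathrm{fused}} = L_{\mathrm{id}}^{\mathrm{FP}} + L_{\mathrm{id}}^{\mathrm{RBFM}}
  + L_{\mathrm{id}}^{\mathrm{small}} + L_{\mathrm{id}}^{\mathrm{large}}
  + \lambda \, L_{\mathrm{RLL}}
```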

In this embodiment, the training parameters are set as follows: 120 training epochs, a weight decay of 0.0005, and a batch size of 32. The learning rate is updated as follows:
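The stated training parameters can be collected in a small configuration object, sketched below. `TrainConfig` and `total_steps` are hypothetical. The learning-rate schedule is deliberately not modelled because its formula is not reproduced in this text, and the step count assumes the last partial batch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 120            # training epochs
    weight_decay: float = 5e-4   # weight decay parameter (0.0005)
    batch_size: int = 32         # images per batch

    def total_steps(self, num_train_images: int) -> int:
        """Optimizer steps over the whole run, keeping the last partial batch."""
        return self.epochs * math.ceil(num_train_images / self.batch_size)
```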

Network training yields the learning network, whose random batch mask branch and multi-scale representation learning branch are blocked to obtain the test network. The test process is shown in Fig. 5: during feature extraction, only the feature processing branch is used. Using the method of the present invention, this embodiment was compared with other identification methods on the CUHK03-Labeled dataset; the corresponding identification results are shown in Table 1.

As the data in Table 1 show, the person re-identification method of the present invention outperforms the other conventional methods in both Rank-1 accuracy and mAP. By comprehensively and effectively extracting image features, in particular the important local detail features in an image that are otherwise suppressed, the invention improves the accuracy of subsequent identification.
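Rank-1 accuracy and mAP, the two metrics cited for Table 1, can be computed from a query-to-gallery distance matrix as sketched below. This is a simplified version: it omits the same-camera filtering used in standard CUHK03 evaluation and assumes every query has at least one true match in the gallery. `rank1_and_map` is a hypothetical name.

```python
import numpy as np


def rank1_and_map(dist, q_ids, g_ids):
    """dist: (Q, G) query-to-gallery distances; returns (Rank-1, mAP)."""
    order = np.argsort(dist, axis=1)             # gallery ranked per query, nearest first
    matches = g_ids[order] == q_ids[:, None]     # (Q, G) boolean hits in ranked order
    rank1 = matches[:, 0].mean()                 # top-ranked gallery image is correct
    aps = []
    for row in matches:
        hits = np.flatnonzero(row)               # 0-based ranks of the true matches
        precision_at_hits = np.arange(1, len(hits) + 1) / (hits + 1)
        aps.append(precision_at_hits.mean())     # average precision for this query
    return float(rank1), float(np.mean(aps))
```

Rank-1 only checks the top result, while mAP rewards ranking all true matches early, so the two together show both retrieval precision and recall behavior.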

Claims

A person re-identification method that fuses a random batch mask and multi-scale expression learning, comprising:
Step S1: acquiring a reference data set and performing data expansion on the reference data set;
Step S2: dividing the expanded reference data set into a training set and a test set;
Step S3: constructing, on the basis of the ResNet50 convolutional neural network, a person re-identification training network comprising a sequentially connected attention learning module, feature extraction module and identification output module, wherein the feature extraction module includes a feature processing branch, a multi-scale expression learning branch and a random batch mask branch, and the feature processing branch includes global average pooling and batch normalization processing;
Step S4: inputting the training set into the person re-identification training network and adjusting the network hyperparameters with preset training parameters to obtain a person re-identification learning network;
Step S5: blocking the multi-scale expression learning branch and the random batch mask branch of the feature extraction module in the person re-identification learning network to obtain a person re-identification test network, inputting the test set into the person re-identification test network, and outputting the corresponding test identification result;
Step S6: calculating the accuracy rate of the test identification result and determining whether the accuracy rate is equal to or higher than a predetermined value; if YES, executing step S7, otherwise returning to step S4;
Step S7: acquiring an actual data set and inputting the actual data set into the person re-identification learning network to learn the image features corresponding to the actual data set; and
Step S8: blocking the multi-scale expression learning branch and the random batch mask branch of the feature extraction module in the person re-identification learning network to obtain a person re-identification application network, inputting a query image into the person re-identification application network, and outputting the identification result corresponding to the query target.
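The control flow of steps S4 to S6, including the blocking of the two auxiliary branches before testing, can be sketched as follows. `ReIDNetwork` is a hypothetical stand-in that only tracks which branches are active, and the `train` and `evaluate` callables are supplied by the caller. The `max_rounds` safeguard is an addition that the claim does not mention.

```python
from typing import Callable, List


class ReIDNetwork:
    """Hypothetical stand-in that only records which branches are active."""
    BRANCHES = ("feature_processing", "multi_scale", "random_batch_mask")

    def __init__(self) -> None:
        self.blocked: set = set()

    def block_auxiliary_branches(self) -> None:
        # S5 / S8: only the feature processing branch remains for inference.
        self.blocked = {"multi_scale", "random_batch_mask"}

    def unblock_all(self) -> None:
        # S4: all three branches take part in training.
        self.blocked = set()

    def active_branches(self) -> List[str]:
        return [b for b in self.BRANCHES if b not in self.blocked]


def train_until_accurate(net: ReIDNetwork,
                         train: Callable[[ReIDNetwork], None],
                         evaluate: Callable[[ReIDNetwork], float],
                         threshold: float,
                         max_rounds: int = 10) -> int:
    """S4-S6 loop; returns how many train/test rounds were needed."""
    for round_idx in range(1, max_rounds + 1):
        net.unblock_all()
        train(net)                      # S4: train with every branch
        net.block_auxiliary_branches()  # S5: build the test network
        if evaluate(net) >= threshold:  # S6: accuracy check
            return round_idx            # proceed to S7
    raise RuntimeError("test accuracy never reached the threshold")
```

The same `block_auxiliary_branches` call would also produce the application network of step S8.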

Specifically, in step S1, the data expansion comprises:
Step S11: randomly extracting a plurality of images from the reference data set and applying horizontal flipping; and
Step S12: randomly extracting a plurality of images from the reference data set and applying Gaussian noise and salt-and-pepper noise.
The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 1.
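Steps S11 and S12 can be sketched on grayscale images stored as lists of rows. The noise level (sigma = 10), the salt-and-pepper fraction (5 %) and the choice to apply both noise types to each selected image are assumptions, and all function names are hypothetical.

```python
import random


def horizontal_flip(img):
    """S11: mirror an H x W image (list of rows) left to right."""
    return [list(reversed(row)) for row in img]


def gaussian_noise(img, sigma=10.0, rng=random):
    """Add zero-mean Gaussian noise and clip to the 8-bit range [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def salt_and_pepper(img, amount=0.05, rng=random):
    """Set roughly a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    out = [row[:] for row in img]
    for row in out:
        for j in range(len(row)):
            if rng.random() < amount:
                row[j] = 255 if rng.random() < 0.5 else 0
    return out


def expand_dataset(images, n_flip, n_noise, seed=0):
    """S11 + S12: return the originals plus flipped and noisy copies.

    ASSUMPTION: each image chosen for S12 receives both noise types.
    """
    rng = random.Random(seed)
    flipped = [horizontal_flip(img) for img in rng.sample(images, n_flip)]
    noisy = [salt_and_pepper(gaussian_noise(img, rng=rng), rng=rng)
             for img in rng.sample(images, n_noise)]
    return images + flipped + noisy
```

Passing an explicit `random.Random(seed)` keeps the expansion reproducible across runs.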

In step S3, the attention learning module is divided into three stages to strengthen the feature expression of the object;
the feature processing branch employs joint training with a label loss and a ranking list loss to acquire global information of the image;
the multi-scale expression learning branch employs two sets of label-loss training to acquire local detail features of the image and their correlation with spatial information; and
the random batch mask branch employs label-loss training to capture suppressed local features in the image.
The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 1.

Specifically, the random batch mask branch sets an occlusion block of random size at a random position, occludes part of the image with the occlusion block, and thereby captures the local information of the non-occluded region. The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 3.
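One way to realize a mask of random size and random position is sketched below. Reading "batch" as "the same rectangle is applied to every map in the batch" is an interpretation, and the side-length range of 20 % to 50 % of the map is an assumed choice. The function name is hypothetical.

```python
import random


def random_batch_mask(batch, min_frac=0.2, max_frac=0.5, seed=None):
    """Zero out one randomly sized, randomly placed rectangle in every map.

    batch: list of H x W maps (lists of rows), all of the same size.
    ASSUMPTIONS: one rectangle is shared by the whole batch, and its height and
    width are drawn as fractions of H and W in [min_frac, max_frac].
    Returns the masked copies and the rectangle (top, left, height, width).
    """
    rng = random.Random(seed)
    h, w = len(batch[0]), len(batch[0][0])
    mh = max(1, int(h * rng.uniform(min_frac, max_frac)))
    mw = max(1, int(w * rng.uniform(min_frac, max_frac)))
    top, left = rng.randint(0, h - mh), rng.randint(0, w - mw)
    masked = []
    for fmap in batch:
        out = [row[:] for row in fmap]      # leave the input untouched
        for i in range(top, top + mh):
            for j in range(left, left + mw):
                out[i][j] = 0.0             # occluded region
        masked.append(out)
    return masked, (top, left, mh, mw)
```

Because the occluded region carries no signal, the branch's label loss can only be reduced using the remaining, non-occluded local features.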

The attention learning module includes a channel attention module and a spatial attention module. The channel attention module consists of one average pooling layer, one multi-layer perceptron layer, one linear layer and one batch normalization layer, and is used to increase the weights of effective channels and decrease the weights of ineffective channels;
the spatial attention module includes two 1×1 convolution layers and two 3×3 convolution layers, the 1×1 convolution layers being used to reduce the dimensions of the convolutional feature map and the 3×3 convolution layers being used to extract features effectively. The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 3.
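The channel attention module's layer sequence (average pooling, MLP, linear layer, batch normalization) can be sketched as below. The ReLU inside the MLP, the sigmoid gate that turns the normalized outputs into channel weights, and the BN without affine parameters are assumptions borrowed from common channel-attention designs. The spatial attention module is not shown.

```python
import math


def linear(x, weight, bias):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def channel_attention(feature_map, w1, b1, w2, b2, bn_mean, bn_var, eps=1e-5):
    """Re-weight the C channels of a C x H x W feature map.

    average pooling -> MLP hidden layer (ReLU, assumed) -> linear layer
    -> BatchNorm (inference, no affine) -> sigmoid gate (assumed) -> rescale.
    Returns the re-weighted map and the per-channel weights.
    """
    pooled = [sum(sum(r) for r in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    hidden = [max(0.0, v) for v in linear(pooled, w1, b1)]   # MLP with ReLU
    logits = linear(hidden, w2, b2)                          # linear layer
    normed = [(v - m) / math.sqrt(s + eps) for v, m, s in zip(logits, bn_mean, bn_var)]
    weights = [1.0 / (1.0 + math.exp(-v)) for v in normed]   # values in (0, 1)
    out = [[[p * a for p in row] for row in ch] for ch, a in zip(feature_map, weights)]
    return out, weights
```

Channels whose weight approaches 1 are kept ("effective"), while channels whose weight approaches 0 are suppressed ("ineffective").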

Specifically, the attention learning module
The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 5.

In the multi-scale expression learning branch, the two sets of label-loss training are small-scale feature training and large-scale feature training, respectively. The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 3.

The label loss training uses a cross-entropy loss function.

The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 7.

The ranking list loss training uses a ranked list triplet loss function.
The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 8.

Specifically, the fusion loss function of the feature extraction module is
The person re-identification method fusing a random batch mask and multi-scale expression learning according to claim 9.
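The fusion loss formula is not reproduced in this text. Going by the branch assignments in the claims (label and ranking list losses on the feature processing branch, small- and large-scale label losses on the multi-scale branch, and a label loss on the random batch mask branch), a hypothetical fusion can be sketched as a weighted sum of those five terms. The equal default weights are an assumption.

```python
def feature_module_loss(ce_global, rll_global, ce_small, ce_large, ce_mask,
                        weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Hypothetical fusion loss: weighted sum of the five branch losses.

    ce_global, rll_global : feature processing branch (label + ranking list loss)
    ce_small, ce_large    : multi-scale branch (small- and large-scale label losses)
    ce_mask               : random batch mask branch (label loss)
    Equal weights are an ASSUMPTION, not the patent's formula.
    """
    terms = (ce_global, rll_global, ce_small, ce_large, ce_mask)
    return sum(w * t for w, t in zip(weights, terms))
```

Setting a weight to zero drops that branch's contribution, which is a convenient way to ablate individual branches during training.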