CN115205890A - Method and system for re-identifying pedestrians of non-motor vehicles - Google Patents


Info

Publication number
CN115205890A
Authority
CN
China
Prior art keywords
pedestrian
local
human body
image
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210524755.6A
Other languages
Chinese (zh)
Inventor
Ju Rong (鞠蓉)
Current Assignee
Nanjing Boya Jizhi Intelligent Technology Co ltd
Original Assignee
Nanjing Boya Jizhi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Boya Jizhi Intelligent Technology Co ltd filed Critical Nanjing Boya Jizhi Intelligent Technology Co ltd
Priority to CN202210524755.6A
Publication of CN115205890A
Legal status: Pending

Classifications

    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
    • G06V 10/762 Image or video recognition using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
    • G06V 10/806 Fusion, i.e. combining data from various sources, of extracted features
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 2201/07 Indexing scheme relating to image or video recognition; target detection


Abstract

The invention discloses a method and a system for re-identifying non-motor vehicle pedestrians, comprising the following steps: constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos captured by different cameras in the same scene; performing human body detection on the data set to obtain local human body images, and preprocessing the global image features and the local human body images; training a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global image features and local human body images; and extracting features from the target image to be recognized with the trained model. To make full use of both the global image and the local human body image when obtaining global and local features, the invention trains the non-motor vehicle pedestrian re-identification model on the preprocessed global pedestrian images and local human body images and employs a feature fusion module that adaptively assigns weights to the global and local features, thereby addressing the non-motor vehicle pedestrian re-identification problem.

Description

Method and system for re-identifying pedestrians of non-motor vehicles
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a non-motor vehicle pedestrian re-identification method and system.
Background
Pedestrian re-identification means matching the same person across different cameras, and it is a fundamental task in surveillance scenarios. Most existing pedestrian re-identification data sets focus on pedestrians walking within the camera view, but in real surveillance scenes pedestrians do not only walk: there is also a large amount of riding behaviour, such as riding bicycles, electric vehicles, motorcycles and other non-motor vehicles. In criminal investigation, a large number of non-motor vehicle theft incidents occur, so large-scale identification of non-motor vehicle pedestrians is vitally important.
To adapt to real scenes, non-motor vehicles must additionally be considered in pedestrian re-identification. Analysis of non-motor vehicle pedestrians shows that they generally have a clear structure and can be divided into two parts: the human body and the non-motor vehicle. Non-motor vehicle pedestrian re-identification differs significantly from traditional pedestrian and vehicle re-identification. When different vehicles share the same model and colour they look very similar, producing high inter-class similarity; likewise, when pedestrians wear similar clothing (e.g. the same uniform or jersey), pedestrian re-identification also becomes difficult. However, when a non-motor vehicle is combined with a pedestrian, the vehicle information (such as model and colour) joins the pedestrian information (such as clothes and carried objects), so the inter-class similarity is greatly reduced.
The above considers non-motor vehicle pedestrian re-identification under unchanged conditions; if the motion state of the pedestrian changes, the pedestrian information and the non-motor vehicle information need to be considered separately.
Based on the above problems, a method and a system for non-motor vehicle pedestrian re-identification that consider pedestrian and non-motor vehicle information simultaneously are needed.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made keeping in mind the above problems occurring in the prior art.
Therefore, the invention aims to provide a non-motor vehicle pedestrian re-identification method that better solves the non-motor vehicle pedestrian re-identification problem.
In order to solve the technical problems, the invention provides the following technical scheme: a non-motor vehicle pedestrian re-identification method comprising the following steps,
Step one: constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos captured by different cameras in the same scene;
Step two: performing human body detection on the non-motor vehicle pedestrian re-identification data set with a pre-trained human body detector to obtain local human body images, and preprocessing the global image features and the local human body images;
Step three: training a preset non-motor vehicle pedestrian re-identification network model with the global image features and local human body images preprocessed in step two;
Step four: extracting features from the target image to be recognized with the trained non-motor vehicle pedestrian re-identification network model.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) First, select surveillance videos of the same scene under different cameras, and use the online joint detection and tracking model TraDeS to detect and track pedestrians and pedestrians riding non-motor vehicles in the videos, obtaining the target box information and trajectory information in each video;
(2) Then extract features of all detected targets with a pre-trained resnet50 deep learning network model;
(3) Cluster all targets with an unsupervised method so that the same target is associated across cameras;
(4) Finally, perform manual calibration to construct the final non-motor vehicle pedestrian re-identification data set.
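The four construction steps above can be sketched as follows. This is a minimal numpy illustration of the cross-camera association idea only: greedy cosine-similarity grouping stands in for the Infomap clustering the patent uses, and the threshold and feature values are invented for illustration.

```python
import numpy as np

def associate_cross_camera(features, threshold=0.6):
    """Greedy stand-in for the unsupervised clustering step: group target
    features whose cosine similarity exceeds `threshold` under one identity."""
    # L2-normalise so the dot product equals cosine similarity
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels = -np.ones(len(feats), dtype=int)
    next_id = 0
    for i, f in enumerate(feats):
        if labels[i] != -1:
            continue
        labels[i] = next_id
        # assign every later unlabelled target similar enough to this one
        for j in range(i + 1, len(feats)):
            if labels[j] == -1 and float(f @ feats[j]) > threshold:
                labels[j] = next_id
        next_id += 1
    return labels

# two cameras see the same rider (similar features) plus a distinct pedestrian
cam_feats = np.array([[1.0, 0.0, 0.1],
                      [0.98, 0.05, 0.12],   # same rider, other camera
                      [0.0, 1.0, 0.0]])     # different identity
labels = associate_cross_camera(cam_feats)
print(labels)                               # [0 0 1]
```

In the patent's pipeline the features would come from the pre-trained resnet50 and the resulting groups would still be manually calibrated in step (4).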
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the non-motor vehicle pedestrian re-identification network model consists of three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in step four, the target image to be queried is used as the input of the non-motor vehicle pedestrian re-identification network model; the global image feature and the local human body feature of the target are learned separately, weights are adaptively assigned to them, and the two are fused as the final feature descriptor of the target. The same steps are performed on all images in the candidate gallery to obtain their feature descriptors; the cosine distances between the query image's features and those of all gallery images are computed, the distances are sorted, and the gallery target with the highest similarity to the query image is selected as the final identification result.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the human body detector in the second step is a yolov5s detector, and the pretreatment comprises the following specific operations: adjusting all original images for training and testing and the size of the detected local human body image to 384 multiplied by 128; then, through random horizontal turning, random erasing, random cutting and normalization of image pixel values, a plurality of shielding and rotating samples are added randomly to enhance the training data.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the global image feature extraction module selects MGN as a basic skeleton, and the input global image passes through the MGN to obtain global image features; the local attention module comprises a channel attention mechanism and a space attention mechanism, and the input local human body image obtains local characteristics after passing through the local attention module; the feature fusion module has the core idea that different weights are given to the global image features and the local features according to whether the input picture is ridden, wherein if the judgment target is a rider, higher weight is given to the local features, and otherwise, higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch depth network, and global image features and multi-granularity local features are combined at the same time; one branch is used for extracting global image features and is responsible for extracting common features; then dividing the image into N strips, wherein different N represents different granularity and is responsible for extracting features of different levels or different levels, and the finer the granularity is, the larger the N is, the more the granularity is, the two branches are respectively responsible for extracting multi-granularity local features; the local attention module adopts a hierarchical attention network HAN;
the feature fusion module is an adaptive attention module, determines the weight of the global and local features by distinguishing input types, and determines whether to give more weight to the local features by judging whether the bicycle is a rider.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the specific training mode in the third step process is as follows: first global image features
Figure RE-808356DEST_PATH_IMAGE001
Input into a simple binary network to obtain a Bx 2 characteristic
Figure RE-86891DEST_PATH_IMAGE002
According to
Figure RE-300222DEST_PATH_IMAGE002
Giving weight to the global image feature and the local human body feature, if the determination target is a riding human, the local human body part thereofWe should get higher attention and finally we fuse the global image features and local human features:
Figure RE-121547DEST_PATH_IMAGE003
+
Figure RE-657571DEST_PATH_IMAGE004
wherein,
Figure RE-423402DEST_PATH_IMAGE005
and
Figure RE-906336DEST_PATH_IMAGE006
respectively representing global image features and local features
Figure RE-175643DEST_PATH_IMAGE007
Finally, the method is used for the pedestrian re-identification of the non-motor vehicles;
in order to make the network have better identification capability, a cross entropy loss function for classification and a ternary loss function for metric learning are used as the loss functions of the training process at the same time:
Figure RE-351409DEST_PATH_IMAGE008
wherein,
Figure RE-604536DEST_PATH_IMAGE009
which represents the cross-entropy loss of the entropy,
Figure RE-425249DEST_PATH_IMAGE009
a loss of a triplet is represented as,
Figure RE-549063DEST_PATH_IMAGE010
and
Figure RE-426889DEST_PATH_IMAGE011
representing the weight, cross entropy loss, of both loss functions separatelyThe loss function is expressed as:
Figure RE-308257DEST_PATH_IMAGE012
wherein
Figure RE-929732DEST_PATH_IMAGE013
Indicates the number of pictures for the minimum batch process,
Figure RE-580156DEST_PATH_IMAGE014
representation feature
Figure RE-363304DEST_PATH_IMAGE015
C represents the number of categories;
the triple loss respectively represents an anchor sample, a negative sample and a positive sample, wherein the anchor sample is a sample randomly selected from a training data set, the positive sample and the anchor sample belong to the same class, the negative sample and the anchor sample belong to different classes, the purpose of triple loss function learning is to enable the intra-class difference of the same class to be minimum and the inter-class difference of the different classes to be maximum, and the loss function can be represented as:
Figure RE-731968DEST_PATH_IMAGE016
wherein,
Figure RE-891554DEST_PATH_IMAGE010
the edge-over-parameter is represented,
Figure RE-662064DEST_PATH_IMAGE017
Figure RE-616114DEST_PATH_IMAGE018
and
Figure RE-472074DEST_PATH_IMAGE019
respectively representing anchor sample features, positive sample features and negative sample features.
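The two loss terms can be written out numerically as a sketch; the probabilities, feature vectors and the λ and margin values below are invented for illustration, and the original formulas were lost as images, so these follow the standard definitions of the two losses.

```python
import numpy as np

def cross_entropy(p, y):
    """L_ce = -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    return -np.mean(np.sum(y * np.log(p), axis=1))

def triplet(fa, fp, fn, margin=0.3):
    """L_tri = max(d(a,p) - d(a,n) + margin, 0) with Euclidean distance."""
    d_ap = np.linalg.norm(fa - fp)
    d_an = np.linalg.norm(fa - fn)
    return max(d_ap - d_an + margin, 0.0)

# tiny batch: 2 samples, 3 identity classes
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted probabilities
y = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot labels
fa, fp_, fn_ = np.zeros(4), np.full(4, 0.1), np.ones(4)
# combined loss with equal weights lambda_1 = lambda_2 = 1
total = 1.0 * cross_entropy(p, y) + 1.0 * triplet(fa, fp_, fn_)
print(round(total, 4))                              # 0.2899 (triplet term is 0 here)
```

Here the anchor already lies much closer to the positive than to the negative, so the triplet term vanishes and only the cross-entropy contributes.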
The invention further provides a non-motor vehicle pedestrian re-identification system comprising a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos of different cameras in the same scene; the data processing unit is used for performing human body detection on the data set with a pre-trained human body detector and preprocessing the global images and local human body images; the model training unit is used for training a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global pedestrian images and local human body images; and the model application unit is used for extracting features from the image to be recognized with the trained non-motor vehicle pedestrian re-identification network model.
The invention has the beneficial effects that: the invention provides a method and a system for non-motor vehicle pedestrian re-identification that account for its differences from traditional pedestrian re-identification. First, a non-motor vehicle pedestrian re-identification data set is constructed from surveillance videos of different cameras in the same scene. Then, considering the independence between the non-motor vehicle and the pedestrian, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of the global image and the local human body image when obtaining global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor vehicle pedestrian re-identification model, and a feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the non-motor vehicle pedestrian re-identification problem while also achieving higher performance on traditional pedestrian re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor. Wherein:
fig. 1 is a schematic flow chart of a pedestrian re-identification method for a non-motor vehicle according to the present invention.
Fig. 2 is a system block diagram of a pedestrian re-identification system of a non-motor vehicle according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying figures of the present invention are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Example 1
Referring to fig. 1, for a first embodiment of the present invention, there is provided a non-motor pedestrian re-identification method, including the steps of:
the method comprises the following steps: the method comprises the following steps of constructing a non-motor vehicle pedestrian re-identification data set according to monitoring videos of different cameras in the same scene, wherein the specific mode is as follows:
(1) Firstly, selecting monitoring videos of the same scene under different cameras, and detecting and tracking targets of pedestrians and pedestrians riding non-motor vehicles in the monitoring videos by adopting an online detection and tracking model TraDes to obtain target frame information and track information in each monitoring video;
(2) Then extracting all detected target characteristics by adopting a pre-trained resnet50 deep learning network model;
(3) Clustering all targets through an unsupervised method, namely, clustering all targets to associate the same target under a cross-camera;
(4) Finally, carrying out manual calibration to construct a final non-motor vehicle pedestrian re-identification data set;
Step two: according to a pre-trained human body detector (a yolov5s detector), perform human body detection on the non-motor vehicle pedestrian re-identification data set to obtain local human body images, and preprocess the global image features and local human body images as follows: resize all original images for training and testing, together with the detected local human body images, to 384 × 128; then randomly add some occluded and rotated samples to augment the training data via random horizontal flipping, random erasing, random cropping and normalization of image pixel values;
Step three: use the global image features and local human body images preprocessed in step two to train the preset non-motor vehicle pedestrian re-identification network model; to improve its recognition performance, the network model uses three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module;
Step four: use the trained non-motor vehicle pedestrian re-identification network model to extract features from the target image to be recognized; that is, the target image to be queried serves as the model input, the global image feature and local human body feature of the target are learned separately, weights are adaptively assigned to them, and the two are fused into the final feature descriptor of the target. The same steps are performed on all images in the candidate gallery to obtain their feature descriptors; the cosine distances between the query image's features and those of all gallery images are computed, the distances are sorted, and the gallery target with the highest similarity to the query image is selected as the final identification result.
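The retrieval step above reduces to cosine-distance ranking of descriptors, which can be sketched in numpy; the query and gallery vectors are invented for illustration.

```python
import numpy as np

def rank_gallery(query, gallery):
    """Cosine-distance ranking of gallery descriptors against a query descriptor."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    dist = 1.0 - g @ q                  # cosine distance per gallery image
    order = np.argsort(dist)            # ascending: most similar first
    return order, dist

query = np.array([1.0, 0.0, 0.5])
gallery = np.array([[0.0, 1.0, 0.0],    # unrelated identity
                    [0.9, 0.1, 0.45],   # near-duplicate of the query
                    [0.5, 0.5, 0.5]])   # partially similar identity
order, dist = rank_gallery(query, gallery)
print(order[0])                         # 1: index of the best match
```

The first entry of `order` is returned as the identification result; the full ranking is what re-identification benchmarks score with metrics such as rank-1 accuracy.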
To explain further, the TraDeS tracker focuses on using tracking information to assist detection and feeding the detection results back to tracking. TraDeS is built on the center-point-based object detector CenterNet and mainly consists of two modules: a cost volume based association (CVA) module and a motion-guided feature warping (MFW) module. The CVA extracts re-id embedding features point by point through the backbone network to construct a cost volume, which stores the similarity between matched embedding pairs of the two frames; a tracking offset, i.e. the spatio-temporal displacement of all points, is then derived from the cost volume, and these offsets together perform a simple two-round long-term data association. Next, the MFW propagates features from the previous frame to the current frame, guided by the tracking offsets as motion cues; finally, the propagated features are combined with the current frame's features for detection and segmentation. Here the detection and tracking model is applied directly, so it is not described further.
The unsupervised Infomap method was initially used for face clustering; construction of the nearest-neighbour graph is accelerated with faiss, achieving a good clustering effect while also improving clustering speed.
The YOLO series detectors are classical one-stage object detection architectures. Yolov3 is divided into four parts: the input end, the Backbone network, the Neck and the output end. The Backbone mainly extracts features from the input image for the following network to use; the Neck reprocesses and makes full use of the important features extracted by the Backbone. Yolov4 has the same overall structure as Yolov3 but introduces numerous integrated innovations into each sub-structure: the input end adopts Mosaic data augmentation, CmBN and SAT self-adversarial training; the Backbone adopts CSPDarknet53, the Mish activation function and DropBlock; the Neck adopts an SPP module and an FPN + PAN structure; the anchor-box mechanism of the output end is the same as Yolov3, but the training loss adopts CIOU_Loss and prediction-box filtering uses DIOU_nms.
The structure of Yolov5 is very similar to that of Yolov4, with some differences. Specifically, the input end adds adaptive anchor-box computation and adaptive image scaling alongside Mosaic data augmentation; the Backbone adopts a Focus structure and a CSP structure; the Neck, like Yolov4's, adopts the FPN + PAN structure, except that Yolov4's Neck uses ordinary convolutions while Yolov5's Neck adopts the CSP2 structure inspired by CSPNet, strengthening the network's feature fusion capability; the output end adopts GIOU_Loss as the bounding-box loss function. Yolov5 comprises four network models: Yolov5s, Yolov5m, Yolov5l and Yolov5x, among which Yolov5s has the smallest depth and the smallest feature-map width in the series and is therefore the fastest. Although it loses some accuracy compared with the other three structures, the task here is not very complex while the speed requirement is high, so the Yolov5s network is finally selected as the human body detector.
It should be particularly noted that the global image feature extraction module selects MGN (Multiple Granularity Network) as its backbone; the input global image passes through the MGN to obtain the global image feature. The local attention module comprises a channel attention mechanism and a spatial attention mechanism; the input local human body image passes through it to obtain the local feature. The core idea of the feature fusion module is to assign different weights to the global image feature and the local feature according to whether the input picture contains a rider: if the target is judged to be a riding person, the local feature receives a higher weight; otherwise the global image feature does. This helps the model better solve the non-motor vehicle pedestrian re-identification problem while also achieving higher performance on traditional pedestrian re-identification;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features. One branch extracts global image features and is responsible for common features; the other branches divide the image into N horizontal strips, where different N corresponds to different granularity (the larger N is, the finer the granularity) and is responsible for features of different levels. Two such branches are respectively responsible for extracting multi-granularity local features, and combining the global features with the multi-granularity local features yields rich information and detail to represent the input global pedestrian image. The backbone network of the MGN uses resnet50 and splits into three branches in the latter half of the network; the three branches are similar in structure but differ in downsampling rate. The global branch downsamples with a stride=2 convolution, applies global max pooling to the resulting feature map to produce a 2048-dimensional feature vector, and compresses it by 1×1 convolution into a 256-dimensional global feature vector f_g. The two local branches learn local feature representations; to preserve a receptive field suited to local features, neither branch uses downsampling. One local branch divides the feature map uniformly into 2 horizontal strips, which can be understood as dividing the pedestrian into an upper half and a lower half; the other local branch divides the feature map uniformly into 3 horizontal strips, i.e. dividing the pedestrian into upper, middle and lower parts. The two local branches operate similarly: each first compresses the feature map before segmentation into a 256-dimensional branch-level global feature vector, f_p2_g and f_p3_g respectively; after segmentation, each strip is globally pooled and then reduced in dimension, so that one local branch obtains two 256-dimensional local feature vectors f_p2_1 and f_p2_2, and the other local branch obtains three 256-dimensional local feature vectors f_p3_1, f_p3_2 and f_p3_3. Finally, these 8 256-dimensional features are concatenated into a 2048-dimensional feature serving as the global feature of the input global pedestrian image.
The local attention module employs a hierarchical attention network (HAN), originally used for text classification, whose two hierarchical attention mechanisms can focus different amounts of attention on content of different importance. The channels of a feature map carry different meanings and therefore contribute differently to the final recognition, and different spatial positions of the feature map likewise carry different semantics, so HAN is adopted to strengthen the expression of the local human body image along both the channel and spatial dimensions. Specifically, the local human body image passes through the front half of resnet50 to obtain a B×2048×24×8 feature map, where B denotes the batch size. This feature map is split along its third (height) dimension into three feature maps F_1, F_2 and F_3, each of dimension B×2048×8×8. Each feature map passes through a channel attention mechanism comprising a generalized-mean pooling layer, a fully connected layer for dimension reduction, a ReLU layer, a fully connected layer for dimension expansion and a sigmoid activation function, producing the processed feature maps F_1', F_2' and F_3'. Denoting the feature on channel c as F_c', spatial attention over the features is achieved by enhancing the peak response.
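The channel attention chain (generalized-mean pooling → FC reduction → ReLU → FC expansion → sigmoid) can be sketched in numpy as follows; the reduction ratio and the weight matrices are assumptions, not values stated in the patent:

```python
import numpy as np

def gem_pool(fmap, p=3.0):
    """Generalized-mean pooling over the spatial dims of a C×H×W map."""
    return (np.clip(fmap, 1e-6, None) ** p).mean(axis=(1, 2)) ** (1.0 / p)

def channel_attention(fmap, w_down, w_up, p=3.0):
    """GeM pool -> FC (reduce) -> ReLU -> FC (expand) -> sigmoid,
    then rescale every channel of the input map by its gate."""
    s = gem_pool(fmap, p)                        # per-channel statistic, shape (C,)
    s = np.maximum(w_down @ s, 0.0)              # ReLU after dimension reduction
    s = 1.0 / (1.0 + np.exp(-(w_up @ s)))        # sigmoid gate in (0, 1) per channel
    return fmap * s[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((2048, 8, 8)) + 2.0   # one of the three height slices
w_down = rng.standard_normal((128, 2048)) * 0.01 # reduction FC (2048 -> 128), assumed ratio
w_up   = rng.standard_normal((2048, 128)) * 0.01 # expansion FC (128 -> 2048)

out = channel_attention(fmap, w_down, w_up)
assert out.shape == fmap.shape
```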
The feature fusion module is an adaptive attention module: it determines the weights of the global and local features by distinguishing the input type, i.e. it decides whether to give more weight to the local features by judging whether the target is riding a non-motor vehicle.
In the third step, the specific training mode is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 feature w; according to w, weights are assigned to the global image feature and the local human body feature, so that if the target is a rider, the local human body part obtains higher attention. Finally, the global image feature and the local human body feature are fused as:

f = w_g · f_g + w_l · f_l

wherein f_g and f_l respectively denote the global image feature and the local feature, and w_g and w_l denote their weights; the fused feature f is finally used for non-motor vehicle pedestrian re-identification;
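A minimal sketch of the adaptive fusion, assuming the B×2 output w is softmaxed into a (non-rider, rider) weight pair; the index-to-class mapping is an assumption:

```python
import numpy as np

def fuse(f_global, f_local, logits):
    """Softmax the rider/non-rider logits into two weights and
    mix the global and local features accordingly."""
    e = np.exp(logits - logits.max())            # numerically stable softmax
    w_nonrider, w_rider = e / e.sum()
    # a rider pushes weight toward the local (human-body) feature
    return w_nonrider * f_global + w_rider * f_local

rng = np.random.default_rng(0)
f_g, f_l = rng.standard_normal(2048), rng.standard_normal(2048)

rider  = fuse(f_g, f_l, np.array([-2.0, 2.0]))   # classified as riding
walker = fuse(f_g, f_l, np.array([2.0, -2.0]))   # classified as walking
assert rider.shape == walker.shape == (2048,)
```

The same fused vector serves as the descriptor both during training and at retrieval time.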
in order to give the network better recognition capability, a cross-entropy loss function (cross-entropy loss) for classification and a triplet loss function (triplet loss) for metric learning are used together as the loss function of the training process:

L = λ_1 · L_ce + λ_2 · L_tri

wherein L_ce denotes the cross-entropy loss, L_tri denotes the triplet loss, and λ_1 and λ_2 respectively denote the weights of the two loss functions. The cross-entropy loss function is expressed as:

L_ce = -(1/N) · Σ_{i=1}^{N} log p(y_i | f_i)

wherein N denotes the number of pictures in a mini-batch, p(y_i | f_i) denotes the predicted probability that feature f_i belongs to its ground-truth class y_i, and C denotes the number of classes;
the triplet loss involves an anchor sample (Anchor), a negative sample (Negative) and a positive sample (Positive): the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor sample, and the negative sample belongs to a different class. The goal of triplet loss learning is to minimize the intra-class difference within the same class while maximizing the inter-class difference between different classes. The loss function can be expressed as:

L_tri = Σ [ ||f_a − f_p||_2 − ||f_a − f_n||_2 + α ]_+

wherein α denotes the margin hyperparameter, and f_a, f_p and f_n respectively denote the anchor sample feature, the positive sample feature and the negative sample feature.
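The combined objective can be sketched numerically as follows; the class count, margin and loss weights are illustrative placeholders, not values from the patent:

```python
import numpy as np

def cross_entropy(logits, label):
    """-log softmax probability of the ground-truth class."""
    z = logits - logits.max()                    # stabilize the exponentials
    return -(z[label] - np.log(np.exp(z).sum()))

def triplet(f_a, f_p, f_n, margin=0.3):
    """Hinge on d(a,p) - d(a,n) + margin; zero once the negative
    is pushed margin farther from the anchor than the positive."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(d_ap - d_an + margin, 0.0)

rng = np.random.default_rng(0)
f_a, f_p, f_n = (rng.standard_normal(256) for _ in range(3))
logits, label = rng.standard_normal(751), 3      # identity count is illustrative

lam1, lam2 = 1.0, 1.0                            # loss weights λ1, λ2 (illustrative)
loss = lam1 * cross_entropy(logits, label) + lam2 * triplet(f_a, f_p, f_n)
assert loss >= 0.0
```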
Example 2
Referring to fig. 2, a second embodiment of the present invention differs from the first embodiment in that it provides a system for the non-motor vehicle pedestrian re-identification method of the first embodiment.
The system comprises: a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the system comprises a sample acquisition unit, a data processing unit, a model training unit, a model application unit and a non-motor vehicle pedestrian re-recognition network model, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
The method and system for non-motor vehicle pedestrian re-identification take into account the differences from conventional pedestrian re-identification. First, a non-motor vehicle pedestrian re-identification data set is constructed from the monitoring videos of different cameras in the same scene. Then, considering the independence between the non-motor vehicle and the pedestrian, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of the global and local human body images for obtaining global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor vehicle pedestrian re-identification model, and a feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the problem of non-motor vehicle pedestrian re-identification, and at the same time obtain higher performance on conventional pedestrian re-identification.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (8)

1. A non-motor vehicle pedestrian re-identification method, characterized by comprising the following steps:
the method comprises the following steps: constructing a non-motor vehicle pedestrian re-identification data set according to monitoring videos of different cameras in the same scene;
step two: according to a pre-trained human body detector, performing human body detection on the non-motor vehicle pedestrian re-identification data set to obtain local human body images, and preprocessing the global images and the local human body images;
step three: training a preset non-motor vehicle pedestrian re-identification network model by using the global pedestrian images and local human body images preprocessed in the second step;
step four: and performing feature extraction on the target image to be recognized by using the trained pedestrian re-recognition network model of the non-motor vehicle.
2. The non-motor vehicle pedestrian re-identification method according to claim 1, characterized in that: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) Firstly, selecting monitoring videos of the same scene under different cameras, and detecting and tracking targets of pedestrians and pedestrians riding non-motor vehicles in the monitoring videos by adopting an online detection and tracking model TraDes to obtain target frame information and track information in each monitoring video;
(2) Then extracting all detected target characteristics by adopting a pre-trained resnet50 deep learning network model;
(3) Clustering all targets by an unsupervised method, so as to associate the same target across cameras;
(4) And finally, carrying out manual calibration to construct a final non-motor vehicle pedestrian re-identification data set.
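The cross-camera association of step (3) can be sketched as a simple greedy clustering on cosine similarity; the patent does not name a specific clustering algorithm, so the threshold and procedure below are assumptions:

```python
import numpy as np

def greedy_cluster(feats, thresh=0.7):
    """Assign each L2-normalized feature to the first cluster whose
    representative similarity exceeds thresh, else open a new cluster."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    reps, labels = [], []                         # first member represents a cluster
    for f in feats:
        sims = [float(f @ r) for r in reps]
        if sims and max(sims) >= thresh:
            labels.append(int(np.argmax(sims)))
        else:
            labels.append(len(reps))
            reps.append(f)
    return labels

rng = np.random.default_rng(0)
base = rng.standard_normal((3, 256))                      # three true identities
tracks = np.vstack([b + 0.05 * rng.standard_normal(256)   # two sightings each
                    for b in base for _ in range(2)])
labels = greedy_cluster(tracks)
assert labels[0] == labels[1]                             # same identity merged
```

The clusters obtained this way are then manually calibrated, as step (4) describes.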
3. The non-motor vehicle pedestrian re-identification method according to claim 2, characterized in that: the non-motor vehicle pedestrian re-identification network model adopts three modules that cooperate with each other: a global image feature extraction module, a local attention module and a feature fusion module.
4. The non-motor vehicle pedestrian re-identification method according to claim 3, characterized in that: in the fourth step, the target image to be queried is used as the input of the non-motor vehicle pedestrian re-identification network model; the global image feature and the local human body feature of the target are learned respectively, weights are adaptively assigned to them, and they are fused as the final feature descriptor of the target; the same operation is performed on all pictures in the candidate image library to obtain their feature descriptors, the cosine distances between the features of the query picture and all pictures in the candidate image library are calculated and sorted, and the target with the highest similarity to the query picture is selected from the candidate image library as the final identification result.
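The cosine-distance ranking of the retrieval step can be sketched as follows, with a planted near-duplicate standing in for a true cross-camera match (shapes and gallery size are illustrative):

```python
import numpy as np

def rank(query, gallery):
    """Cosine distance between the query descriptor and every gallery
    descriptor; lower distance means higher similarity."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    dists = 1.0 - g @ q                     # cosine distance per gallery entry
    return np.argsort(dists), dists         # ascending = most similar first

rng = np.random.default_rng(0)
query = rng.standard_normal(2048)
gallery = rng.standard_normal((10, 2048))
gallery[7] = query + 0.01 * rng.standard_normal(2048)   # plant a near-duplicate

order, dists = rank(query, gallery)
assert order[0] == 7                                    # best match ranked first
```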
5. The non-motor vehicle pedestrian re-identification method according to any one of claims 1 to 4, characterized in that: the human body detector in the second step is a yolov5s detector, and the specific preprocessing operations are as follows: all original images for training and testing and the detected local human body images are resized to 384×128; the training data are then enhanced by random horizontal flipping, random erasing, random cropping and normalization of image pixel values, and by randomly adding a number of occluded and rotated samples.
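A minimal numpy sketch of this preprocessing pipeline (the resize itself is omitted and the sketch starts from a 384×128 image; the erased-region statistics and the ImageNet normalization constants are assumptions):

```python
import numpy as np

def random_erase(img, rng, scale=(0.02, 0.2)):
    """Overwrite a random rectangle of an H×W×3 image with noise
    (a simple occlusion augmentation)."""
    h, w, _ = img.shape
    area = rng.uniform(*scale) * h * w
    eh = int(min(h, max(1, np.sqrt(area))))
    ew = int(min(w, max(1, area / eh)))
    y, x = rng.integers(0, h - eh + 1), rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.uniform(0, 255, size=(eh, ew, 3))
    return out

def normalize(img, mean, std):
    """Scale pixel values to [0, 1] and standardize per channel."""
    return (img / 255.0 - mean) / std

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(384, 128, 3)).astype(np.float64)  # 384×128 input
flipped = img[:, ::-1]                                  # horizontal flip
mean = np.array([0.485, 0.456, 0.406])                  # ImageNet stats (assumed)
std  = np.array([0.229, 0.224, 0.225])
aug = normalize(random_erase(flipped, rng), mean, std)
assert aug.shape == (384, 128, 3)
```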
6. The non-motor vehicle pedestrian re-identification method according to claim 3, characterized in that: the global image feature extraction module selects MGN as a basic skeleton, and the input global image passes through the MGN to obtain global image features; the local attention module comprises a channel attention mechanism and a spatial attention mechanism, and the input local human body image passes through the local attention module to obtain local features; the core idea of the feature fusion module is to assign different weights to the global image features and the local features according to whether the person in the input picture is riding: if the target is judged to be a rider, a higher weight is given to the local features, otherwise a higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features; one branch is used for extracting global image features and is responsible for common features; the image is then divided into N strips, where different N represents different granularity (the larger N is, the finer the granularity) and is responsible for features of different levels, and two branches are respectively responsible for extracting multi-granularity local features; the local attention module adopts a hierarchical attention network HAN;
the feature fusion module is an adaptive attention module, which determines the weights of the global and local features by distinguishing the input type, i.e. decides whether to give more weight to the local features by judging whether the target is riding a non-motor vehicle.
7. The non-motor vehicle pedestrian re-identification method according to claim 6, characterized in that: the specific training mode in the third step is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 feature w; according to w, weights are assigned to the global image feature and the local human body feature, so that if the target is judged to be a rider, the local human body part obtains higher attention; finally, the global image feature and the local human body feature are fused as:

f = w_g · f_g + w_l · f_l

wherein f_g and f_l respectively denote the global image feature and the local feature, and w_g and w_l denote their weights; the fused feature f is finally used for non-motor vehicle pedestrian re-identification;
in order to give the network better recognition capability, a cross-entropy loss function for classification and a triplet loss function for metric learning are used together as the loss function of the training process:

L = λ_1 · L_ce + λ_2 · L_tri

wherein L_ce denotes the cross-entropy loss, L_tri denotes the triplet loss, and λ_1 and λ_2 respectively denote the weights of the two loss functions; the cross-entropy loss function is expressed as:

L_ce = -(1/N) · Σ_{i=1}^{N} log p(y_i | f_i)

wherein N denotes the number of pictures in a mini-batch, p(y_i | f_i) denotes the predicted probability that feature f_i belongs to its ground-truth class y_i, and C denotes the number of classes;
the triplet loss involves an anchor sample, a negative sample and a positive sample: the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor sample, and the negative sample belongs to a different class; the purpose of triplet loss learning is to minimize the intra-class difference within the same class and maximize the inter-class difference between different classes, and the loss function can be expressed as:

L_tri = Σ [ ||f_a − f_p||_2 − ||f_a − f_n||_2 + α ]_+

wherein α denotes the margin hyperparameter, and f_a, f_p and f_n respectively denote the anchor sample feature, the positive sample feature and the negative sample feature.
8. A system for the non-motor vehicle pedestrian re-identification method according to any one of claims 1 to 7, characterized in that the system comprises: a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the non-motor vehicle pedestrian re-recognition system comprises a sample acquisition unit, a data processing unit, a model training unit and a model application unit, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
CN202210524755.6A 2022-05-13 2022-05-13 Method and system for re-identifying pedestrians of non-motor vehicles Pending CN115205890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524755.6A CN115205890A (en) 2022-05-13 2022-05-13 Method and system for re-identifying pedestrians of non-motor vehicles


Publications (1)

Publication Number Publication Date
CN115205890A true CN115205890A (en) 2022-10-18

Family

ID=83575258


Country Status (1)

Country Link
CN (1) CN115205890A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561372A (en) * 2023-07-03 2023-08-08 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium
CN116561372B (en) * 2023-07-03 2023-09-29 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium
CN116797781A (en) * 2023-07-12 2023-09-22 北京斯年智驾科技有限公司 Target detection method and device, electronic equipment and storage medium
CN116797781B (en) * 2023-07-12 2024-08-20 北京斯年智驾科技有限公司 Target detection method and device, electronic equipment and storage medium
CN117152851A (en) * 2023-10-09 2023-12-01 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN117152851B (en) * 2023-10-09 2024-03-08 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN118522039A (en) * 2024-07-23 2024-08-20 南京信息工程大学 Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination