CN115205890A - Method and system for re-identifying pedestrians of non-motor vehicles - Google Patents
- Publication number
- CN115205890A CN115205890A CN202210524755.6A CN202210524755A CN115205890A CN 115205890 A CN115205890 A CN 115205890A CN 202210524755 A CN202210524755 A CN 202210524755A CN 115205890 A CN115205890 A CN 115205890A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- local
- human body
- image
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a method and system for re-identifying pedestrians of non-motor vehicles, comprising the following steps: constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos of different cameras in the same scene; performing human body detection on the data set to obtain local human body images, and preprocessing the global image features and the local human body images; training a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global image features and local human body images; and performing feature extraction on the target image to be recognized with the trained non-motor vehicle pedestrian re-identification network model. To make full use of the global image and the local human body image for obtaining global and local features, the invention trains the non-motor vehicle pedestrian re-identification model with the preprocessed global pedestrian images and local human body images, and a feature fusion module adaptively assigns weights to the global and local features, thereby solving the problem of non-motor vehicle pedestrian re-identification.
Description
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a non-motor vehicle pedestrian re-identification method and system.
Background
Pedestrian re-identification refers to matching the same person across different cameras, and is mainly applied in surveillance scenarios. Most existing pedestrian re-identification data sets focus on pedestrians walking within the camera view, but in real surveillance scenes pedestrians do not only walk; a large amount of riding behavior also occurs, such as riding bicycles, electric bikes, motorcycles and other such vehicles. In criminal investigation, a large number of non-motor vehicle theft incidents occur, so re-identifying non-motor vehicle pedestrians at scale is of vital importance.
To adapt to real scenes, non-motor vehicles must additionally be considered in pedestrian re-identification. Analysis of non-motor vehicle pedestrians shows that they generally have a clear structure and can be divided into two parts: the human body and the non-motor vehicle. Non-motor vehicle pedestrian re-identification differs significantly from traditional pedestrian and vehicle re-identification. When different vehicles share the same model and color they look very similar, leading to high inter-class similarity. Likewise, when pedestrians wear similar clothing (e.g., the same uniform or jersey), pedestrian re-identification also becomes difficult. However, when a non-motor vehicle is combined with a pedestrian, the information of the non-motor vehicle (such as model and color) complements the information of the pedestrian (such as clothing and carried objects), so the similarity between different identities is greatly reduced.
The above considers non-motor vehicle pedestrian re-identification under unchanged conditions; if the motion state of the pedestrian changes, the pedestrian information and the non-motor vehicle information need to be considered separately.
Based on the above problems, a method and system for non-motor vehicle pedestrian re-identification that simultaneously considers pedestrian and non-motor vehicle information is needed.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made keeping in mind the above problems occurring in the prior art.
Therefore, the invention aims to provide a non-motor vehicle pedestrian re-identification method, which aims to better solve the problem of non-motor vehicle pedestrian re-identification.
In order to solve the technical problems, the invention provides the following technical scheme: a method for re-identifying pedestrians of non-motor vehicles comprises the following steps,
step one: constructing a non-motor vehicle pedestrian re-identification data set according to surveillance videos under different cameras in the same scene;
step two: carrying out human body detection on the non-motor vehicle pedestrian re-identification data set according to a pre-trained human body detector to obtain a local human body image, and preprocessing global image features and the local human body image;
step three: training a preset non-motor vehicle pedestrian re-identification network model by using the global image features and the local human body images preprocessed in step two;
step four: performing feature extraction on the target image to be recognized by using the trained non-motor vehicle pedestrian re-identification network model.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) First, selecting surveillance videos of the same scene under different cameras, and using the online detection-and-tracking model TraDeS to detect and track pedestrians and pedestrians riding non-motor vehicles in the surveillance videos, obtaining the target-box information and trajectory information in each surveillance video;
(2) Then extracting the features of all detected targets with a pre-trained ResNet-50 deep learning network model;
(3) Clustering all targets by an unsupervised method, i.e., associating the same target across cameras through clustering;
(4) Finally, performing manual calibration to construct the final non-motor vehicle pedestrian re-identification data set.
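The cross-camera association of step (3) can be sketched as follows. A simple greedy threshold clustering over appearance features stands in for the unsupervised method used in practice; the feature values, threshold and helper names are illustrative assumptions only.

```python
# Greedy appearance clustering: each detection joins the first cluster whose
# representative feature is similar enough, otherwise it starts a new cluster.
# This is a toy stand-in for the unsupervised cross-camera association step.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def associate_targets(features, threshold=0.8):
    """Return a cluster label per detection; same label = same identity."""
    clusters = []  # list of (representative_feature, member_indices)
    labels = []
    for i, f in enumerate(features):
        for cid, (rep, members) in enumerate(clusters):
            if cosine_similarity(f, rep) >= threshold:
                members.append(i)
                labels.append(cid)
                break
        else:
            clusters.append((f, [i]))
            labels.append(len(clusters) - 1)
    return labels

# Two detections of the same rider under different cameras, plus another target
feats = [[1.0, 0.0, 0.1], [0.9, 0.05, 0.12], [0.0, 1.0, 0.0]]
labels = associate_targets(feats)  # first two detections share a label
```

A real implementation would replace the greedy loop with the Infomap clustering mentioned later in the description, but the input/output contract is the same: per-detection features in, per-identity labels out.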
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the non-motor vehicle pedestrian re-identification network model consists of three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in step four, the target image to be queried is used as the input of the non-motor vehicle pedestrian re-identification network model; the global image features and the local human body features of the target are learned respectively, weights are adaptively assigned to them, and the two are fused as the final feature descriptor of the target. The same operation steps are performed on all pictures in the candidate image library to obtain their feature descriptors; the cosine distances between the features of the query picture and those of all pictures in the candidate image library are calculated and sorted, and the target with the highest similarity to the query picture is selected from the candidate image library as the final identification result.
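The retrieval step above, ranking a candidate gallery by cosine distance to the query descriptor, can be sketched as follows; the descriptor values are toy assumptions.

```python
# Rank gallery descriptors by cosine distance to the query descriptor;
# the smallest distance (highest similarity) gives the identification result.

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar."""
    dists = [(cosine_distance(query, g), i) for i, g in enumerate(gallery)]
    dists.sort()
    return [i for _, i in dists]

query = [0.2, 0.9, 0.1]
gallery = [
    [0.9, 0.1, 0.0],    # different identity
    [0.25, 0.85, 0.12], # same identity, other camera
    [0.0, 0.2, 0.98],   # different identity
]
ranking = rank_gallery(query, gallery)  # index 1 ranks first
```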
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the human body detector in step two is a YOLOv5s detector, and the preprocessing comprises the following specific operations: adjusting the size of all original images for training and testing, and of the detected local human body images, to 384 × 128; then enhancing the training data through random horizontal flipping, random erasing, random cropping and normalization of image pixel values, and by randomly adding a number of occluded and rotated samples.
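Two of the augmentations above can be illustrated on a toy image stored as a nested list (H × W pixel values); the 384 × 128 resize and real pixel data are outside this sketch, and the seed and region size are assumptions for reproducibility.

```python
# Horizontal flip and random erasing, the latter simulating the occlusion
# samples added during preprocessing. Pure-Python toy versions.
import random

def horizontal_flip(img):
    """Mirror each row left-to-right."""
    return [list(reversed(row)) for row in img]

def random_erase(img, h, w, seed=0):
    """Zero out a random h x w region, leaving the input untouched."""
    rng = random.Random(seed)
    H, W = len(img), len(img[0])
    top = rng.randrange(H - h + 1)
    left = rng.randrange(W - w + 1)
    out = [row[:] for row in img]
    for r in range(top, top + h):
        for c in range(left, left + w):
            out[r][c] = 0
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flipped = horizontal_flip(img)
erased = random_erase(img, 2, 2)  # exactly one 2x2 block becomes zeros
```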
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the global image feature extraction module selects MGN as its basic backbone, and the input global image passes through the MGN to obtain the global image features; the local attention module comprises a channel attention mechanism and a spatial attention mechanism, and the input local human body image yields local features after passing through the local attention module; the core idea of the feature fusion module is to assign different weights to the global image features and the local features according to whether the person in the input picture is riding: if the target is judged to be a rider, a higher weight is given to the local features, otherwise a higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features and multi-granularity local features at the same time: one branch is used to extract global image features and is responsible for common features; the image is then divided into N strips, where different N represent different granularities responsible for features at different levels (the finer the granularity, the larger N), and two further branches are respectively responsible for extracting the multi-granularity local features; the local attention module adopts a hierarchical attention network (HAN);
the feature fusion module is an adaptive attention module that determines the weights of the global and local features by distinguishing the input type, deciding whether to give more weight to the local features by judging whether the target is a rider.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the specific training mode in step three is as follows: the global image feature f_g is first input into a simple binary classification network to obtain a B × 2 output w (where B is the batch size); according to w, weights are assigned to the global image feature and the local human body feature, such that if the target is judged to be a riding person, its local human body part receives higher attention. Finally the global image feature and the local human body feature are fused:

f = w_g · f_g + w_l · f_l

wherein f_g and f_l respectively represent the global image feature and the local feature, w_g and w_l their adaptively assigned weights, and f the fused feature finally used for non-motor vehicle pedestrian re-identification;
in order to give the network better discrimination capability, a cross-entropy loss function for classification and a triplet loss function for metric learning are used together as the loss function of the training process:

L = λ_1 · L_id + λ_2 · L_triplet

wherein L_id represents the cross-entropy loss, L_triplet represents the triplet loss, and λ_1 and λ_2 respectively represent the weights of the two loss functions. The cross-entropy loss function is expressed as:

L_id = -(1/N) Σ_{i=1}^{N} log( exp(W_{y_i}^T x_i) / Σ_{c=1}^{C} exp(W_c^T x_i) )

wherein N represents the number of pictures in a mini-batch, x_i represents the feature of the i-th picture, y_i its identity label, and C represents the number of categories;
the triplet loss involves an anchor sample, a positive sample and a negative sample: the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor sample, and the negative sample belongs to a different class. The purpose of triplet-loss learning is to minimize the intra-class difference within the same class and maximize the inter-class difference between different classes. The loss function can be expressed as:

L_triplet = max( d(f_a, f_p) - d(f_a, f_n) + α, 0 )

wherein α represents the margin hyper-parameter, d(·, ·) a distance measure, and f_a, f_p and f_n respectively represent the anchor sample feature, the positive sample feature and the negative sample feature.
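The training loss described above can be computed numerically as follows. The feature values, logits, margin and loss weights are illustrative assumptions rather than values from the patent.

```python
# Combined training loss: softmax cross-entropy for identity classification
# plus a margin-based triplet loss for metric learning.
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilised)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.3):
    """max(d(a, p) - d(a, n) + margin, 0)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

anchor, positive, negative = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
l_tri = triplet_loss(anchor, positive, negative)  # easy triplet -> 0
l_id = cross_entropy([2.0, 0.5], 0)               # correct class favoured
total = 1.0 * l_id + 1.0 * l_tri                  # lambda_1 = lambda_2 = 1
```

Swapping the positive and negative samples produces a hard triplet with a positive loss, which is what drives the metric-learning gradient.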
In another aspect, the invention provides a non-motor vehicle pedestrian re-identification system comprising a sample acquisition unit, a data processing unit, a model training unit and a model application unit, wherein the sample acquisition unit constructs a non-motor vehicle pedestrian re-identification data set from surveillance videos of different cameras in the same scene; the data processing unit performs human body detection on the data set with a pre-trained human body detector and preprocesses the global images and local human body images; the model training unit trains a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global pedestrian images and local human body images; and the model application unit performs feature extraction on the image to be recognized with the trained non-motor vehicle pedestrian re-identification network model.
The invention has the following beneficial effects. Considering its differences from traditional pedestrian re-identification, the invention first constructs a non-motor vehicle pedestrian re-identification data set from surveillance videos of different cameras in the same scene. Then, considering the independence between the non-motor vehicle and the pedestrian, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of the global image and the local human body image for obtaining global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor vehicle pedestrian re-identification model, and a feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the problem of non-motor vehicle pedestrian re-identification while also achieving higher performance on conventional pedestrian re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor. Wherein:
fig. 1 is a schematic flow chart of a pedestrian re-identification method for a non-motor vehicle according to the present invention.
Fig. 2 is a system block diagram of a pedestrian re-identification system of a non-motor vehicle according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying figures of the present invention are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides a non-motor vehicle pedestrian re-identification method, comprising the following steps:
step one: constructing a non-motor vehicle pedestrian re-identification data set according to surveillance videos of different cameras in the same scene, in the following specific manner:
(1) First, surveillance videos of the same scene under different cameras are selected, and the online detection-and-tracking model TraDeS is used to detect and track pedestrians and pedestrians riding non-motor vehicles, obtaining the target-box information and trajectory information in each surveillance video;
(2) Then the features of all detected targets are extracted with a pre-trained ResNet-50 deep learning network model;
(3) All targets are clustered by an unsupervised method, i.e., the same target is associated across cameras through clustering;
(4) Finally, manual calibration is performed to construct the final non-motor vehicle pedestrian re-identification data set;
step two: according to a pre-trained human body detector (a YOLOv5s detector), human body detection is performed on the non-motor vehicle pedestrian re-identification data set to obtain local human body images, and the global image features and local human body images are preprocessed as follows: all original images for training and testing, and the detected local human body images, are resized to 384 × 128; the training data are then enhanced through random horizontal flipping, random erasing, random cropping and normalization of image pixel values, and by randomly adding some occluded and rotated samples;
step three: the global image features and local human body images preprocessed in step two are used to train a preset non-motor vehicle pedestrian re-identification network model. To improve its recognition performance, the network model is built from three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module;
step four: feature extraction is performed on the target image to be recognized with the trained non-motor vehicle pedestrian re-identification network model. That is, the target image to be queried serves as the input of the network; the global image features and local human body features of the target are learned respectively, weights are adaptively assigned to them, and the two are fused to obtain the final feature descriptor of the target. The same operation steps are performed on all pictures in the candidate library to obtain their feature descriptors; the cosine distances between the features of the query picture and those of all pictures in the candidate library are calculated and sorted, and the target with the highest similarity to the query picture is selected from the candidate library as the final identification result.
To explain further, the TraDeS tracker focuses on using tracking information to assist detection and on feeding the detection results back into tracking. TraDeS is built on the point-based object detection network CenterNet and mainly consists of two modules: a cost-volume-based association module (CVA) and a motion-guided feature warping module (MFW). The CVA extracts re-id embedding features point by point through the backbone network to construct a cost volume, which stores the similarities of matched embedding pairs between two frames; a tracking offset, i.e. the spatio-temporal displacement of all points, is then derived from the cost volume, and these offsets together perform a simple two-round long-term data association. Next, the MFW propagates the tracking offsets as dynamic information from the previous frame to the current frame; finally, the propagated features are combined with the features of the current frame for detection and segmentation. The detection-and-tracking model is applied here directly, so it is not described further.
The unsupervised method Infomap was initially used for face clustering; construction of the adjacency edges is accelerated with faiss, obtaining a good clustering effect while improving clustering speed.
The YOLO series detectors are classical one-stage object detection structures. YOLOv3 is divided into four parts: the input end, the Backbone, the Neck and the output end. The Backbone mainly extracts features of the input image for use by the following network; the Neck reprocesses and makes reasonable use of the important features extracted by the Backbone. The overall structure of YOLOv4 is the same as that of YOLOv3, but many integrated innovations were made in each substructure: the input end adopts Mosaic data augmentation, CmBN and SAT self-adversarial training; the Backbone adopts CSPDarknet53, the Mish activation function and DropBlock; the Neck adopts an SPP module and an FPN + PAN structure; the anchor-box mechanism of the output end is the same as YOLOv3, but the training loss adopts CIoU_Loss and the prediction-box screening NMS adopts DIoU_NMS.
The structure of YOLOv5 is very similar to that of YOLOv4, with some differences. Specifically, the input end adopts adaptive anchor-box calculation and adaptive image scaling in addition to Mosaic data augmentation; the Backbone adopts a Focus structure and a CSP structure; the Neck, like that of YOLOv4, adopts the FPN + PAN structure, the difference being that YOLOv4's Neck uses ordinary convolution operations while YOLOv5's Neck uses the CSP2 structure designed with reference to CSPNet, which strengthens the network's feature fusion capability; the output end adopts GIoU_Loss as the loss function of the bounding box. YOLOv5 has four network models in total: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. The YOLOv5s network has the smallest depth and the smallest feature-map width in the YOLOv5 series, and is therefore the fastest. Although it loses some precision compared with the other three structures, the task here is not very complex while the speed requirement is high, so the YOLOv5s network is finally selected as the human body detector.
It should be particularly noted that the global image feature extraction module selects MGN (Multiple Granularity Network) as its basic backbone; the input global image passes through the MGN to obtain the global image features. The local attention module comprises a channel attention mechanism and a spatial attention mechanism; the input local human body image yields local features after passing through the local attention module. The core idea of the feature fusion module is to assign different weights to the global image features and the local features according to whether the person in the input picture is riding: if the target is judged to be a rider, a higher weight is given to the local features, otherwise a higher weight is given to the global image features. This helps the model better solve the problem of non-motor vehicle pedestrian re-identification while also achieving higher performance on conventional pedestrian re-identification;
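The adaptive weighting idea can be sketched as follows. A two-way "rider" score (standing in for the small binary classification network's output) is turned into weights that shift attention between the global and local features; the softmax form, logit values and names are assumptions of this sketch.

```python
# Adaptive feature fusion: rider-vs-walker logits set the weights of the
# global and local features, so riders get more local (human body) weight.
import math

def softmax(pair):
    m = max(pair)
    e = [math.exp(v - m) for v in pair]
    s = sum(e)
    return [v / s for v in e]

def fuse(global_feat, local_feat, rider_logits):
    """Weighted sum f = w_g * f_g + w_l * f_l with (w_g, w_l) = softmax(logits)."""
    w_g, w_l = softmax(rider_logits)
    return [w_g * g + w_l * l for g, l in zip(global_feat, local_feat)]

f_g, f_l = [1.0, 0.0], [0.0, 1.0]
rider = fuse(f_g, f_l, [-2.0, 2.0])   # judged a rider: local feature dominates
walker = fuse(f_g, f_l, [2.0, -2.0])  # judged walking: global feature dominates
```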
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features. One branch extracts global image features and is responsible for the common features; the image is then divided into N horizontal stripes, where different N corresponds to different granularity and captures features of different levels (the finer the granularity, the larger N), and two further branches are respectively responsible for extracting the multi-granularity local features. Combining the global and multi-granularity local features yields rich information and detail to represent the input global pedestrian image. The backbone network of the MGN uses resnet50, which splits into three branches of similar structure from the latter half of the network, differing in their downsampling rates. The global branch downsamples with stride-2 convolution, applies global max pooling to the resulting feature map to produce a 2048-dimensional feature vector, and compresses it into a 256-dimensional global feature vector with a 1×1 convolution. The two local branches learn local feature representations; to preserve a receptive field suited to local features, neither branch uses downsampling. One local branch divides the feature map evenly into 2 stripes in the horizontal direction, which can be understood as dividing the pedestrian into an upper body and a lower body; the other local branch divides the feature map evenly into 3 stripes in the horizontal direction, i.e. the pedestrian is divided into upper, middle and lower parts.
The two local branches operate similarly: each first compresses the feature map before segmentation into a 256-dimensional feature vector by 1×1 convolution; after segmentation, each stripe is first globally pooled and then reduced in dimensionality, so that one local branch obtains two 256-dimensional local feature vectors and the other local branch obtains three. Finally, these 8 features of 256 dimensions each are concatenated into a 2048-dimensional feature that serves as the global feature of the input global pedestrian image.
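The stripe-partitioning scheme above can be sketched in NumPy. This is an illustrative shape check only: random projection matrices stand in for the learned 1×1 convolutions, and global max pooling is used throughout, so the numbers (one global part, one unsplit part per local branch, plus the 2-stripe and 3-stripe parts) add up to the 8 × 256 = 2048-d descriptor described in the text.

```python
# NumPy sketch of MGN's multi-granularity partitioning: a C x H x W
# feature map is pooled globally and over horizontal stripes, each part
# reduced to 256 dims, and all 8 parts concatenated into 2048 dims.
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.standard_normal((2048, 24, 8))    # C x H x W feature map
reduce_w = rng.standard_normal((256, 2048))  # stand-in for a 1x1 conv

def pool_and_reduce(part):
    pooled = part.max(axis=(1, 2))           # global max pooling -> 2048
    return reduce_w @ pooled                 # dimensionality reduction -> 256

parts = [fmap]                               # global branch
parts += [fmap, fmap]                        # each local branch's unsplit feature
parts += np.split(fmap, 2, axis=1)           # upper / lower body stripes
parts += np.split(fmap, 3, axis=1)           # upper / middle / lower stripes

descriptor = np.concatenate([pool_and_reduce(p) for p in parts])
print(descriptor.shape)  # (2048,)
```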
The local attention module employs the hierarchical attention network HAN, which was originally proposed for text classification; its two levels of attention can place different emphasis on content of different importance. The channels of a feature map carry different meanings and therefore contribute differently to the final recognition, and different spatial positions of the feature map likewise carry different semantics, so HAN is adopted to strengthen the representation of the local human body image along both the channel and spatial dimensions. Specifically, the local human body image passes through the front half of the resnet50 to obtain a B×2048×24×8 feature map, where B represents the batch size. This feature map is split along its third (height) dimension to obtain three feature maps of dimensionality B×2048×8×8. Each feature map passes through a channel attention mechanism comprising a generalized-mean pooling layer, a fully connected layer for dimensionality reduction, a ReLU layer, a fully connected layer for dimensionality restoration, and a sigmoid activation function, which re-weights its channels. Spatial attention over the resulting features is achieved by enhancing the peak response.
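The channel-attention branch just described can be sketched as follows. Random matrices stand in for the learned fully connected layers, and the hidden width (128) is an illustrative choice not taken from the patent; only the pool → reduce → ReLU → restore → sigmoid structure follows the text.

```python
# Sketch of the channel attention mechanism: generalized-mean (GeM)
# pooling over space, FC reduction, ReLU, FC restoration, sigmoid gate,
# then per-channel re-weighting of the feature map.
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 2048, 8, 8
fmap = np.abs(rng.standard_normal((C, H, W)))  # one of the three splits

def gem_pool(x, p=3.0):
    # Generalized-mean pooling over the spatial dimensions.
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)

w_down = rng.standard_normal((128, C)) * 0.01  # FC: dimensionality reduction
w_up = rng.standard_normal((C, 128)) * 0.01    # FC: dimensionality restoration

z = gem_pool(fmap)                             # -> (2048,)
z = np.maximum(0.0, w_down @ z)                # ReLU
gate = 1.0 / (1.0 + np.exp(-(w_up @ z)))       # sigmoid channel gate in (0, 1)
out = fmap * gate[:, None, None]               # re-weight the channels

print(out.shape)  # (2048, 8, 8)
```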
The feature fusion module is an adaptive attention module: it determines the weights of the global and local features by distinguishing the input type, deciding whether to give more weight to the local features by judging whether the target is a person riding a non-motor vehicle.
In the third step, the specific training mode is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 output w; according to w, weights are assigned to the global image feature and the local human body feature (if the target is a rider, the local human body part receives higher attention), and the two are finally fused as:
f = w_1 · f_g + w_2 · f_l
where f_g and f_l respectively denote the global image feature and the local feature, w_1 and w_2 the weights derived from w, and the fused feature f is finally used for non-motor-vehicle pedestrian re-identification;
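The fusion step can be sketched numerically. Here a softmax over the B×2 classifier output supplies the per-sample weights (the patent does not state the exact normalization, so the softmax is an assumption), and random matrices stand in for the learned classifier.

```python
# Sketch of adaptive feature fusion: a B x 2 rider/non-rider output is
# normalised into per-sample weights, which blend the global image
# feature with the local human-body feature.
import numpy as np

rng = np.random.default_rng(0)
B = 4
f_global = rng.standard_normal((B, 2048))
f_local = rng.standard_normal((B, 2048))

logits = f_global @ rng.standard_normal((2048, 2))  # B x 2 classifier output
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
w = exp / exp.sum(axis=1, keepdims=True)            # softmax weights, rows sum to 1

# Column 1 (rider probability) weights the local human-body feature.
fused = w[:, :1] * f_global + w[:, 1:] * f_local
print(fused.shape)  # (4, 2048)
```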
in order to give the network better recognition capability, a cross-entropy loss function (cross-entropy loss) for classification and a triplet loss function (triplet loss) for metric learning are used together as the loss function of the training process:
L = λ_1 L_ce + λ_2 L_tri
where L_ce represents the cross-entropy loss, L_tri represents the triplet loss, and λ_1 and λ_2 respectively represent the weights of the two loss functions. The cross-entropy loss function is expressed as:
L_ce = −(1/N) Σ_{i=1}^{N} log( e^{f_{y_i}} / Σ_{c=1}^{C} e^{f_c} )
where N indicates the number of pictures in a mini-batch, f the feature logits, and C represents the number of categories;
the triplet loss involves an anchor sample (Anchor), a negative sample (Negative) and a positive sample (Positive): the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor, and the negative sample belongs to a different class. The aim of triplet-loss learning is to make the intra-class difference of the same class tend to the minimum and the inter-class difference of different classes tend to the maximum; the loss function can be expressed as:
L_tri = [ d(f_a, f_p) − d(f_a, f_n) + m ]_+
where m represents the margin hyperparameter, and f_a, f_p and f_n respectively represent the anchor sample, positive sample and negative sample features.
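The joint loss above can be sketched as follows. The margin (0.3) and the weights λ_1 = λ_2 = 1 are illustrative values, not taken from the patent; Euclidean distance is assumed for d(·,·).

```python
# Sketch of the joint training loss: softmax cross-entropy over class
# logits plus a margin-based triplet loss, combined with weights.
import numpy as np

def cross_entropy(logits, labels):
    # logits: N x C scores, labels: N integer class ids
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def triplet(anchor, positive, negative, margin=0.3):
    # Hinge on (anchor-positive distance) - (anchor-negative distance).
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(0.0, d_ap - d_an + margin).mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 10))
labels = rng.integers(0, 10, size=8)
a, p, n = (rng.standard_normal((8, 256)) for _ in range(3))

lam1, lam2 = 1.0, 1.0  # illustrative loss weights
total = lam1 * cross_entropy(logits, labels) + lam2 * triplet(a, p, n)
print(total >= 0.0)  # both terms are non-negative
```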
Example 2
Referring to fig. 2, a second embodiment of the present invention, differing from the first embodiment, provides a system for the non-motor-vehicle pedestrian re-identification method of the first embodiment.
The system comprises: the system comprises a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the system comprises a sample acquisition unit, a data processing unit, a model training unit, a model application unit and a non-motor vehicle pedestrian re-recognition network model, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
The method and system for re-identifying non-motor-vehicle pedestrians take into account the differences from conventional pedestrian re-identification. First, a non-motor-vehicle pedestrian re-identification data set is constructed from the monitoring videos of different cameras in the same scene. Then, considering the independence between non-motor vehicles and pedestrians, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of both kinds of image and obtain global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor-vehicle pedestrian re-identification model, and the feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the problem of non-motor-vehicle pedestrian re-identification, while also helping it achieve higher performance on pedestrian re-identification under conventional conditions.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.
Claims (8)
1. A non-motor-vehicle pedestrian re-identification method, characterized by comprising the following steps:
the method comprises the following steps: constructing a non-motor vehicle pedestrian re-identification data set according to monitoring videos of different cameras in the same scene;
step two: according to a pre-trained human body detector, performing human body detection on the non-motor-vehicle pedestrian re-identification data set to obtain local human body images, and preprocessing the global images and the local human body images;
step three: training a preset non-motor-vehicle pedestrian re-identification network model by using the global images and local human body images preprocessed in step two;
step four: and performing feature extraction on the target image to be recognized by using the trained pedestrian re-recognition network model of the non-motor vehicle.
2. The non-motor pedestrian re-identification method according to claim 1, characterized in that: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) Firstly, selecting monitoring videos of the same scene under different cameras, and detecting and tracking targets of pedestrians and pedestrians riding non-motor vehicles in the monitoring videos by adopting an online detection and tracking model TraDes to obtain target frame information and track information in each monitoring video;
(2) Then extracting all detected target characteristics by adopting a pre-trained resnet50 deep learning network model;
(3) Clustering all targets by an unsupervised method, i.e., using clustering to associate the same target across cameras;
(4) And finally, carrying out manual calibration to construct a final non-motor vehicle pedestrian re-identification data set.
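Step (3) above does not name a specific clustering algorithm; as a purely hypothetical illustration, cross-camera targets could be associated by greedy grouping on the cosine similarity of their resnet50 features, with the similarity threshold (0.8) an assumed value.

```python
# Hypothetical sketch of unsupervised cross-camera association: greedily
# assign each target to the first cluster whose representative feature
# is similar enough, otherwise start a new cluster.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def associate(features, threshold=0.8):
    clusters = []   # list of (representative feature, member indices)
    labels = []
    for i, f in enumerate(features):
        for cid, (rep, members) in enumerate(clusters):
            if cosine(f, rep) >= threshold:
                members.append(i)
                labels.append(cid)
                break
        else:
            clusters.append((f, [i]))
            labels.append(len(clusters) - 1)
    return labels

base = np.ones(128)
feats = [base, base + 0.01, -base]  # two near-duplicates and one outlier
print(associate(feats))  # [0, 0, 1]
```

In practice the clusters would then be manually calibrated, as in step (4), before building the final data set.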
3. The non-motor pedestrian re-identification method according to claim 2, characterized in that: the non-motor-vehicle pedestrian re-identification network model adopts three modules that cooperate with each other, comprising: a global image feature extraction module, a local attention module and a feature fusion module.
4. The non-motor pedestrian re-identification method according to claim 3, characterized in that: in step four, the target image to be queried is used as the input of the non-motor-vehicle pedestrian re-identification network model; the global image feature and the local human body feature of the target are learned respectively, weights are adaptively assigned to them, and the fused feature serves as the final feature descriptor of the target; the same steps are applied to all pictures in the candidate image library to obtain their feature descriptors, the cosine distances between the query picture's features and those of all pictures in the candidate library are computed and sorted, and the target in the candidate library with the highest similarity to the query is selected as the final identification result.
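The retrieval step of claim 4 can be sketched with NumPy: cosine distances between the query descriptor and every candidate descriptor are sorted, and the closest candidate is returned. The descriptors here are random stand-ins for the learned features.

```python
# Sketch of cosine-distance ranking over a candidate image library.
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity between row vectors of a and b.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return 1.0 - a @ b.T

rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 2048))              # candidate descriptors
query = gallery[42] + 0.01 * rng.standard_normal(2048)  # near-copy of target 42

dist = cosine_distance(query[None, :], gallery)[0]
ranking = np.argsort(dist)    # ascending distance = descending similarity
print(ranking[0])             # index of the most similar gallery target
```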
5. A method of pedestrian re-identification of a non-motor vehicle as claimed in any one of claims 1 to 4, wherein: the human body detector in step two is a yolov5s detector, and the preprocessing is specifically: all original images for training and testing, as well as the detected local human body images, are resized to 384×128; the training data are then enhanced by random horizontal flipping, random erasing, random cropping and normalization of image pixel values, with a number of occluded and rotated samples added at random.
6. The non-motor pedestrian re-identification method according to claim 3, characterized in that: the global image feature extraction module selects MGN as its basic backbone, and the input global image passes through the MGN to obtain global image features; the local attention module comprises a channel attention mechanism and a spatial attention mechanism, and the input local human body image yields local features after passing through it; the core idea of the feature fusion module is to assign different weights to the global image features and the local features according to whether the input picture shows a rider: if the target is judged to be a rider, a higher weight is given to the local features, otherwise a higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features; one branch extracts global image features and is responsible for the common features; the image is then divided into N stripes, where different N corresponds to different granularity and captures features of different levels (the finer the granularity, the larger the N), and two branches are respectively responsible for extracting the multi-granularity local features; the local attention module adopts the hierarchical attention network HAN;
the feature fusion module is an adaptive attention module: it determines the weights of the global and local features by distinguishing the input type, deciding whether to give more weight to the local features by judging whether the target is a person riding a non-motor vehicle.
7. The non-motor pedestrian re-identification method according to claim 6, characterized in that: the specific training mode in step three is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 output w; according to w, weights are assigned to the global image feature and the local human body feature (if the target is judged to be a rider, the local human body part needs to receive higher attention), and the two are finally fused as:
f = w_1 · f_g + w_2 · f_l
where f_g and f_l respectively denote the global image feature and the local feature, w_1 and w_2 the weights derived from w, and the fused feature f is finally used for non-motor-vehicle pedestrian re-identification;
in order to give the network better recognition capability, a cross-entropy loss function for classification and a triplet loss function for metric learning are used together as the loss function of the training process:
L = λ_1 L_ce + λ_2 L_tri
where L_ce represents the cross-entropy loss, L_tri represents the triplet loss, and λ_1 and λ_2 respectively represent the weights of the two loss functions, the cross-entropy loss function being expressed as:
L_ce = −(1/N) Σ_{i=1}^{N} log( e^{f_{y_i}} / Σ_{c=1}^{C} e^{f_c} )
where N indicates the number of pictures in a mini-batch, f the feature logits, and C represents the number of categories;
the triplet loss involves an anchor sample, a negative sample and a positive sample, wherein the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor, and the negative sample belongs to a different class; the purpose of triplet-loss learning is to make the intra-class difference of the same class tend to the minimum and the inter-class difference of different classes tend to the maximum, the loss function being expressed as:
L_tri = [ d(f_a, f_p) − d(f_a, f_n) + m ]_+
where m represents the margin hyperparameter and f_a, f_p and f_n respectively represent the anchor, positive and negative sample features.
8. A system for use in the non-motor-vehicle pedestrian re-identification method according to any one of claims 1 to 7, characterized in that the system comprises: a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the non-motor vehicle pedestrian re-recognition system comprises a sample acquisition unit, a data processing unit, a model training unit and a model application unit, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524755.6A CN115205890A (en) | 2022-05-13 | 2022-05-13 | Method and system for re-identifying pedestrians of non-motor vehicles |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205890A true CN115205890A (en) | 2022-10-18 |
Family
ID=83575258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210524755.6A Pending CN115205890A (en) | 2022-05-13 | 2022-05-13 | Method and system for re-identifying pedestrians of non-motor vehicles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205890A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561372A (en) * | 2023-07-03 | 2023-08-08 | 北京瑞莱智慧科技有限公司 | Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium |
CN116797781A (en) * | 2023-07-12 | 2023-09-22 | 北京斯年智驾科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN117152851A (en) * | 2023-10-09 | 2023-12-01 | 中科天网(广东)科技有限公司 | Face and human body collaborative clustering method based on large model pre-training |
CN118522039A (en) * | 2024-07-23 | 2024-08-20 | 南京信息工程大学 | Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||