CN115205890A - Method and system for re-identifying pedestrians of non-motor vehicles - Google Patents


Info

Publication number
CN115205890A
Authority
CN
China
Prior art keywords
pedestrian
local
human body
image
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210524755.6A
Other languages
Chinese (zh)
Inventor
Ju Rong (鞠蓉)
Current Assignee
Nanjing Boya Jizhi Intelligent Technology Co ltd
Original Assignee
Nanjing Boya Jizhi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Boya Jizhi Intelligent Technology Co ltd filed Critical Nanjing Boya Jizhi Intelligent Technology Co ltd
Priority to CN202210524755.6A
Publication of CN115205890A
Legal status: Pending

Classifications

    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N 3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
    • G06V 10/762 Image or video recognition using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
    • G06V 10/806 Fusion, i.e. combining data from various sources, of extracted features
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 2201/07 Indexing scheme relating to image or video recognition; target detection


Abstract

The invention discloses a method and a system for re-identifying non-motor vehicle pedestrians, comprising the following steps: constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos captured by different cameras in the same scene; performing human body detection on the data set to obtain local human body images, and preprocessing the global image features and the local human body images; training a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global image features and local human body images; and extracting features from the target image to be recognized with the trained model. To make full use of both the global image and the local human body image when obtaining global and local features, the invention trains the non-motor vehicle pedestrian re-identification model on the preprocessed global pedestrian images and local human body images and employs a feature fusion module that adaptively assigns weights to the global and local features, thereby addressing the non-motor vehicle pedestrian re-identification problem.

Description

Method and system for re-identifying pedestrians of non-motor vehicles
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a non-motor vehicle pedestrian re-identification method and system.
Background
Pedestrian re-identification means matching the same person across different cameras, and it is a fundamental task in surveillance scenarios. Most existing pedestrian re-identification data sets focus on pedestrians walking within the camera view, but in real surveillance scenes pedestrians do not only walk: there is also a large amount of riding behaviour, such as riding bicycles, electric vehicles, motorcycles and other non-motor vehicles. In criminal investigation, a large number of non-motor vehicle theft incidents occur, so large-scale identification of non-motor vehicle pedestrians is vitally important.
To adapt to real scenes, non-motor vehicles must additionally be considered in pedestrian re-identification. Analysis of non-motor vehicle pedestrians shows that they generally have a clear structure and can be divided into two parts: the human body and the non-motor vehicle. Non-motor vehicle pedestrian re-identification differs significantly from traditional pedestrian and vehicle re-identification. When different vehicles share the same model and colour they look very similar, producing high inter-class similarity; likewise, when pedestrians wear similar clothing (e.g. the same uniform or jersey), pedestrian re-identification also becomes difficult. However, when a non-motor vehicle is combined with a pedestrian, the vehicle information (such as model and colour) joins the pedestrian information (such as clothes and carried objects), so the inter-class similarity is greatly reduced.
The above considers non-motor vehicle pedestrian re-identification under unchanged conditions; if the motion state of the pedestrian changes, the pedestrian information and the non-motor vehicle information need to be considered separately.
Based on the above problems, a method and a system for non-motor vehicle pedestrian re-identification that consider pedestrian and non-motor vehicle information simultaneously are needed.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made keeping in mind the above problems occurring in the prior art.
Therefore, the invention aims to provide a non-motor vehicle pedestrian re-identification method that better solves the non-motor vehicle pedestrian re-identification problem.
In order to solve the technical problems, the invention provides the following technical scheme: a non-motor vehicle pedestrian re-identification method comprising the following steps,
Step one: constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos captured by different cameras in the same scene;
Step two: performing human body detection on the non-motor vehicle pedestrian re-identification data set with a pre-trained human body detector to obtain local human body images, and preprocessing the global image features and the local human body images;
Step three: training a preset non-motor vehicle pedestrian re-identification network model with the global image features and local human body images preprocessed in step two;
Step four: extracting features from the target image to be recognized with the trained non-motor vehicle pedestrian re-identification network model.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) First, select surveillance videos of the same scene under different cameras, and use the online joint detection and tracking model TraDeS to detect and track pedestrians and pedestrians riding non-motor vehicles in the videos, obtaining the target box information and trajectory information in each video;
(2) Then extract features of all detected targets with a pre-trained resnet50 deep learning network model;
(3) Cluster all targets with an unsupervised method so that the same target is associated across cameras;
(4) Finally, perform manual calibration to construct the final non-motor vehicle pedestrian re-identification data set.
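The four construction steps above can be sketched as follows. This is a minimal numpy illustration of the cross-camera association idea only: greedy cosine-similarity grouping stands in for the Infomap clustering the patent uses, and the threshold and feature values are invented for illustration.

```python
import numpy as np

def associate_cross_camera(features, threshold=0.6):
    """Greedy stand-in for the unsupervised clustering step: group target
    features whose cosine similarity exceeds `threshold` under one identity."""
    # L2-normalise so the dot product equals cosine similarity
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels = -np.ones(len(feats), dtype=int)
    next_id = 0
    for i, f in enumerate(feats):
        if labels[i] != -1:
            continue
        labels[i] = next_id
        # assign every later unlabelled target similar enough to this one
        for j in range(i + 1, len(feats)):
            if labels[j] == -1 and float(f @ feats[j]) > threshold:
                labels[j] = next_id
        next_id += 1
    return labels

# two cameras see the same rider (similar features) plus a distinct pedestrian
cam_feats = np.array([[1.0, 0.0, 0.1],
                      [0.98, 0.05, 0.12],   # same rider, other camera
                      [0.0, 1.0, 0.0]])     # different identity
labels = associate_cross_camera(cam_feats)
print(labels)                               # [0 0 1]
```

In the patent's pipeline the features would come from the pre-trained resnet50 and the resulting groups would still be manually calibrated in step (4).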
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the non-motor vehicle pedestrian re-identification network model consists of three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: in step four, the target image to be queried is used as the input of the non-motor vehicle pedestrian re-identification network model; the global image feature and the local human body feature of the target are learned separately, weights are adaptively assigned to them, and the two are fused as the final feature descriptor of the target. The same steps are performed on all images in the candidate gallery to obtain their feature descriptors; the cosine distances between the query image's features and those of all gallery images are computed, the distances are sorted, and the gallery target with the highest similarity to the query image is selected as the final identification result.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the human body detector in the second step is a yolov5s detector, and the pretreatment comprises the following specific operations: adjusting all original images for training and testing and the size of the detected local human body image to 384 multiplied by 128; then, through random horizontal turning, random erasing, random cutting and normalization of image pixel values, a plurality of shielding and rotating samples are added randomly to enhance the training data.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the global image feature extraction module selects MGN as a basic skeleton, and the input global image passes through the MGN to obtain global image features; the local attention module comprises a channel attention mechanism and a space attention mechanism, and the input local human body image obtains local characteristics after passing through the local attention module; the feature fusion module has the core idea that different weights are given to the global image features and the local features according to whether the input picture is ridden, wherein if the judgment target is a rider, higher weight is given to the local features, and otherwise, higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch depth network, and global image features and multi-granularity local features are combined at the same time; one branch is used for extracting global image features and is responsible for extracting common features; then dividing the image into N strips, wherein different N represents different granularity and is responsible for extracting features of different levels or different levels, and the finer the granularity is, the larger the N is, the more the granularity is, the two branches are respectively responsible for extracting multi-granularity local features; the local attention module adopts a hierarchical attention network HAN;
the feature fusion module is an adaptive attention module, determines the weight of the global and local features by distinguishing input types, and determines whether to give more weight to the local features by judging whether the bicycle is a rider.
As a preferable aspect of the non-motor vehicle pedestrian re-identification method of the present invention, wherein: the specific training mode in the third step process is as follows: first global image features
Figure RE-808356DEST_PATH_IMAGE001
Input into a simple binary network to obtain a Bx 2 characteristic
Figure RE-86891DEST_PATH_IMAGE002
According to
Figure RE-300222DEST_PATH_IMAGE002
Giving weight to the global image feature and the local human body feature, if the determination target is a riding human, the local human body part thereofWe should get higher attention and finally we fuse the global image features and local human features:
Figure RE-121547DEST_PATH_IMAGE003
+
Figure RE-657571DEST_PATH_IMAGE004
wherein,
Figure RE-423402DEST_PATH_IMAGE005
and
Figure RE-906336DEST_PATH_IMAGE006
respectively representing global image features and local features
Figure RE-175643DEST_PATH_IMAGE007
Finally, the method is used for the pedestrian re-identification of the non-motor vehicles;
in order to make the network have better identification capability, a cross entropy loss function for classification and a ternary loss function for metric learning are used as the loss functions of the training process at the same time:
Figure RE-351409DEST_PATH_IMAGE008
wherein,
Figure RE-604536DEST_PATH_IMAGE009
which represents the cross-entropy loss of the entropy,
Figure RE-425249DEST_PATH_IMAGE009
a loss of a triplet is represented as,
Figure RE-549063DEST_PATH_IMAGE010
and
Figure RE-426889DEST_PATH_IMAGE011
representing the weight, cross entropy loss, of both loss functions separatelyThe loss function is expressed as:
Figure RE-308257DEST_PATH_IMAGE012
wherein
Figure RE-929732DEST_PATH_IMAGE013
Indicates the number of pictures for the minimum batch process,
Figure RE-580156DEST_PATH_IMAGE014
representation feature
Figure RE-363304DEST_PATH_IMAGE015
C represents the number of categories;
the triple loss respectively represents an anchor sample, a negative sample and a positive sample, wherein the anchor sample is a sample randomly selected from a training data set, the positive sample and the anchor sample belong to the same class, the negative sample and the anchor sample belong to different classes, the purpose of triple loss function learning is to enable the intra-class difference of the same class to be minimum and the inter-class difference of the different classes to be maximum, and the loss function can be represented as:
Figure RE-731968DEST_PATH_IMAGE016
wherein,
Figure RE-891554DEST_PATH_IMAGE010
the edge-over-parameter is represented,
Figure RE-662064DEST_PATH_IMAGE017
Figure RE-616114DEST_PATH_IMAGE018
and
Figure RE-472074DEST_PATH_IMAGE019
respectively representing anchor sample features, positive sample features and negative sample features.
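The two loss terms can be written out numerically as a sketch; the probabilities, feature vectors and the λ and margin values below are invented for illustration, and the original formulas were lost as images, so these follow the standard definitions of the two losses.

```python
import numpy as np

def cross_entropy(p, y):
    """L_ce = -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    return -np.mean(np.sum(y * np.log(p), axis=1))

def triplet(fa, fp, fn, margin=0.3):
    """L_tri = max(d(a,p) - d(a,n) + margin, 0) with Euclidean distance."""
    d_ap = np.linalg.norm(fa - fp)
    d_an = np.linalg.norm(fa - fn)
    return max(d_ap - d_an + margin, 0.0)

# tiny batch: 2 samples, 3 identity classes
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # predicted probabilities
y = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot labels
fa, fp_, fn_ = np.zeros(4), np.full(4, 0.1), np.ones(4)
# combined loss with equal weights lambda_1 = lambda_2 = 1
total = 1.0 * cross_entropy(p, y) + 1.0 * triplet(fa, fp_, fn_)
print(round(total, 4))                              # 0.2899 (triplet term is 0 here)
```

Here the anchor already lies much closer to the positive than to the negative, so the triplet term vanishes and only the cross-entropy contributes.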
The invention further provides a non-motor vehicle pedestrian re-identification system comprising a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-identification data set from surveillance videos of different cameras in the same scene; the data processing unit is used for performing human body detection on the data set with a pre-trained human body detector and preprocessing the global images and local human body images; the model training unit is used for training a preset non-motor vehicle pedestrian re-identification network model with the preprocessed global pedestrian images and local human body images; and the model application unit is used for extracting features from the image to be recognized with the trained non-motor vehicle pedestrian re-identification network model.
The invention has the beneficial effects that: the invention provides a method and a system for non-motor vehicle pedestrian re-identification that account for its differences from traditional pedestrian re-identification. First, a non-motor vehicle pedestrian re-identification data set is constructed from surveillance videos of different cameras in the same scene. Then, considering the independence between the non-motor vehicle and the pedestrian, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of the global image and the local human body image when obtaining global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor vehicle pedestrian re-identification model, and a feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the non-motor vehicle pedestrian re-identification problem while also achieving higher performance on traditional pedestrian re-identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor. Wherein:
fig. 1 is a schematic flow chart of a pedestrian re-identification method for a non-motor vehicle according to the present invention.
Fig. 2 is a system block diagram of a pedestrian re-identification system of a non-motor vehicle according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanying figures of the present invention are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Example 1
Referring to fig. 1, for a first embodiment of the present invention, there is provided a non-motor pedestrian re-identification method, including the steps of:
the method comprises the following steps: the method comprises the following steps of constructing a non-motor vehicle pedestrian re-identification data set according to monitoring videos of different cameras in the same scene, wherein the specific mode is as follows:
(1) Firstly, selecting monitoring videos of the same scene under different cameras, and detecting and tracking targets of pedestrians and pedestrians riding non-motor vehicles in the monitoring videos by adopting an online detection and tracking model TraDes to obtain target frame information and track information in each monitoring video;
(2) Then extracting all detected target characteristics by adopting a pre-trained resnet50 deep learning network model;
(3) Clustering all targets through an unsupervised method, namely, clustering all targets to associate the same target under a cross-camera;
(4) Finally, carrying out manual calibration to construct a final non-motor vehicle pedestrian re-identification data set;
Step two: according to a pre-trained human body detector (a yolov5s detector), perform human body detection on the non-motor vehicle pedestrian re-identification data set to obtain local human body images, and preprocess the global image features and local human body images as follows: resize all original images for training and testing, together with the detected local human body images, to 384 × 128; then randomly add some occluded and rotated samples to augment the training data via random horizontal flipping, random erasing, random cropping and normalization of image pixel values;
Step three: use the global image features and local human body images preprocessed in step two to train the preset non-motor vehicle pedestrian re-identification network model; to improve its recognition performance, the network model uses three cooperating modules: a global image feature extraction module, a local attention module and a feature fusion module;
Step four: use the trained non-motor vehicle pedestrian re-identification network model to extract features from the target image to be recognized; that is, the target image to be queried serves as the model input, the global image feature and local human body feature of the target are learned separately, weights are adaptively assigned to them, and the two are fused into the final feature descriptor of the target. The same steps are performed on all images in the candidate gallery to obtain their feature descriptors; the cosine distances between the query image's features and those of all gallery images are computed, the distances are sorted, and the gallery target with the highest similarity to the query image is selected as the final identification result.
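The retrieval step above reduces to cosine-distance ranking of descriptors, which can be sketched in numpy; the query and gallery vectors are invented for illustration.

```python
import numpy as np

def rank_gallery(query, gallery):
    """Cosine-distance ranking of gallery descriptors against a query descriptor."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    dist = 1.0 - g @ q                  # cosine distance per gallery image
    order = np.argsort(dist)            # ascending: most similar first
    return order, dist

query = np.array([1.0, 0.0, 0.5])
gallery = np.array([[0.0, 1.0, 0.0],    # unrelated identity
                    [0.9, 0.1, 0.45],   # near-duplicate of the query
                    [0.5, 0.5, 0.5]])   # partially similar identity
order, dist = rank_gallery(query, gallery)
print(order[0])                         # 1: index of the best match
```

The first entry of `order` is returned as the identification result; the full ranking is what re-identification benchmarks score with metrics such as rank-1 accuracy.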
To explain further, the TraDeS tracker focuses on using tracking information to assist detection and feeding the detection results back to tracking. TraDeS is built on the center-point-based object detector CenterNet and mainly consists of two modules: a cost volume based association (CVA) module and a motion-guided feature warping (MFW) module. The CVA extracts re-id embedding features point by point through the backbone network to construct a cost volume, which stores the similarity between matched embedding pairs of the two frames; a tracking offset, i.e. the spatio-temporal displacement of all points, is then derived from the cost volume, and these offsets together perform a simple two-round long-term data association. Next, the MFW propagates features from the previous frame to the current frame, guided by the tracking offsets as motion cues; finally, the propagated features are combined with the current frame's features for detection and segmentation. Here the detection and tracking model is applied directly, so it is not described further.
The unsupervised Infomap method was initially used for face clustering; construction of the nearest-neighbour graph is accelerated with faiss, achieving a good clustering effect while also improving clustering speed.
The YOLO series detectors are classical one-stage object detection architectures. Yolov3 is divided into four parts: the input end, the Backbone network, the Neck and the output end. The Backbone mainly extracts features from the input image for the following network to use; the Neck reprocesses and makes full use of the important features extracted by the Backbone. Yolov4 has the same overall structure as Yolov3 but introduces numerous integrated innovations into each sub-structure: the input end adopts Mosaic data augmentation, CmBN and SAT self-adversarial training; the Backbone adopts CSPDarknet53, the Mish activation function and DropBlock; the Neck adopts an SPP module and an FPN + PAN structure; the anchor-box mechanism of the output end is the same as Yolov3, but the training loss adopts CIOU_Loss and prediction-box filtering uses DIOU_nms.
The structure of Yolov5 is very similar to that of Yolov4, with some differences. Specifically, the input end adds adaptive anchor-box computation and adaptive image scaling alongside Mosaic data augmentation; the Backbone adopts a Focus structure and a CSP structure; the Neck, like Yolov4's, adopts the FPN + PAN structure, except that Yolov4's Neck uses ordinary convolutions while Yolov5's Neck adopts the CSP2 structure inspired by CSPNet, strengthening the network's feature fusion capability; the output end adopts GIOU_Loss as the bounding-box loss function. Yolov5 comprises four network models: Yolov5s, Yolov5m, Yolov5l and Yolov5x, among which Yolov5s has the smallest depth and the smallest feature-map width in the series and is therefore the fastest. Although it loses some accuracy compared with the other three structures, the task here is not very complex while the speed requirement is high, so the Yolov5s network is finally selected as the human body detector.
It should be particularly noted that the global image feature extraction module selects MGN (Multiple Granularity Network) as its backbone; the input global image passes through the MGN to obtain the global image feature. The local attention module comprises a channel attention mechanism and a spatial attention mechanism; the input local human body image passes through it to obtain the local feature. The core idea of the feature fusion module is to assign different weights to the global image feature and the local feature according to whether the input picture contains a rider: if the target is judged to be a riding person, the local feature receives a higher weight; otherwise the global image feature does. This helps the model better solve the non-motor vehicle pedestrian re-identification problem while also achieving higher performance on traditional pedestrian re-identification;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features. One branch extracts global image features and is responsible for common features; the other branches divide the image into N horizontal strips, where different N corresponds to different granularity (the larger N is, the finer the granularity) and is responsible for features of different levels. Two such branches are respectively responsible for extracting multi-granularity local features, and combining the global features with the multi-granularity local features yields rich information and detail to represent the input global pedestrian image. The backbone network of the MGN uses resnet50 and splits into three branches in the latter half of the network; the three branches are similar in structure but differ in downsampling rate. The global branch downsamples with a stride=2 convolution, applies global max pooling to the resulting feature map to produce a 2048-dimensional feature vector, and compresses it by 1×1 convolution into a 256-dimensional global feature vector f_g. The two local branches learn local feature representations; to preserve a receptive field suited to local features, neither branch uses downsampling. One local branch divides the feature map uniformly into 2 horizontal strips, which can be understood as dividing the pedestrian into an upper half and a lower half; the other local branch divides the feature map uniformly into 3 horizontal strips, i.e. dividing the pedestrian into upper, middle and lower parts. The two local branches operate similarly: each first compresses the feature map before segmentation into a 256-dimensional branch-level global feature vector, f_p2_g and f_p3_g respectively; after segmentation, each strip is globally pooled and then reduced in dimension, so that one local branch obtains two 256-dimensional local feature vectors f_p2_1 and f_p2_2, and the other local branch obtains three 256-dimensional local feature vectors f_p3_1, f_p3_2 and f_p3_3. Finally, these 8 256-dimensional features are concatenated into a 2048-dimensional feature serving as the global feature of the input global pedestrian image.
The local attention module employs a hierarchical attention network (HAN), originally used for text classification, whose two hierarchical attention mechanisms can focus different amounts of attention on content of different importance. The channels of a feature map carry different meanings and therefore contribute differently to the final recognition, and different spatial positions of the feature map likewise carry different semantics, so HAN is adopted to strengthen the expression of the local human body image along both the channel and spatial dimensions. Specifically, the local human body image passes through the front half of resnet50 to obtain a B×2048×24×8 feature map, where B denotes the batch size. This feature map is split along its third (height) dimension into three feature maps F_1, F_2 and F_3, each of dimension B×2048×8×8. Each feature map passes through a channel attention mechanism comprising a generalized-mean pooling layer, a fully connected layer for dimension reduction, a ReLU layer, a fully connected layer for dimension expansion and a sigmoid activation function, producing the processed feature maps F_1', F_2' and F_3'. Denoting the feature on channel c as F_c', spatial attention over the features is achieved by enhancing the peak response.
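The channel attention chain (generalized-mean pooling → FC reduction → ReLU → FC expansion → sigmoid) can be sketched in numpy as follows; the reduction ratio and the weight matrices are assumptions, not values stated in the patent:

```python
import numpy as np

def gem_pool(fmap, p=3.0):
    """Generalized-mean pooling over the spatial dims of a C×H×W map."""
    return (np.clip(fmap, 1e-6, None) ** p).mean(axis=(1, 2)) ** (1.0 / p)

def channel_attention(fmap, w_down, w_up, p=3.0):
    """GeM pool -> FC (reduce) -> ReLU -> FC (expand) -> sigmoid,
    then rescale every channel of the input map by its gate."""
    s = gem_pool(fmap, p)                        # per-channel statistic, shape (C,)
    s = np.maximum(w_down @ s, 0.0)              # ReLU after dimension reduction
    s = 1.0 / (1.0 + np.exp(-(w_up @ s)))        # sigmoid gate in (0, 1) per channel
    return fmap * s[:, None, None]

rng = np.random.default_rng(0)
fmap = rng.standard_normal((2048, 8, 8)) + 2.0   # one of the three height slices
w_down = rng.standard_normal((128, 2048)) * 0.01 # reduction FC (2048 -> 128), assumed ratio
w_up   = rng.standard_normal((2048, 128)) * 0.01 # expansion FC (128 -> 2048)

out = channel_attention(fmap, w_down, w_up)
assert out.shape == fmap.shape
```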
The feature fusion module is an adaptive attention module: it determines the weights of the global and local features by distinguishing the input type, i.e. it decides whether to give more weight to the local features by judging whether the target is riding a non-motor vehicle.
In the third step, the specific training mode is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 feature w; according to w, weights are assigned to the global image feature and the local human body feature, so that if the target is a rider, the local human body part obtains higher attention. Finally, the global image feature and the local human body feature are fused as:

f = w_g · f_g + w_l · f_l

wherein f_g and f_l respectively denote the global image feature and the local feature, and w_g and w_l denote their weights; the fused feature f is finally used for non-motor vehicle pedestrian re-identification;
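A minimal sketch of the adaptive fusion, assuming the B×2 output w is softmaxed into a (non-rider, rider) weight pair; the index-to-class mapping is an assumption:

```python
import numpy as np

def fuse(f_global, f_local, logits):
    """Softmax the rider/non-rider logits into two weights and
    mix the global and local features accordingly."""
    e = np.exp(logits - logits.max())            # numerically stable softmax
    w_nonrider, w_rider = e / e.sum()
    # a rider pushes weight toward the local (human-body) feature
    return w_nonrider * f_global + w_rider * f_local

rng = np.random.default_rng(0)
f_g, f_l = rng.standard_normal(2048), rng.standard_normal(2048)

rider  = fuse(f_g, f_l, np.array([-2.0, 2.0]))   # classified as riding
walker = fuse(f_g, f_l, np.array([2.0, -2.0]))   # classified as walking
assert rider.shape == walker.shape == (2048,)
```

The same fused vector serves as the descriptor both during training and at retrieval time.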
in order to give the network better recognition capability, a cross-entropy loss function (cross-entropy loss) for classification and a triplet loss function (triplet loss) for metric learning are used together as the loss function of the training process:

L = λ_1 · L_ce + λ_2 · L_tri

wherein L_ce denotes the cross-entropy loss, L_tri denotes the triplet loss, and λ_1 and λ_2 respectively denote the weights of the two loss functions. The cross-entropy loss function is expressed as:

L_ce = -(1/N) · Σ_{i=1}^{N} log p(y_i | f_i)

wherein N denotes the number of pictures in a mini-batch, p(y_i | f_i) denotes the predicted probability that feature f_i belongs to its ground-truth class y_i, and C denotes the number of classes;
the triplet loss involves an anchor sample (Anchor), a negative sample (Negative) and a positive sample (Positive): the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor sample, and the negative sample belongs to a different class. The goal of triplet loss learning is to minimize the intra-class difference within the same class while maximizing the inter-class difference between different classes. The loss function can be expressed as:

L_tri = Σ [ ||f_a − f_p||_2 − ||f_a − f_n||_2 + α ]_+

wherein α denotes the margin hyperparameter, and f_a, f_p and f_n respectively denote the anchor sample feature, the positive sample feature and the negative sample feature.
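The combined objective can be sketched numerically as follows; the class count, margin and loss weights are illustrative placeholders, not values from the patent:

```python
import numpy as np

def cross_entropy(logits, label):
    """-log softmax probability of the ground-truth class."""
    z = logits - logits.max()                    # stabilize the exponentials
    return -(z[label] - np.log(np.exp(z).sum()))

def triplet(f_a, f_p, f_n, margin=0.3):
    """Hinge on d(a,p) - d(a,n) + margin; zero once the negative
    is pushed margin farther from the anchor than the positive."""
    d_ap = np.linalg.norm(f_a - f_p)
    d_an = np.linalg.norm(f_a - f_n)
    return max(d_ap - d_an + margin, 0.0)

rng = np.random.default_rng(0)
f_a, f_p, f_n = (rng.standard_normal(256) for _ in range(3))
logits, label = rng.standard_normal(751), 3      # identity count is illustrative

lam1, lam2 = 1.0, 1.0                            # loss weights λ1, λ2 (illustrative)
loss = lam1 * cross_entropy(logits, label) + lam2 * triplet(f_a, f_p, f_n)
assert loss >= 0.0
```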
Example 2
Referring to fig. 2, a second embodiment of the present invention differs from the first embodiment in that it provides a system for the non-motor vehicle pedestrian re-identification method of the first embodiment.
The system comprises: a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the system comprises a sample acquisition unit, a data processing unit, a model training unit, a model application unit and a non-motor vehicle pedestrian re-recognition network model, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
The method and system for non-motor vehicle pedestrian re-identification take into account the differences from conventional pedestrian re-identification. First, a non-motor vehicle pedestrian re-identification data set is constructed from the monitoring videos of different cameras in the same scene. Then, considering the independence between the non-motor vehicle and the pedestrian, a pre-trained human body detector performs human body detection on the data set to obtain local human body images, and the global images and local human body images are preprocessed. To make full use of the global and local human body images for obtaining global and local features, the preprocessed global pedestrian images and local human body images are used to train the non-motor vehicle pedestrian re-identification model, and a feature fusion module adaptively assigns weights to the global and local features. This helps the model better solve the problem of non-motor vehicle pedestrian re-identification, and at the same time obtain higher performance on conventional pedestrian re-identification.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (8)

1. A non-motor vehicle pedestrian re-identification method, characterized by comprising the following steps:
the method comprises the following steps: constructing a non-motor vehicle pedestrian re-identification data set according to monitoring videos of different cameras in the same scene;
step two: according to a pre-trained human body detector, performing human body detection on the non-motor vehicle pedestrian re-identification data set to obtain local human body images, and preprocessing the global images and the local human body images;
step three: training a preset non-motor vehicle pedestrian re-identification network model by using the global pedestrian images and local human body images preprocessed in the second step;
step four: and performing feature extraction on the target image to be recognized by using the trained pedestrian re-recognition network model of the non-motor vehicle.
2. The non-motor vehicle pedestrian re-identification method according to claim 1, characterized in that: in the first step, the construction of the non-motor vehicle pedestrian re-identification data set comprises four steps:
(1) Firstly, selecting monitoring videos of the same scene under different cameras, and detecting and tracking targets of pedestrians and pedestrians riding non-motor vehicles in the monitoring videos by adopting an online detection and tracking model TraDes to obtain target frame information and track information in each monitoring video;
(2) Then extracting all detected target characteristics by adopting a pre-trained resnet50 deep learning network model;
(3) Clustering all targets by an unsupervised method, so as to associate the same target across cameras;
(4) And finally, carrying out manual calibration to construct a final non-motor vehicle pedestrian re-identification data set.
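The cross-camera association of step (3) can be sketched as a simple greedy clustering on cosine similarity; the patent does not name a specific clustering algorithm, so the threshold and procedure below are assumptions:

```python
import numpy as np

def greedy_cluster(feats, thresh=0.7):
    """Assign each L2-normalized feature to the first cluster whose
    representative similarity exceeds thresh, else open a new cluster."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    reps, labels = [], []                         # first member represents a cluster
    for f in feats:
        sims = [float(f @ r) for r in reps]
        if sims and max(sims) >= thresh:
            labels.append(int(np.argmax(sims)))
        else:
            labels.append(len(reps))
            reps.append(f)
    return labels

rng = np.random.default_rng(0)
base = rng.standard_normal((3, 256))                      # three true identities
tracks = np.vstack([b + 0.05 * rng.standard_normal(256)   # two sightings each
                    for b in base for _ in range(2)])
labels = greedy_cluster(tracks)
assert labels[0] == labels[1]                             # same identity merged
```

The clusters obtained this way are then manually calibrated, as step (4) describes.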
3. The non-motor vehicle pedestrian re-identification method according to claim 2, characterized in that: the non-motor vehicle pedestrian re-identification network model adopts three modules that cooperate with each other: a global image feature extraction module, a local attention module and a feature fusion module.
4. The non-motor vehicle pedestrian re-identification method according to claim 3, characterized in that: in the fourth step, the target image to be queried is used as the input of the non-motor vehicle pedestrian re-identification network model; the global image feature and the local human body feature of the target are learned respectively, weights are adaptively assigned to them, and they are fused as the final feature descriptor of the target; the same operation is performed on all pictures in the candidate image library to obtain their feature descriptors, the cosine distances between the features of the query picture and all pictures in the candidate image library are calculated and sorted, and the target with the highest similarity to the query picture is selected from the candidate image library as the final identification result.
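The cosine-distance ranking of the retrieval step can be sketched as follows, with a planted near-duplicate standing in for a true cross-camera match (shapes and gallery size are illustrative):

```python
import numpy as np

def rank(query, gallery):
    """Cosine distance between the query descriptor and every gallery
    descriptor; lower distance means higher similarity."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    dists = 1.0 - g @ q                     # cosine distance per gallery entry
    return np.argsort(dists), dists         # ascending = most similar first

rng = np.random.default_rng(0)
query = rng.standard_normal(2048)
gallery = rng.standard_normal((10, 2048))
gallery[7] = query + 0.01 * rng.standard_normal(2048)   # plant a near-duplicate

order, dists = rank(query, gallery)
assert order[0] == 7                                    # best match ranked first
```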
5. The non-motor vehicle pedestrian re-identification method according to any one of claims 1 to 4, characterized in that: the human body detector in the second step is a yolov5s detector, and the specific preprocessing operations are as follows: all original images for training and testing and the detected local human body images are resized to 384×128; the training data are then enhanced by random horizontal flipping, random erasing, random cropping and normalization of image pixel values, and by randomly adding a number of occluded and rotated samples.
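A minimal numpy sketch of this preprocessing pipeline (the resize itself is omitted and the sketch starts from a 384×128 image; the erased-region statistics and the ImageNet normalization constants are assumptions):

```python
import numpy as np

def random_erase(img, rng, scale=(0.02, 0.2)):
    """Overwrite a random rectangle of an H×W×3 image with noise
    (a simple occlusion augmentation)."""
    h, w, _ = img.shape
    area = rng.uniform(*scale) * h * w
    eh = int(min(h, max(1, np.sqrt(area))))
    ew = int(min(w, max(1, area / eh)))
    y, x = rng.integers(0, h - eh + 1), rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.uniform(0, 255, size=(eh, ew, 3))
    return out

def normalize(img, mean, std):
    """Scale pixel values to [0, 1] and standardize per channel."""
    return (img / 255.0 - mean) / std

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(384, 128, 3)).astype(np.float64)  # 384×128 input
flipped = img[:, ::-1]                                  # horizontal flip
mean = np.array([0.485, 0.456, 0.406])                  # ImageNet stats (assumed)
std  = np.array([0.229, 0.224, 0.225])
aug = normalize(random_erase(flipped, rng), mean, std)
assert aug.shape == (384, 128, 3)
```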
6. The non-motor vehicle pedestrian re-identification method according to claim 3, characterized in that: the global image feature extraction module selects MGN as a basic skeleton, and the input global image passes through the MGN to obtain global image features; the local attention module comprises a channel attention mechanism and a spatial attention mechanism, and the input local human body image passes through the local attention module to obtain local features; the core idea of the feature fusion module is to assign different weights to the global image features and the local features according to whether the person in the input picture is riding: if the target is judged to be a rider, a higher weight is given to the local features, otherwise a higher weight is given to the global image features;
the MGN adopted by the global pedestrian feature extraction network is a multi-branch deep network that combines global image features with multi-granularity local features; one branch is used for extracting global image features and is responsible for common features; the image is then divided into N strips, where different N represents different granularity (the larger N is, the finer the granularity) and is responsible for features of different levels, and two branches are respectively responsible for extracting multi-granularity local features; the local attention module adopts a hierarchical attention network HAN;
the feature fusion module is an adaptive attention module, which determines the weights of the global and local features by distinguishing the input type, i.e. decides whether to give more weight to the local features by judging whether the target is riding a non-motor vehicle.
7. The non-motor vehicle pedestrian re-identification method according to claim 6, characterized in that: the specific training mode in the third step is as follows: first, the global image feature f_g is fed into a simple binary classification network to obtain a B×2 feature w; according to w, weights are assigned to the global image feature and the local human body feature, so that if the target is judged to be a rider, the local human body part obtains higher attention; finally, the global image feature and the local human body feature are fused as:

f = w_g · f_g + w_l · f_l

wherein f_g and f_l respectively denote the global image feature and the local feature, and w_g and w_l denote their weights; the fused feature f is finally used for non-motor vehicle pedestrian re-identification;
in order to give the network better recognition capability, a cross-entropy loss function for classification and a triplet loss function for metric learning are used together as the loss function of the training process:

L = λ_1 · L_ce + λ_2 · L_tri

wherein L_ce denotes the cross-entropy loss, L_tri denotes the triplet loss, and λ_1 and λ_2 respectively denote the weights of the two loss functions; the cross-entropy loss function is expressed as:

L_ce = -(1/N) · Σ_{i=1}^{N} log p(y_i | f_i)

wherein N denotes the number of pictures in a mini-batch, p(y_i | f_i) denotes the predicted probability that feature f_i belongs to its ground-truth class y_i, and C denotes the number of classes;
the triplet loss involves an anchor sample, a negative sample and a positive sample: the anchor sample is randomly selected from the training data set, the positive sample belongs to the same class as the anchor sample, and the negative sample belongs to a different class; the purpose of triplet loss learning is to minimize the intra-class difference within the same class and maximize the inter-class difference between different classes, and the loss function can be expressed as:

L_tri = Σ [ ||f_a − f_p||_2 − ||f_a − f_n||_2 + α ]_+

wherein α denotes the margin hyperparameter, and f_a, f_p and f_n respectively denote the anchor sample feature, the positive sample feature and the negative sample feature.
8. A system for the non-motor vehicle pedestrian re-identification method according to any one of claims 1 to 7, characterized in that the system comprises: a sample acquisition unit, a data processing unit, a model training unit and a model application unit;
the non-motor vehicle pedestrian re-recognition system comprises a sample acquisition unit, a data processing unit, a model training unit and a model application unit, wherein the sample acquisition unit is used for constructing a non-motor vehicle pedestrian re-recognition data set according to monitoring videos of different cameras in the same scene, the data processing unit is used for carrying out human body detection on the non-motor vehicle pedestrian re-recognition data set according to a pre-trained human body detector and carrying out pre-processing on a global image and a local human body image, the model training unit is used for training a preset non-motor vehicle pedestrian re-recognition network model by using the pre-processed global pedestrian image and local human body image, and the model application unit is used for carrying out feature extraction on an image to be recognized by using the trained non-motor vehicle pedestrian re-recognition network model.
CN202210524755.6A 2022-05-13 2022-05-13 Method and system for re-identifying pedestrians of non-motor vehicles Pending CN115205890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210524755.6A CN115205890A (en) 2022-05-13 2022-05-13 Method and system for re-identifying pedestrians of non-motor vehicles


Publications (1)

Publication Number Publication Date
CN115205890A true CN115205890A (en) 2022-10-18

Family

ID=83575258


Country Status (1)

Country Link
CN (1) CN115205890A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561372A (en) * 2023-07-03 2023-08-08 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium
CN116561372B (en) * 2023-07-03 2023-09-29 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium
CN116797781A (en) * 2023-07-12 2023-09-22 北京斯年智驾科技有限公司 Target detection method and device, electronic equipment and storage medium
CN116797781B (en) * 2023-07-12 2024-08-20 北京斯年智驾科技有限公司 Target detection method and device, electronic equipment and storage medium
CN117152851A (en) * 2023-10-09 2023-12-01 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN117152851B (en) * 2023-10-09 2024-03-08 中科天网(广东)科技有限公司 Face and human body collaborative clustering method based on large model pre-training
CN118522039A (en) * 2024-07-23 2024-08-20 南京信息工程大学 Frame extraction pedestrian retrieval method based on YOLOv s and stage type regular combined pedestrian re-recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination