CN114998934B - Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion - Google Patents

Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion Download PDF

Info

Publication number
CN114998934B
CN114998934B (application CN202210742934.7A)
Authority
CN
China
Prior art keywords
pedestrian
changing
coat
clothes
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210742934.7A
Other languages
Chinese (zh)
Other versions
CN114998934A (en)
Inventor
高赞
龚丽敏
宋健明
张蕊
陶瑞涛
聂礼强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Calmcar Vision Electronic Technology Co ltd
Shandong University
Qingdao Haier Smart Technology R&D Co Ltd
Shandong Institute of Artificial Intelligence
Original Assignee
Suzhou Calmcar Vision Electronic Technology Co ltd
Shandong University
Qingdao Haier Smart Technology R&D Co Ltd
Shandong Institute of Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Calmcar Vision Electronic Technology Co ltd, Shandong University, Qingdao Haier Smart Technology R&D Co Ltd, Shandong Institute of Artificial Intelligence filed Critical Suzhou Calmcar Vision Electronic Technology Co ltd
Priority to CN202210742934.7A priority Critical patent/CN114998934B/en
Publication of CN114998934A publication Critical patent/CN114998934A/en
Application granted granted Critical
Publication of CN114998934B publication Critical patent/CN114998934B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention provides a method, a system, an electronic device and a storage medium for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent perception and fusion, and belongs to the technical field of computer vision. The original pedestrian image is pixel-sampled, and the sampled pixels are modified according to a human body analytic graph to obtain a clothes-changed pedestrian image; 2D features are then extracted from the original pedestrian image and the clothes-changed pedestrian image respectively, and 3D features are extracted from point cloud data; finally, the pedestrian's identity is recognized from the extracted features. The technical effect of efficient and accurate clothes-changing pedestrian re-identification is thereby achieved.

Description

Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion.
Background
With the development of machine learning and deep learning in the field of computer vision, deep-learning-based human recognition has been widely applied in security scenarios. Among these technologies, pedestrian re-identification (Person ReID) is widely used for pedestrian tracking and cross-camera pedestrian retrieval, and can also serve as an effective substitute when face recognition fails. The goal of the pedestrian re-identification task is to retrieve a target pedestrian across cameras, specifically to determine, using computer vision techniques, whether a specific pedestrian appears in images or video sequences captured by different cameras. Pedestrian re-identification can be combined with person detection and person tracking, and plays an important role in scenarios such as city planning and intelligent monitoring. It must operate on video or image content acquired by skynet or surveillance cameras and is affected by factors such as long shooting distance and low image resolution; moreover, large background changes, illumination changes, pose changes and camera-viewpoint changes often occur simultaneously, which makes pedestrian re-identification a challenging task.
At present, pedestrian re-identification has achieved stable recognition performance. However, conventional pedestrian re-identification rests on the premise that the pedestrian's appearance does not change within a short time. Existing clothes-changing pedestrian re-identification works are image-based: for example, Yang et al. introduce a spatial polar coordinate transformation on contour sketches to learn shape features (PRCC), Qian et al. use human keypoints to eliminate the influence of appearance (LTCC), and Hong et al. propose a fine-grained shape-appearance mutual learning framework (FSAM). Although these methods basically achieve clothes-changing re-identification, the following shortcomings remain:
1) Because the images mostly come from video surveillance, the faces of persons in them may be blurred and thus provide little effective identification information; yet if features are extracted only from the person's body shape, body contours and the like, other clothing-independent feature information in the original image is ignored;
2) Existing feature extraction uses only 2D image data and lacks the three-dimensional features of the human body in the image, so the features learned by a clothes-changing pedestrian re-identification model lack distinctiveness and robustness.
Therefore, a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion is needed.
Disclosure of Invention
The invention provides a clothes-changing pedestrian re-identification and retrieval method, a system, electronic equipment and a storage medium based on multi-mode intelligent perception and fusion, which are used for overcoming at least one technical problem in the prior art.
In order to achieve the purpose, the invention provides a clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion, which comprises the following steps:
acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information;
inputting original pedestrian images, human body analytic graphs and point cloud data into a pre-trained clothes-changing pedestrian re-identification model;
respectively performing coat sampling and trousers sampling on an original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to pixel information of a human body analytic graph to obtain a pedestrian image after changing clothes;
respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrians;
and classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified.
Further, preferably, the method for obtaining the clothes-changed pedestrian image by changing the coat pixels and the trousers pixels according to the pixel information of the human body analytic graph includes:
acquiring a coat pixel set and all vector representations of the coat by using the human body analytic graph, wherein the human body analytic graph is acquired by a pre-trained human body analytic model and the semantic result is described as S = [S_1, S_2, ..., S_B], where each S_i has size 1 × h × w; performing random processing on the original pedestrian image X = [X_1, X_2, ..., X_B] to obtain the processed image X';
assuming that all pixel vectors of X are represented as {x_j}, j = 1, ..., M, where x_j^coat denotes a pixel value of the coat portion and M denotes the total number of pixels, M = B · H · W; performing semantic segmentation on the randomly processed original pedestrian image and acquiring the semantic segmentation result S';
obtaining the coat pixel vectors according to the semantic segmentation result, and further obtaining the coat pixel set from the coat pixel vectors;
changing all vector representations of the coat with the coat pixel set;
acquiring a trousers pixel set and all vector representations of the trousers according to the human body analytic graph, and changing all vector representations of the trousers by using the trousers pixel set;
and acquiring the clothes-changed pedestrian image through all the changed vector representations of the coat and all the changed vector representations of the trousers.
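The sampling-and-replacement step above can be sketched in NumPy. This is a minimal illustration under stated assumptions: the label ids for jacket and trousers (3 and 4) are assumed, and the "change" is implemented here as a within-region shuffle of the clothing pixels, one plausible reading of the pixel-confusion idea rather than the patent's exact procedure:

```python
import numpy as np

# Assumed label ids among the 6 parsed parts {background, head, arms,
# jacket, trousers, legs}; the patent does not fix this numbering.
JACKET, PANTS = 3, 4

def confuse_clothes(image, parse_map, seed=None):
    """Shuffle the pixel values inside the jacket and trousers regions of
    `image` (H, W, 3), guided by the human parsing map (H, W), leaving
    every other region (head, arms, legs, background) untouched."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    for label in (JACKET, PANTS):
        mask = parse_map == label
        region = out[mask]                       # (N, 3) clothing pixel set
        if len(region) > 1:
            out[mask] = region[rng.permutation(len(region))]
    return out
```

Both the original image and the confused image would then be fed to the 2D backbone during training, as the description states.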
Further, preferably, the method for respectively extracting features of the coat-changed pedestrian image, the original pedestrian image and the point cloud data and fusing the extracted features to obtain the identity features of the pedestrian comprises the following steps:
performing feature extraction on the pedestrian image after changing the clothes and the original pedestrian image to obtain a 2D feature map, and performing feature extraction on the point cloud data to obtain a 3D feature map;
inputting the 2D feature map and the 3D feature map into an attention mechanism network respectively to acquire a third 2D feature map and a third 3D feature map. Specifically, a first 2D feature map is acquired from the 2D feature map through a channel attention module; the 2D feature map is multiplied channel-wise by the first 2D feature map, and a second 2D feature map is acquired through a spatial attention module; the 2D feature map is multiplied by the second 2D feature map to obtain the third 2D feature map. Likewise, a first 3D feature map is acquired from the 3D feature map through a channel attention module; the 3D feature map is multiplied channel-wise by the first 3D feature map, and a second 3D feature map is acquired through a spatial attention module; the 3D feature map is multiplied by the second 3D feature map to obtain the third 3D feature map;
and adding the acquired third 2D characteristic diagram and the third 3D characteristic diagram to obtain the identity characteristic of the pedestrian.
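The channel-then-spatial attention flow and the final additive fusion described above can be sketched in NumPy. This is an assumed CBAM-style reading: the shared two-layer MLP shapes (`w1`, `w2`) and the fixed mean/max mixing in the spatial branch (where a learned convolution would normally sit) are illustrative choices, not the patent's exact layers:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Max-pool and average-pool over space, map both
    vectors through a shared two-layer MLP, add, and normalize with a
    sigmoid to get per-channel weights (the 'first feature map')."""
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    return _sigmoid(mlp(feat.max(axis=(1, 2))) + mlp(feat.mean(axis=(1, 2))))

def spatial_attention(feat):
    """Channel-wise max and mean maps, mixed and normalized; a learned
    conv would normally do the mixing, a fixed average is used here."""
    return _sigmoid(0.5 * (feat.max(axis=0) + feat.mean(axis=0)))  # (H, W)

def attention_block(feat, w1, w2):
    first = channel_attention(feat, w1, w2)        # (C,)
    refined = feat * first[:, None, None]          # channel-wise product
    second = spatial_attention(refined)            # the 'second feature map'
    return feat * second[None, :, :]               # the 'third feature map'

def fuse(feat2d, feat3d, w1, w2):
    """Add the attended 2D and 3D maps to form the identity feature."""
    return attention_block(feat2d, w1, w2) + attention_block(feat3d, w1, w2)
```

The additive fusion at the end mirrors the step of adding the third 2D and third 3D feature maps to obtain the pedestrian's identity feature.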
Further, preferably, the characteristic extraction of the pedestrian image after changing clothes and the original pedestrian image is realized by a ResNet-50 neural network;
and extracting the characteristics of the point cloud data through a graph convolution network.
Further, preferably, the method for acquiring the first 2D feature map through the channel attention module according to the 2D feature map includes:
performing maximum pooling and average pooling on the 2D feature map respectively to form two weight vectors;
sharing the two weight vectors through weight, and mapping the two weight vectors into the weight of each channel;
adding the mapped weights, performing normalization processing, and determining channel weights;
and acquiring a first 2D feature map according to the channel weight and the 2D feature map.
Further, preferably, the clothes-changing pedestrian re-identification model is trained and constrained by using a loss function, and the loss function is realized by the following formula:
L = L_mse + L_i + L_t
wherein L_mse denotes the mean square error loss, L_i denotes the cross-entropy loss, and L_t denotes the triplet loss.
Further, preferably, the mean square error loss function is implemented by the following formula:
L_mse = (1/B) · Σ_{i=1}^{B} ||f_i − f_i'||²
wherein ||·|| denotes the L2 norm, f_i denotes the i-th feature of X, and f_i' denotes the corresponding feature after changing clothes.
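A minimal NumPy sketch of the three-part training objective. The mean square error term follows the text (distance between original and clothes-changed features); the cross-entropy and triplet terms are standard formulations, and the 0.3 triplet margin is an assumed value not given in the patent:

```python
import numpy as np

def mse_loss(f, f_prime):
    """Mean squared L2 distance between original-image features f and
    clothes-changed features f', both of shape (B, D)."""
    return float(np.mean(np.sum((f - f_prime) ** 2, axis=1)))

def cross_entropy_loss(logits, labels):
    """Identity-classification loss over (B, num_ids) logits."""
    z = logits - logits.max(axis=1, keepdims=True)          # stabilize
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on anchor-positive vs anchor-negative feature distances;
    margin=0.3 is an assumption, not specified in the patent."""
    d = lambda a, b: np.linalg.norm(a - b, axis=1)
    return float(np.mean(np.maximum(d(anchor, positive) - d(anchor, negative) + margin, 0.0)))

def total_loss(f, f_prime, logits, labels, anchor, positive, negative):
    """L = L_mse + L_i + L_t, as in the formula above."""
    return (mse_loss(f, f_prime)
            + cross_entropy_loss(logits, labels)
            + triplet_loss(anchor, positive, negative))
```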
In order to solve the above problems, the present invention further provides a system for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent sensing and fusion, comprising:
the data acquisition unit is used for acquiring a pedestrian image to be identified and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information;
the characteristic extraction unit is used for inputting the original pedestrian image, the human body analytic graph and the point cloud data into a pre-trained clothes-changing pedestrian re-identification model; respectively performing coat sampling and trousers sampling on the original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to the pixel information of the human body analytic graph to obtain a pedestrian image after changing clothes; respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian;
and the identity recognition unit is used for classifying and recognizing the identity characteristics of the pedestrians and determining the identity of the pedestrian to be recognized.
In order to solve the above problem, the present invention also provides an electronic device, including: a memory storing at least one instruction; and the processor executes the instructions stored in the memory to realize the steps of the clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion.
The invention also protects a computer readable storage medium, which stores a computer program, and the computer program is executed by a processor to realize the clothes changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion.
The invention discloses a clothes-changing pedestrian re-identification and retrieval method, a system, electronic equipment and a storage medium based on multi-mode intelligent perception and fusion, which have the following beneficial effects:
1) By constructing a dual-stream network structure comprising a 2D image processing network and a point cloud data processing network, the visual information given by the planar human image and the structural information of the human body in 3D space are exploited simultaneously. Fusing the feature information acquired by the two streams combines 2D and 3D image features into rich, robust and stable multi-modal features, providing strong information support for clothes-changing pedestrian re-identification.
2) In addition, the attention module is integrated into the whole network of the clothes-changing pedestrian re-identification model, so that the model learns the regions most relevant to identity. The clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion achieves good results on the relevant clothes-changing pedestrian re-identification datasets.
Drawings
FIG. 1 is a schematic flow diagram of a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a clothing-change pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention;
FIG. 3 is a schematic diagram of effects before and after changing clothes of a pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention;
FIG. 4 is a block diagram of a logical structure of a clothes-changing pedestrian re-identification and retrieval system based on multi-modal intelligent perception and fusion according to an embodiment of the invention;
fig. 5 is a schematic diagram of an internal structure of an electronic device for implementing a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The embodiments of the application can acquire and process related data based on artificial intelligence and computer vision technologies. Artificial Intelligence (AI) is the theory, method, technique and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. It is a comprehensive discipline involving a wide range of fields, spanning both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to perform machine-vision tasks such as recognition, tracking and measurement on targets, with further image processing so that the result becomes an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also covers common biometric technologies such as face recognition and fingerprint recognition.
Specifically, as an example, fig. 1 is a schematic flowchart of a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the present invention. Referring to fig. 1, the present invention provides a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion, which can be executed by a device, and the device can be implemented by software and/or hardware. The clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion comprises the steps of S110-S140.
Specifically, S110, acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information; s120, inputting original pedestrian images, human body analysis graphs and point cloud data into a pre-trained clothes-changing pedestrian re-recognition model; respectively performing coat sampling and trousers sampling on an original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to pixel information of a human body analytic graph to obtain a pedestrian image after changing clothes; s130, respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian; and S140, classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified.
FIG. 2 is a schematic diagram illustrating a clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention. As shown in FIG. 2, the method comprises four parts: data acquisition, feature extraction, the attention module and loss-function-constrained training. First, the original pedestrian image, the human body analytic graph obtained by parsing it, and the reconstructed 3D point cloud data of the human body are acquired and input respectively. Second, a dual-stream network performs feature extraction: the original pedestrian image and the human body analytic graph serve as 2D data and are processed by one backbone network, while the point cloud data serve as 3D data and are processed by another backbone network. Specifically, the original pedestrian image is input into the coat sampling module and the trousers sampling module for pixel sampling of the target clothes, and the coat and trousers pixels of the original pedestrian image are modified using the human body analytic graph; ResNet-50 is adopted as the backbone network to extract features from the original image and the clothes-changed image; and a graph convolutional neural network is adopted to extract features from the 3D point cloud data.
Third, the features extracted by the ResNet-50 backbone and by the graph convolutional neural network are respectively input into the attention module for processing; the resulting image features and point cloud features are fused to obtain multi-modal feature information, and the pedestrian's identity is re-identified from this multi-modal feature information. Finally, the built clothes-changing pedestrian re-identification model is trained under the constraint of the loss function. Specifically, the loss function consists of three parts: cross-entropy loss, mean square error loss and triplet loss. The whole training process is constrained and guided by these three losses, which makes the features learned by the network more robust and more expressive.
In summary, the overall clothes-changing pedestrian re-identification framework comprises two branches: a 2D image feature extraction network and a 3D point cloud data processing network. Before training starts, the RGB image is converted into a human body analytic graph by an existing human parsing model, and the parsed components are merged into 6 parts: background, head, arms, jacket, trousers and legs. The original pedestrian image is sampled by the coat pixel sampling and trousers pixel sampling modules, the pixel values are changed according to the regions of the human body analytic graph, and the modified image is stored. In subsequent training, both the original pedestrian image and the pixel-modified image participate; using both images, the network can learn more discriminative identity features unrelated to clothing, such as hair, face and legs. The point cloud data contain the three-dimensional structural information of the pedestrian to be identified, which is likewise invariant to clothing changes. The two networks are trained simultaneously, the extracted feature maps are input into the attention mechanism so that the networks concentrate more on clothing-independent features, and finally the feature maps output by the attention modules are fused to obtain the final identity feature.
In addition, existing pedestrian re-identification mainly learns the appearance features of pedestrians in order to distinguish them, whereas in clothes-changing pedestrian re-identification the appearance changes greatly because pedestrians change their clothes, causing the performance of a general pedestrian re-identification model to drop sharply. In other words, clothing appearance information is unreliable for clothes-changing re-identification; therefore the clothing part of the pedestrian is replaced with confused pixel values, and the mean square error loss constrains the network to learn clothing-independent identity features. Moreover, existing clothes-changing methods focus on 2D planar images and neglect that people live in a 3D world, where prior information such as body structure does not change with appearance; by using point cloud data, clothing-independent identity features can be learned in 3D space, alleviating the problems caused by clothes changing to a certain extent. Fusing these features into one architecture yields a more powerful feature representation.
In a specific implementation process, the clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion comprises the steps S110-S140.
S110, acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information.
It should be noted that the human body analytic graph is obtained by parsing the image with an existing human body analysis model, and the components are merged into 6 parts: background, head, arms, jacket, pants, and legs, described as S = [S_1, S_2, ..., S_B], whose pixel values belong to {0, 1, 2, 3, 4, 5}. The 3D point cloud data is obtained by analyzing the original pedestrian image with an existing network model. That is, the human body analytic graph provides the pixel reference for generating the clothes-changed pedestrian image, and the 3D point cloud data provides the 3D information of the pedestrian image. The point cloud data includes three-dimensional XYZ coordinate information and RGB pixel information; the two types of information are processed separately: the XYZ coordinates are used to build the graph, and the RGB values are mainly used to compute features.
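The merging of parsed components into the 6 parts above can be illustrated with a small NumPy sketch; the fine-grained label ids and the `merge_parsing` helper are hypothetical, since the embodiment only specifies the 6 merged parts with pixel values {0, ..., 5}:

```python
import numpy as np

# Map a fine-grained human-parsing output to the 6 merged parts used here:
# 0 background, 1 head, 2 arms, 3 jacket, 4 pants, 5 legs.
# The fine-grained label ids below are hypothetical; a real parser
# would define its own label set.
MERGE_TABLE = {
    0: 0,            # background -> background
    1: 1, 2: 1,      # hat, hair -> head
    3: 2, 4: 2,      # left/right arm -> arms
    5: 3, 6: 3,      # coat, upper clothes -> jacket
    7: 4,            # pants -> pants
    8: 5, 9: 5,      # left/right leg -> legs
}

def merge_parsing(parse_map: np.ndarray) -> np.ndarray:
    """Collapse a fine-grained parsing map (H, W) into the 6-part map S."""
    lut = np.zeros(max(MERGE_TABLE) + 1, dtype=np.int64)
    for fine, coarse in MERGE_TABLE.items():
        lut[fine] = coarse
    return lut[parse_map]    # lookup-table indexing keeps the (H, W) shape
```

The lookup-table form keeps the remapping vectorized, so it scales to full-resolution parsing maps without a Python loop over pixels.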
S120, inputting original pedestrian images, human body analysis graphs and point cloud data into a pre-trained clothes-changing pedestrian re-recognition model; and respectively carrying out coat sampling and trousers sampling on the original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to the pixel information of the human body analytic graph to obtain the pedestrian image after changing the clothes.
It should be noted that the clothes-changing pedestrian re-identification model is trained in advance. A training data set is acquired, comprising pedestrian images, the corresponding human body analytic graphs, and point cloud data. Then, the clothes-changed image corresponding to each pedestrian image is obtained by pixel sampling and pixel confusion according to the human body analytic graph. Next, the original pedestrian image and the confused clothes-changed image are input into one backbone network for feature extraction, and the point cloud data is input into another backbone network for feature extraction. The features extracted by the two backbone networks are each input into an attention mechanism, and the output attention feature maps are fused to obtain the identity features of the pedestrian, yielding a trained clothes-changing pedestrian re-identification model.
The whole training process of the clothes-changing pedestrian re-identification model is constrained and guided by the loss function; the entire loss function module consists of 3 parts: cross entropy loss, mean square error loss, and triplet loss. Through the guidance and the constraint of the three losses, the whole network learning is more robust, and the expressive performance is stronger.
Specifically, in the training phase of the 2D image, in order to make the network learn the identity features which are cloth-independent, the mean square error loss MSE is used for constraint, and the mean square error loss function is implemented by the following formula:
L_mse = (1/B) · Σ_{i=1}^{B} ||f_i − f_i′||₂²

wherein ||·||₂ represents the L2 norm, f_i denotes the i-th feature of X, and f_i′ denotes the corresponding feature after changing clothes.
The cross-entropy loss is expressed as follows:
L_i = −(1/B) · Σ_{i=1}^{B} y_i log P(x_i)

wherein L_i represents the classification loss, y_i represents the true label of sample x_i, P(x_i) represents the predicted label of sample x_i, and B represents the number of samples.
Thus, the overall loss function of the entire network consists of the cross-entropy loss, the mean square error loss, and the triplet loss, expressed as follows: L = L_mse + L_i + L_t
wherein L_mse represents the mean square error loss, L_i represents the cross-entropy loss, and L_t represents the triplet loss.
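Under the assumption that the three losses take their standard forms (the mean square error and cross-entropy follow the formulas given above; the triplet loss is assumed to be the usual margin form, since the text does not spell it out), the overall objective L = L_mse + L_i + L_t can be sketched as:

```python
import numpy as np

def mse_loss(f, f_changed):
    """L_mse: mean squared L2 distance between features of the original
    image and the clothes-changed image (shapes: (B, D))."""
    return float(np.mean(np.sum((f - f_changed) ** 2, axis=1)))

def cross_entropy_loss(logits, labels):
    """L_i: softmax cross-entropy averaged over the B samples."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """L_t: assumed margin-based triplet loss on feature rows."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

def total_loss(f, f_changed, logits, labels, anchor, positive, negative):
    """L = L_mse + L_i + L_t, as in the overall objective above."""
    return (mse_loss(f, f_changed)
            + cross_entropy_loss(logits, labels)
            + triplet_loss(anchor, positive, negative))
```

The margin value 0.3 is a common default in re-identification work, not a value specified by this embodiment.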
In a specific embodiment, feature extraction from the clothes-changed pedestrian image and the original pedestrian image is realized by a ResNet-50 neural network, and feature extraction from the point cloud data is realized by a graph convolution network. Specifically, the ResNet-50 deep neural network is mainly formed by stacked convolutional layers and batch normalization layers. Both the original pedestrian image and the clothes-changed pedestrian image participate in training and are input into the backbone network to extract pedestrian features; the ResNet-50 network is used as the backbone network in this example. In the pedestrian re-identification task, clothes occupy a very large proportion of the pixels, and the pixel-changed image differs greatly in appearance from the original image; therefore, to make the network learn clues unrelated to clothing, learning is constrained with the mean square error loss.
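The key point, that the same backbone embeds both the original and the pixel-changed image while the mean square error penalizes clothing-dependent cues, can be illustrated with a stand-in backbone (a single fixed linear projection, not ResNet-50; the names and shapes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared ResNet-50 backbone: one linear projection.
# Both the original and the clothes-changed image pass through the SAME
# weights, so the penalty between their embeddings only vanishes when the
# embedding ignores the pixels that differ (the clothing regions).
W = rng.standard_normal((12, 4))

def backbone(x_flat: np.ndarray) -> np.ndarray:
    """Embed a batch of flattened images (B, 12) into features (B, 4)."""
    return x_flat @ W

def cloth_invariance_penalty(x_orig, x_changed):
    """Mean squared L2 distance between the two embeddings (the L_mse term)."""
    f, f_prime = backbone(x_orig), backbone(x_changed)
    return float(np.mean(np.sum((f - f_prime) ** 2, axis=1)))
```

In training the penalty would be minimized jointly with the identity losses, pushing the shared weights toward clothing-independent features.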
In addition, the point cloud data is learned by using a graph convolution network (Zhedong Zheng, Nenggan Zheng, Yi Yang, "Parameter-Efficient Person Re-identification in the 3D Space", arXiv).
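The division of labour described earlier, XYZ coordinates building the graph and RGB values supplying the features, can be sketched with a toy k-nearest-neighbour graph and one mean-aggregation convolution; this is an illustrative simplification, not the cited parameter-efficient 3D network:

```python
import numpy as np

def knn_graph(xyz: np.ndarray, k: int) -> np.ndarray:
    """Indices (N, k) of the k nearest neighbours of each point.
    Only the XYZ coordinates are used to build the graph."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # exclude self-edges
    return np.argsort(d, axis=1)[:, :k]

def graph_conv(rgb: np.ndarray, neighbours: np.ndarray, W: np.ndarray):
    """One mean-aggregation graph convolution: RGB supplies the features,
    each point concatenates its own colour with its neighbourhood average,
    then applies a learned projection W of shape (6, D) and a ReLU."""
    agg = rgb[neighbours].mean(axis=1)      # (N, 3) neighbour average
    return np.maximum(np.concatenate([rgb, agg], axis=1) @ W, 0.0)
```

Stacking several such layers (with pooling) would yield the 3D feature map that the embodiment fuses with the 2D branch.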
The method for changing the acquired coat pixels and trousers pixels according to the pixel information of the human body analytic graph to acquire the clothes-changed pedestrian image comprises the following steps: S121, acquiring a coat pixel set and all vector representations of the coat by using the human body analytic graph, wherein the human body analytic graph is acquired by a pre-trained human body analysis model and its semantic result is described as S = [S_1, S_2, ..., S_B], wherein each S_i is of size 1 × h × w. The original pedestrian image X = [X_1, X_2, ..., X_B] is randomly shuffled to obtain X′ = [X′_1, X′_2, ..., X′_B]. Assume that all pixel vectors of X are represented as V = [v_1, v_2, ..., v_M], wherein v_j with the jacket label represents a pixel value of the jacket portion, M represents the total number of pixels, and M = B · H · W. Semantic segmentation is performed on the randomly shuffled original pedestrian image to acquire the semantic segmentation result S′ = [S′_1, S′_2, ..., S′_B]; the jacket pixel vectors are obtained according to this segmentation result, and the coat pixel set is then obtained from these jacket pixel vectors.
S122, changing all vector representations of the coat by using the coat pixel set; S123, acquiring a trousers pixel set and all vector representations of the trousers according to the human body analytic graph, and changing all vector representations of the trousers by using the trousers pixel set; the specific implementation is analogous to steps S121-S122. S124, acquiring the clothes-changed pedestrian image from all vector representations of the changed coat and all vector representations of the changed trousers.
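Steps S121-S124 can be sketched as follows, under two stated assumptions: the jacket and trousers take label values 3 and 4 in the 6-part analytic graph (following the listed order background, head, arms, jacket, pants, legs), and the "random processing" is a shuffle of the image batch; the function name is illustrative:

```python
import numpy as np

JACKET, PANTS = 3, 4   # assumed positions in the 6-part order:
                       # background, head, arms, jacket, pants, legs

def confuse_clothes(images, parse_maps, rng):
    """Replace each image's jacket/pants pixels with the pixels that a
    randomly shuffled image of the batch has at the same locations.
    images: (B, H, W, 3) float, parse_maps: (B, H, W) int in {0..5}."""
    B = images.shape[0]
    perm = rng.permutation(B)            # the 'random processing' of X
    donors = images[perm]                # shuffled batch X'
    changed = images.copy()
    for part in (JACKET, PANTS):         # S121-S123 for both garments
        mask = parse_maps == part        # coat / trousers pixel set
        changed[mask] = donors[mask]     # overwrite the vector representations
    return changed                       # S124: clothes-changed image
```

Boolean-mask indexing keeps the replacement vectorized; only the clothing regions change while head, arms, legs, and background pixels are untouched.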
FIG. 3 is a schematic diagram of the effect before and after changing clothes in the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to an embodiment of the invention. As shown in FIG. 3, pixels are acquired from the clothes by the coat pixel sampling module and the trousers pixel sampling module, the pixel values of the original clothes are replaced with the randomly acquired pixel values, and the image with changed pixel values is stored, that is, the clothes-changed pedestrian image. In each of the 3 different pedestrian images in FIG. 3, the coat pixel change and the trousers pixel change are performed on the basis of the human body analytic graph corresponding to that pedestrian image, generating the respective clothes-changed image.
S130, respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian.
In a specific embodiment, the method for respectively extracting features of the clothes-changed pedestrian image, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian, comprises the following steps: S131, performing feature extraction on the clothes-changed pedestrian image and the original pedestrian image to obtain a 2D feature map, and performing feature extraction on the point cloud data to obtain a 3D feature map. S132, inputting the 2D feature map and the 3D feature map into an attention mechanism network respectively to acquire a third 2D feature map and a third 3D feature map: a first 2D feature map is acquired from the 2D feature map through a channel attention module; the 2D feature map and the first 2D feature map are multiplied channel-wise and passed through a spatial attention module to acquire a second 2D feature map; the 2D feature map and the second 2D feature map are multiplied to obtain the third 2D feature map. Likewise, a first 3D feature map is acquired through a channel attention module; the 3D feature map and the first 3D feature map are multiplied channel-wise and passed through a spatial attention module to acquire a second 3D feature map; the 3D feature map and the second 3D feature map are multiplied to obtain the third 3D feature map. S133, the acquired third 2D feature map and third 3D feature map are added to obtain the identity feature of the pedestrian.
It should be noted that, in step S132, the method for obtaining the first 2D feature map through the channel attention module according to the 2D feature map includes: performing maximum pooling and average pooling on the 2D feature map respectively to form two weight vectors; sharing the two weight vectors through weight, and mapping the two weight vectors into the weight of each channel; adding the mapped weights, performing normalization processing, and determining channel weights; and acquiring a first 2D feature map according to the channel weight and the 2D feature map.
That is, the outputs of both branches are input into an attention mechanism, which consists of channel attention and spatial attention. Let the feature map extracted from the 2D input composed of the original pedestrian image and the clothes-changed pedestrian image be denoted feature map A, and the feature map extracted from the 3D input composed of point cloud data be denoted feature map B. Feature maps A and B are each input into the attention mechanism. Feature map A passes through a channel attention module to obtain the channel attention feature A1; the input feature map A and the channel attention feature A1 are multiplied channel-wise and input into the spatial attention module to obtain the spatial attention feature A2; finally, the feature map input to the module is multiplied by A2 to obtain the final attention feature A3. Feature map B is processed in the same way to obtain the attention feature B3. Finally, features A3 and B3 are added to obtain the final identity feature of the network.
It should be noted that the channel attention module operates as follows: the original input feature map A passes through MaxPool and AvgPool respectively to form two weight vectors of size [C, 1]; the two weight vectors pass through the same MLP network (weight sharing) and are mapped into per-channel weights; the mapped weights are added and normalized with a Sigmoid function, and the obtained channel weights are multiplied channel-wise with the original feature map A to obtain the channel attention output feature A1. The spatial attention module operates as follows: channel-wise maximum pooling and average pooling are performed on its input feature map, and the two resulting maps are stacked; after one convolution layer, a spatial weight map, i.e. the spatial attention feature A2, is obtained, and A2 is multiplied with the module's input feature map to obtain the final attention feature A3.
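The channel and spatial attention just described can be sketched in NumPy for a single feature map of shape (C, H, W); the shared MLP is reduced to two matrices and the convolution of the spatial branch to a 1×1 weighting, which are simplifying assumptions rather than the exact module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(A, W1, W2):
    """Channel branch: MaxPool and AvgPool over H, W give two C-vectors;
    both pass the same shared MLP (W1, W2), are summed and squashed."""
    C = A.shape[0]
    mx = A.reshape(C, -1).max(axis=1)
    av = A.reshape(C, -1).mean(axis=1)
    mlp = lambda v: np.maximum(v @ W1, 0.0) @ W2
    return sigmoid(mlp(mx) + mlp(av))              # (C,) channel weights

def spatial_attention(A, kernel):
    """Spatial branch: channel-wise max and mean maps are stacked and
    combined by a 1x1 'convolution' (a weighted sum) into one weight map."""
    mx = A.max(axis=0); av = A.mean(axis=0)
    return sigmoid(kernel[0] * mx + kernel[1] * av)  # (H, W) weights

def attention_block(A, W1, W2, kernel):
    A1 = channel_attention(A, W1, W2)      # channel attention feature A1
    Ac = A * A1[:, None, None]             # channel-weighted map
    A2 = spatial_attention(Ac, kernel)     # spatial attention feature A2
    return A * A2[None, :, :]              # final attention feature A3
```

The final identity feature is then the elementwise sum of the two branch outputs, A3 + B3, assuming both branches produce feature maps of the same shape.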
In summary, the original image input contains very rich visual information, and the clothes-changed image with confused clothing pixels enables the network to learn more clothing-independent features, reducing the interference caused by changing clothes. The 3D data contains the three-dimensional structural information of the human body, from which discriminative features for different pedestrians can be extracted. Fusing the features learned from the 2D images with the feature information extracted from the 3D data therefore yields a rich, robust, and stable multi-modal feature, which is very valuable in the field of clothes-changing pedestrian re-identification.
And S140, classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified.
In a specific implementation process, identifying the identity characteristics of the pedestrian is a mature prior art, and a specific implementation mode is not limited.
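Since the embodiment leaves the concrete classification and identification open, one common realization in re-identification, ranking a gallery of known identities by cosine similarity to the query's identity feature, is sketched below as an assumption rather than the claimed method:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery entries by cosine similarity to the query identity
    feature; the top-ranked entry gives the recognized pedestrian.
    query_feat: (D,), gallery_feats: (N, D)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery entry
    return np.argsort(-sims), sims     # best match first
```

Cosine similarity is scale-invariant, which suits fused features whose magnitudes may differ between the 2D and 3D branches.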
In summary, the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion constructs a dual-stream network structure comprising a 2D image processing network and a point cloud data processing network, thereby simultaneously utilizing the visual information given by the planar image of the human body and the structural information of the human body in 3-dimensional space. The feature information acquired by the dual-stream network is fused, realizing the fusion of 2D image features and 3D features and yielding a rich, robust, and stable multi-modal feature that provides powerful information support for clothes-changing pedestrian re-identification. In addition, the attention module is integrated into the whole network of the clothes-changing pedestrian re-identification model, so that the model learns the regions more relevant to identity. The clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion achieves good results on the relevant clothes-changing pedestrian re-identification data sets.
Corresponding to the clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion, the invention also provides a clothes-changing pedestrian re-identification and retrieval system based on multi-mode intelligent perception and fusion. Fig. 4 shows functional modules of a clothes-change pedestrian re-identification and retrieval system based on multi-modal intelligent perception and fusion according to an embodiment of the invention.
As shown in fig. 4, the system 400 for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent perception and fusion provided by the present invention can be installed in an electronic device. According to the implemented functions, the system 400 for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent perception and fusion can include a data acquisition unit 410, a feature extraction unit 420 and an identity recognition unit 430. The units of the invention, which may also be referred to as modules, refer to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a certain fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the data acquisition unit 410 is used for acquiring a pedestrian image to be identified and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information;
a feature extraction unit 420, configured to input the original pedestrian image, the human body analysis graph, and the point cloud data into a pre-trained clothes-changing pedestrian re-identification model; respectively performing coat sampling and trousers sampling on the original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to the pixel information of the human body analytic graph to obtain a pedestrian image after changing clothes; respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian;
and an identity recognition unit 430, configured to perform classification recognition on the identity features of the pedestrian, and determine the identity of the pedestrian to be recognized.
For a more specific implementation of the clothes-changing pedestrian re-identification and retrieval system based on multi-modal intelligent perception and fusion provided by the invention, reference may be made to the above embodiments of the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion, which are not enumerated here.
According to the coat changing pedestrian re-identification and retrieval system based on multi-mode intelligent sensing and fusion, provided by the invention, the original image and the image after the coat changing are subjected to 2D feature extraction, and the point cloud data is subjected to 3D feature extraction, so that the rich and robust fusion features are obtained, and the efficient and accurate identity identification of the coat changing pedestrians is further realized.
As shown in fig. 5, the present invention provides an electronic device 5 for the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion.
The electronic device 5 may comprise a processor 50, a memory 51 and a bus, and may further comprise a computer program stored in the memory 51 and executable on said processor 50, such as a clothes-changing pedestrian re-identification and retrieval program 52 based on multimodal smart perception and fusion.
The memory 51 includes at least one type of readable storage medium, including flash memory, removable hard disks, multimedia cards, card-type memories (e.g., SD or DX memory), magnetic memories, magnetic disks, optical disks, etc. The memory 51 may, in some embodiments, be an internal storage unit of the electronic device 5, such as a hard disk of the electronic device 5. The memory 51 may also, in other embodiments, be an external storage device of the electronic device 5, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash Card provided on the electronic device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 may be used not only to store application software installed in the electronic device 5 and various types of data, such as the code of the clothes-changing pedestrian re-identification and retrieval program based on multi-modal intelligent perception and fusion, but also to temporarily store data that has been output or will be output.
The processor 50 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 50 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 5 by running or executing programs or modules (e.g., clothes-changing pedestrian re-identification and retrieval programs based on multimodal intelligent sensing and fusion, etc.) stored in the memory 51 and calling data stored in the memory 51.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable communication between the memory 51, the at least one processor 50, and other components.
Fig. 5 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 5, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 5 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 50 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 5 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 5 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device 5 and other electronic devices.
Optionally, the electronic device 5 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), or alternatively, a standard wired interface, or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 5 and for displaying a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The clothes-changing pedestrian re-identification and retrieval program 52 based on multi-modal smart perception and fusion stored in the memory 51 of the electronic device 5 is a combination of a plurality of instructions, which when executed in the processor 50, can realize: s110, acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information; s120, inputting original pedestrian images, human body analysis graphs and point cloud data into a pre-trained clothes-changing pedestrian re-recognition model; respectively performing coat sampling and trousers sampling on an original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to pixel information of a human body analytic graph to obtain a pedestrian image after changing clothes; s130, respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian; and S140, classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified.
Specifically, the specific implementation method of the instruction by the processor 50 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again. It is emphasized that, in order to further ensure the privacy and security of the clothes-changing pedestrian re-identification and retrieval program based on multi-modal intelligent sensing and fusion, the clothes-changing pedestrian re-identification and retrieval program based on multi-modal intelligent sensing and fusion is stored in the node of the block chain where the server cluster is located.
Further, the integrated modules/units of the electronic device 5, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM).
An embodiment of the present invention further provides a computer-readable storage medium, where the storage medium may be nonvolatile or volatile, and the storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements: s110, acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information; s120, inputting original pedestrian images, human body analysis graphs and point cloud data into a pre-trained clothes-changing pedestrian re-recognition model; respectively performing coat sampling and trousers sampling on an original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to pixel information of a human body analytic graph to obtain a pedestrian image after changing clothes; s130, respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain identity features of the pedestrian; and S140, classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified.
Specifically, the specific implementation method of the computer program when being executed by the processor may refer to the description of the relevant steps in the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion in the embodiment, which is not described herein again.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain, essentially a decentralized database, is a string of data blocks associated by cryptographic methods; each data block contains the information of a batch of network transactions and is used for verifying the validity (anti-counterfeiting) of that information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like, and may store medical data such as personal health records, prescriptions, and examination reports.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (9)

1. A clothes-changing pedestrian re-recognition and retrieval method based on multi-mode intelligent perception and fusion is characterized by comprising the following steps:
acquiring a pedestrian image to be identified, and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information;
inputting original pedestrian images, human body analytic graphs and point cloud data into a pre-trained clothes-changing pedestrian re-identification model;
respectively performing coat sampling and trousers sampling on the original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to the pixel information of the human body analytic graph to obtain a pedestrian image after changing clothes;
respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain identity features of pedestrians;
classifying and identifying the identity characteristics of the pedestrians, and determining the identity of the pedestrian to be identified;
the method for changing the acquired coat pixels and trousers pixels according to the pixel information of the human body analytic graph to acquire the image of the pedestrian after changing clothes comprises the following steps:
acquiring a coat pixel set and all vector representations of a coat by using a human body analytic graph, wherein the human body analytic graph is acquired through a pre-trained human body analytic model, and its semantic result is described as S = {S_1, S_2, …, S_K}; wherein each S_i is of size 1 × h × w; for the original pedestrian image X, performing random processing to obtain X̃; assume that all pixel vectors of X are represented as {x_1, x_2, …, x_M}; wherein x_j represents a pixel of the coat portion, M represents the total number of pixels, and M = B · H · W; performing semantic segmentation on the randomly processed original pedestrian image and acquiring a semantic segmentation result S̃; obtaining the pixel vectors of the coat according to the acquired semantic segmentation result, further obtaining a coat pixel set according to the pixel vectors of the coat, and changing all vector representations of the coat with the coat pixel set;
acquiring a trousers pixel set and all vector representations of trousers according to the human body analytic graph, and changing all vector representations of the trousers by using the trousers pixel set;
and acquiring the image of the pedestrian after changing clothes from all the changed vector representations of the coat and all the changed vector representations of the trousers.
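The coat/trousers exchange step above can be sketched as follows. This is a minimal NumPy illustration under assumed conventions — parsing-label ids COAT = 1 and TROUSERS = 2, a single randomly chosen donor image, and a fixed random seed are all choices of this sketch, not details fixed by the patent.

```python
import numpy as np

# Hypothetical label ids for the human parsing map (not fixed by the patent).
COAT, TROUSERS = 1, 2

def exchange_region(img, donor, parsing, donor_parsing, label):
    """Replace the pixels of one semantic region (coat or trousers) in `img`
    with pixels sampled from the same region of a donor image."""
    out = img.copy()
    mask = parsing == label                        # region in the target image
    donor_pixels = donor[donor_parsing == label]   # pixel set from the donor
    if donor_pixels.size == 0 or not mask.any():
        return out                                 # nothing to exchange
    rng = np.random.default_rng(0)
    idx = rng.integers(0, donor_pixels.shape[0], size=mask.sum())
    out[mask] = donor_pixels[idx]                  # overwrite region pixels
    return out

def change_clothes(img, donor, parsing, donor_parsing):
    """Coat exchange followed by trousers exchange, as in the claim."""
    out = exchange_region(img, donor, parsing, donor_parsing, COAT)
    return exchange_region(out, donor, parsing, donor_parsing, TROUSERS)
```

The resulting clothes-changed image keeps the body shape and background of the original pedestrian while only the clothing-region pixels come from the donor.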
2. The clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion as claimed in claim 1, wherein the method for respectively extracting features from the clothes-changed pedestrian image, the original pedestrian image and the point cloud data and fusing the extracted features to obtain the identity features of the pedestrian comprises the following steps:
performing feature extraction on the pedestrian image after the clothes changing and the original pedestrian image to obtain a 2D feature map, and performing feature extraction on the point cloud data to obtain a 3D feature map;
inputting the 2D feature map and the 3D feature map into an attention mechanism network respectively, and acquiring a third 2D feature map and a third 3D feature map; acquiring a first 2D feature map through a channel attention module according to the 2D feature map; multiplying the 2D feature map and the first 2D feature map according to channels, and acquiring a second 2D feature map through a space attention module; multiplying the 2D feature map and the second 2D feature map to obtain a third 2D feature map; in addition, a first 3D feature map is obtained through a channel attention module; multiplying the 3D characteristic diagram and the first 3D characteristic diagram according to channels, and acquiring a second 3D characteristic diagram through a space attention module; multiplying the 3D feature map and the second 3D feature map to obtain a third 3D feature map;
and adding the acquired third 2D characteristic diagram and the third 3D characteristic diagram to obtain the identity characteristic of the pedestrian.
3. The method for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent perception and fusion as claimed in claim 2,
the characteristic extraction of the pedestrian image after changing clothes and the original pedestrian image is realized through a ResNet-50 neural network;
and extracting the characteristics of the point cloud data through a graph convolution network.
4. The method for re-identifying and retrieving clothes-changing pedestrians based on multi-modal intelligent perception and fusion as claimed in claim 2, wherein the method for obtaining the first 2D feature map through the channel attention module according to the 2D feature map comprises:
performing maximum pooling and average pooling on the 2D feature map respectively to form two weight vectors;
passing the two weight vectors through shared weights to map them into the weight of each channel;
adding the mapped weights, performing normalization processing, and determining channel weights;
and acquiring a first 2D feature map according to the channel weight and the 2D feature map.
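The channel- and spatial-attention pipeline of claims 2 and 4 can be sketched as follows. In this NumPy sketch the shared mapping is a two-layer ReLU network with hypothetical weights `w1`, `w2`, and the spatial module's usual small convolution is replaced by a sigmoid of the channel-wise max plus mean — an assumption for compactness, not the patent's exact operator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """Claim 4: max-pool and average-pool the (C, H, W) map over space,
    pass both descriptors through shared weights, add, and normalise."""
    c = f.shape[0]
    mx = f.reshape(c, -1).max(axis=1)                # max-pooled vector
    av = f.reshape(c, -1).mean(axis=1)               # average-pooled vector
    shared = lambda v: w2 @ np.maximum(0.0, w1 @ v)  # shared mapping
    weights = sigmoid(shared(mx) + shared(av))       # per-channel weights
    return f * weights[:, None, None]                # "first" feature map

def spatial_attention(f):
    """Channel-wise max and mean describe each location; their
    sigmoid-normalised sum stands in for the usual conv (assumption)."""
    return f * sigmoid(f.max(axis=0) + f.mean(axis=0))[None]

def attended(f, w1, w2):
    """One branch of claim 2: channel attention, then spatial attention,
    re-weighting the input map at each stage ("third" feature map)."""
    return spatial_attention(channel_attention(f, w1, w2))

def fuse(f2d, f3d, w1, w2):
    """Add the attended 2D and 3D maps to obtain the identity feature."""
    return attended(f2d, w1, w2) + attended(f3d, w1, w2)
```

Because both gates are sigmoids in (0, 1), each stage only attenuates the input map; the fusion by addition then combines appearance (2D) and shape (3D) cues.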
5. The clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion as claimed in claim 1, wherein the clothes-changing pedestrian re-identification model is trained and constrained by a loss function, the loss function being realized by the following formula:
L = L_mse + L_i + L_t
wherein L_mse represents the mean square error loss, L_i represents the cross-entropy loss, and L_t represents the triplet loss.
6. The method for pedestrian re-identification and retrieval based on multi-modal intelligent perception and fusion as claimed in claim 5, wherein the mean square error loss function is implemented by the following formula:
L_mse = (1/N) Σ_{i=1}^{N} ||f_i − f_i′||_2^2
wherein ||·||_2 represents the L_2 norm, f_i represents the i-th feature of X, and f_i′ represents the corresponding feature after the clothes change.
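The composite loss L = L_mse + L_i + L_t of claims 5 and 6 can be sketched as follows. The mean squared L_2 form of L_mse, the 0.3 triplet margin, and all function signatures are assumptions of this sketch; the patent only names the three terms.

```python
import numpy as np

def mse_loss(f, f_changed):
    """L_mse: mean squared L2 distance between original and clothes-changed
    features, pulling the two views of one identity together."""
    d = f - f_changed
    return np.mean(np.sum(d * d, axis=1))

def cross_entropy(logits, labels):
    """L_i: softmax cross-entropy over identity classes."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(logp[np.arange(len(labels)), labels])

def triplet_loss(anchor, positive, negative, margin=0.3):
    """L_t: hinge on the gap between positive and negative distances.
    The 0.3 margin is a common choice, not stated in the patent."""
    dp = np.linalg.norm(anchor - positive, axis=1)
    dn = np.linalg.norm(anchor - negative, axis=1)
    return np.mean(np.maximum(0.0, dp - dn + margin))

def total_loss(f, f_changed, logits, labels, anchor, pos, neg):
    """L = L_mse + L_i + L_t, the sum constrained in claim 5."""
    return (mse_loss(f, f_changed)
            + cross_entropy(logits, labels)
            + triplet_loss(anchor, pos, neg))
```

The three terms play complementary roles: L_mse enforces clothing invariance, L_i enforces identity discriminability, and L_t shapes the metric space used at retrieval time.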
7. A clothes-changing pedestrian re-identification and retrieval system based on multi-modal intelligent perception and fusion, characterized by comprising:
the data acquisition unit is used for acquiring a pedestrian image to be identified and acquiring a corresponding human body analytic graph and point cloud data according to the pedestrian image; the point cloud data comprises three-dimensional coordinate information and RGB pixel information;
the characteristic extraction unit is used for inputting original pedestrian images, human body analysis graphs and point cloud data into a pre-trained clothes-changing pedestrian re-recognition model; respectively performing coat sampling and trousers sampling on the original pedestrian image to obtain coat pixels and trousers pixels, and changing the obtained coat pixels and trousers pixels according to the pixel information of the human body analytic graph to obtain a pedestrian image after changing clothes; respectively extracting features of the pedestrian image after changing the clothes, the original pedestrian image and the point cloud data, and fusing the extracted features to obtain the identity features of the pedestrian;
the identity recognition unit is used for classifying and recognizing the identity characteristics of the pedestrian and determining the identity of the pedestrian to be recognized;
the method for changing the acquired coat pixels and trousers pixels according to the pixel information of the human body analytic graph to acquire the image of the pedestrian after changing clothes comprises the following steps:
acquiring a coat pixel set and all vector representations of a coat by using a human body analytic graph, wherein the human body analytic graph is acquired through a pre-trained human body analytic model, and its semantic result is described as S = {S_1, S_2, …, S_K}; wherein each S_i is of size 1 × h × w; for the original pedestrian image X, performing random processing to obtain X̃; assume that all pixel vectors of X are represented as {x_1, x_2, …, x_M}; wherein x_j represents a pixel of the coat portion, M represents the total number of pixels, and M = B · H · W; performing semantic segmentation on the randomly processed original pedestrian image and acquiring a semantic segmentation result S̃; obtaining the pixel vectors of the coat according to the acquired semantic segmentation result, further obtaining a coat pixel set according to the pixel vectors of the coat, and changing all vector representations of the coat with the coat pixel set;
acquiring a trousers pixel set and all vector representations of trousers according to the human body analytic graph, and changing all vector representations of the trousers by using the trousers pixel set;
and acquiring the image of the pedestrian after changing clothes from all the changed vector representations of the coat and all the changed vector representations of the trousers.
8. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion as recited in any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the clothes-changing pedestrian re-identification and retrieval method based on multi-modal intelligent perception and fusion according to any one of claims 1 to 6.
CN202210742934.7A 2022-06-27 2022-06-27 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion Active CN114998934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210742934.7A CN114998934B (en) 2022-06-27 2022-06-27 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion


Publications (2)

Publication Number Publication Date
CN114998934A CN114998934A (en) 2022-09-02
CN114998934B true CN114998934B (en) 2023-01-03

Family

ID=83037249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210742934.7A Active CN114998934B (en) 2022-06-27 2022-06-27 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion

Country Status (1)

Country Link
CN (1) CN114998934B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309432A (en) * 2022-10-12 2022-11-08 石家庄科林电气股份有限公司 Photovoltaic inverter remote upgrading method and system
CN116229518B (en) * 2023-03-17 2024-01-16 百鸟数据科技(北京)有限责任公司 Bird species observation method and system based on machine learning
CN116129473B (en) * 2023-04-17 2023-07-14 山东省人工智能研究院 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system
CN117612112B (en) * 2024-01-24 2024-04-30 山东科技大学 Method for re-identifying reloading pedestrians based on semantic consistency
CN117746512A (en) * 2024-02-19 2024-03-22 河海大学 Behavior recognition method based on double-stream point cloud sequence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
CN112784728A (en) * 2021-01-18 2021-05-11 山东省人工智能研究院 Multi-granularity clothes changing pedestrian re-identification method based on clothing desensitization network
CN112927236A (en) * 2021-03-01 2021-06-08 南京理工大学 Clothing analysis method and system based on channel attention and self-supervision constraint
CN114627310A (en) * 2022-03-16 2022-06-14 中国科学院深圳先进技术研究院 Clothing changing pedestrian weight recognition method, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on gait feature extraction with invariance to human clothing; Fan Ming; China Master's Theses Full-text Database, Information Science and Technology; 2022-01-15; full text *

Also Published As

Publication number Publication date
CN114998934A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN114998934B (en) Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
Zhong et al. Grayscale enhancement colorization network for visible-infrared person re-identification
CN109948425B (en) Pedestrian searching method and device for structure-aware self-attention and online instance aggregation matching
CN111325115B (en) Cross-modal countervailing pedestrian re-identification method and system with triple constraint loss
Ahmad et al. Human action recognition using deep multilevel multimodal (${M}^{2} $) fusion of depth and inertial sensors
CN108537136A (en) The pedestrian's recognition methods again generated based on posture normalized image
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN105138954A (en) Image automatic screening, query and identification system
CN111814661A (en) Human behavior identification method based on residual error-recurrent neural network
Lin et al. Live Face Verification with Multiple Instantialized Local Homographic Parameterization.
Xu et al. Aligning correlation information for domain adaptation in action recognition
CN109086706A (en) Applied to the action identification method based on segmentation manikin in man-machine collaboration
CN106030610A (en) Real-time 3D gesture recognition and tracking system for mobile devices
CN107038400A (en) Face identification device and method and utilize its target person tracks of device and method
Liu et al. Attentive cross-modal fusion network for RGB-D saliency detection
CN108898269A (en) Electric power image-context impact evaluation method based on measurement
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN113435236A (en) Home old man posture detection method, system, storage medium, equipment and application
CN111639580A (en) Gait recognition method combining feature separation model and visual angle conversion model
Jiang et al. Application of a fast RCNN based on upper and lower layers in face recognition
CN116129473B (en) Identity-guide-based combined learning clothing changing pedestrian re-identification method and system
CN114093024A (en) Human body action recognition method, device, equipment and storage medium
WO2023214093A1 (en) Accurate 3d body shape regression using metric and/or semantic attributes
CN108108648A (en) A kind of new gesture recognition system device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant