CN116052220A - Pedestrian re-identification method, device, equipment and medium - Google Patents


Info

Publication number
CN116052220A
Authority
CN
China
Prior art keywords
image
pedestrian
whole
category
target
Prior art date
Legal status
Granted
Application number
CN202310125013.0A
Other languages
Chinese (zh)
Other versions
CN116052220B (en)
Inventor
闫文雪
宋宏健
张燕
厉吉华
李军宏
Current Assignee
Beijing Duowei Shitong Technology Co ltd
Original Assignee
Beijing Duowei Shitong Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Duowei Shitong Technology Co ltd filed Critical Beijing Duowei Shitong Technology Co ltd
Priority to CN202310125013.0A priority Critical patent/CN116052220B/en
Publication of CN116052220A publication Critical patent/CN116052220A/en
Application granted granted Critical
Publication of CN116052220B publication Critical patent/CN116052220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V10/40 Extraction of image or video features
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a pedestrian re-identification method, device, equipment and medium, and relates to the technical field of digital image processing. The method comprises the following steps: extracting feature vectors of pedestrian images to obtain the feature vector of the target image and the feature vector of each first image, wherein, when the category of a pedestrian image is a whole-body image, its feature vector comprises the whole-body feature and the half-body feature of the pedestrian image, and when the category is a half-body image, its feature vector comprises the half-body feature of the pedestrian image; based on the category of the target image and the category of each first image, selecting corresponding features from the feature vector of the target image and the feature vector of the first image, calculating the cosine distance between the target image and each first image, and ranking according to the cosine distances to obtain a pedestrian re-identification result. According to the embodiment of the disclosure, the accuracy of pedestrian re-identification can be improved.

Description

Pedestrian re-identification method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of digital image processing, and in particular relates to a pedestrian re-identification method, device, equipment and medium.
Background
The Person Re-identification (ReID) technology aims at judging whether a target object (a specific pedestrian) exists in different camera videos, and can be used for various scenes such as target tracking, intelligent video monitoring and target retrieval.
Most pedestrian re-recognition techniques in the related art perform re-recognition on a whole-body image of the target. Although existing pedestrian re-recognition models can learn pedestrian features with strong characterization capability, data in real scenes are complex: some pedestrians are occluded to different degrees, and due to the shooting angle of a camera, a complete pedestrian image may not even be captured within the camera's field of view. When the pedestrian image is incomplete, the accuracy of pedestrian re-recognition is often greatly reduced, so the false detection rate of existing pedestrian re-recognition techniques is high. How to improve the accuracy of pedestrian re-recognition has therefore become a technical problem to be solved in the art.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides a pedestrian re-recognition method, device, equipment and medium, which at least overcome the problem of low pedestrian re-recognition accuracy caused by incomplete pedestrian images in the related art to a certain extent.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to one aspect of the present disclosure, there is provided a pedestrian re-recognition method including:
acquiring a pedestrian image, wherein the pedestrian image at least comprises a target image and a plurality of first images to be identified;
extracting feature vectors of pedestrian images based on a preset image feature extraction method to obtain feature vectors of target images and feature vectors of each first image; wherein, when the category of the pedestrian image is a whole-body image, the feature vector of the pedestrian image comprises the whole-body feature of the pedestrian image and the half-body feature of the pedestrian image; in the case where the category of the pedestrian image is a half-body image, the feature vector of the pedestrian image includes the half-body feature of the pedestrian image;
based on the category of the target image and the category of each first image, selecting corresponding features from the feature vector of the target image and the feature vector of the first image, calculating the cosine distance between the target image and each first image, and sequencing according to the cosine distance to obtain a pedestrian re-identification result.
In one embodiment of the present disclosure, a preset image feature extraction method includes:
judging whether the pedestrian image is a whole-body image or not;
under the condition that the pedestrian image is a whole-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition whole-body model to obtain the whole-body characteristics of the pedestrian image; based on the pedestrian image, segmenting to obtain a half-body image corresponding to the pedestrian image, and inputting the half-body image corresponding to the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image;
and under the condition that the pedestrian image is a half-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image.
In one embodiment of the present disclosure, based on a pedestrian image, a half-body image corresponding to the pedestrian image is obtained by segmentation, including:
the height H' of the half-body image is calculated based on the following formula:
H′ = s1 * H + s2 * n
where s1 is the scale factor of the image height scale, s2 is the scale factor of the floating value, n is a random number, and H is the height of the original pedestrian image;
and cutting the pedestrian image based on the calculated height of the half-body image to obtain the half-body image corresponding to the pedestrian image.
In one embodiment of the present disclosure, the method further comprises:
when training the pedestrian re-recognition whole-body model and the pedestrian re-recognition half-body model, a whole-body and half-body mutual distillation mode is adopted to distill the logit output layer of the pedestrian re-recognition whole-body model and the logit output layer of the pedestrian re-recognition half-body model.
In one embodiment of the present disclosure, determining whether a category of pedestrian images is a whole-body image includes:
inputting the pedestrian image into a classifier, calculating the probability that the pedestrian image belongs to the half-body image, the probability that the pedestrian image belongs to the whole-body image and the probability that the pedestrian image belongs to the non-pedestrian image, and comparing the sizes of the probability that the pedestrian image belongs to the half-body image, the probability that the pedestrian image belongs to the whole-body image and the probability that the pedestrian image belongs to the non-pedestrian image;
and determining the pedestrian image as the whole-body image under the condition that the probability that the pedestrian image belongs to the whole-body image is maximum or the probability that the pedestrian image belongs to the non-pedestrian image is maximum.
In one embodiment of the present disclosure, selecting a corresponding feature in a feature vector of a target image and a feature vector of a first image based on a class of the target image and a class of each first image, calculating a cosine distance between the target image and each first image, includes:
When the category of the target image is a whole-body image and the category of the first image is a whole-body image, calculating the cosine distance between the target image and the first image based on the whole-body characteristics of the first image and the whole-body characteristics of the target image;
when the category of the target image is a whole-body image and the category of the first image is a half-body image, calculating the cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
when the category of the target image is a half-body image and the category of the first image is a whole-body image, calculating the cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
when the category of the target image is a half-body image and the category of the first image is a half-body image, a cosine distance between the target image and the first image is calculated based on the half-body characteristics of the first image and the half-body characteristics of the target image.
In one embodiment of the present disclosure, acquiring a pedestrian image includes:
acquiring a target image and a video to be detected, wherein the video to be detected comprises video data shot by a plurality of monitoring devices;
performing target detection on the video to be detected to obtain a plurality of first images to be identified, wherein the plurality of first images are images of a detection frame selection range in video frame images of the video to be detected.
According to another aspect of the present disclosure, there is provided a pedestrian re-recognition apparatus including:
the image acquisition module is used for acquiring a pedestrian image, and the pedestrian image at least comprises a target image and a plurality of first images to be identified;
the feature extraction module is used for extracting feature vectors of the pedestrian images based on a preset image feature extraction method to obtain feature vectors of the target images and feature vectors of each first image; wherein, when the pedestrian image category is a whole-body image, the feature vector of the pedestrian image comprises the whole-body feature of the pedestrian image and the half-body feature of the pedestrian image; in the case where the pedestrian image category is a half-body image, the feature vector of the pedestrian image includes the half-body feature of the pedestrian image;
the distance calculation module is used for selecting corresponding features from the feature vectors of the target image and the feature vectors of the first images based on the category of the target image and the category of each first image, calculating the cosine distance between the target image and each first image, and sorting according to the cosine distance to obtain a pedestrian re-recognition result.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including: a memory for storing instructions; and the processor is used for calling the instructions stored in the memory to realize the pedestrian re-identification method.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the pedestrian re-recognition method described above.
According to yet another aspect of the present disclosure, there is provided a computer program product storing instructions that, when executed by a computer, cause the computer to implement the pedestrian re-recognition method described above.
According to yet another aspect of the present disclosure, there is provided a chip comprising at least one processor and an interface;
an interface for providing program instructions or data to at least one processor;
the at least one processor is configured to execute the program instructions to implement the pedestrian re-recognition method described above.
In the pedestrian re-recognition method provided by the embodiment of the disclosure, when the feature vector of a pedestrian image is extracted, the feature vector includes the whole-body feature and the half-body feature of the pedestrian image if the pedestrian image is a whole-body image, and includes only the half-body feature if the pedestrian image is a half-body image. Then, based on the category of the target image and the category of each first image, the corresponding features are selected from the feature vector of the target image and the feature vector of each first image, the cosine distance between the target image and each first image is calculated, and the pedestrian re-recognition result is obtained by ranking according to the cosine distances.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
FIG. 1 illustrates a system architecture schematic diagram of an application environment of an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a pedestrian re-recognition method in an embodiment of the disclosure;
FIG. 3 illustrates a flow chart of a method for extracting preset image features in an embodiment of the present disclosure;
FIG. 4 shows a schematic architecture diagram of a classifier in an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of extracting feature vectors of a first image in an embodiment of the present disclosure;
FIG. 6 illustrates a flow chart for extracting feature vectors of a target image in an embodiment of the present disclosure;
FIG. 7 shows a flow chart for calculating cosine distances in an embodiment of the present disclosure;
FIG. 8 illustrates a schematic diagram of a pedestrian re-identification device in an embodiment of the present disclosure;
fig. 9 shows a block diagram of an electronic device in an embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully hereinafter with reference to the accompanying drawings.
It should be noted that the exemplary embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein.
Based on the background art, when the pedestrian image is incomplete, the accuracy of pedestrian re-recognition is often greatly reduced, so that the false detection rate of the existing pedestrian re-recognition technology is higher.
Specifically, the inventors found that a pedestrian re-recognition model (ReID model) pays different degrees of attention to features at different positions of a picture depending on its training data set: a model trained with a whole-body pedestrian training set focuses on whole-body semantic information, while a model trained with a half-body pedestrian training set can mine fine-grained, more discriminative semantic information. The data in practical ReID application scenarios are complex; some pedestrians are occluded to different degrees, and due to the shooting angle of a camera, a complete pedestrian picture may not even be captured within the camera's field of view. Moreover, the whole-body feature and the half-body feature of a picture differ in semantic information, and calculating the cosine similarity between a whole-body feature and a half-body feature can make the result inaccurate.
Fig. 1 illustrates a schematic diagram of a system architecture of an exemplary application environment in which pedestrian re-recognition methods and apparatuses in embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be various electronic devices capable of capturing video including, but not limited to, smart monitoring cameras, smartphones, tablet computers, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The pedestrian re-recognition method provided by the embodiment of the present disclosure may be executed by the server 105, and accordingly, the pedestrian re-recognition device may be provided in the server 105.
In some embodiments, the pedestrian re-recognition method provided by the embodiment of the present disclosure may be used in place of face recognition to find a target object in a video sequence when a face cannot be detected, so as to realize re-recognition of a captured pedestrian.
It should be noted that, the execution subject for executing the pedestrian re-recognition method provided by the embodiment of the present disclosure is not limited to the server 105 shown in fig. 1, and the pedestrian re-recognition method provided by the embodiment of the present disclosure may be executed by any electronic device having a computing processing capability.
Fig. 2 shows a flowchart of a pedestrian re-recognition method in an embodiment of the present disclosure, and as shown in fig. 2, the pedestrian re-recognition method provided in the embodiment of the present disclosure includes steps S210 to S230.
In S210, a pedestrian image is acquired, the pedestrian image including at least a target image and a plurality of first images to be recognized.
The target image may be an image of a target object, which may be a target pedestrian.
In one embodiment, the pedestrian re-recognition method may be used to find a target pedestrian (e.g., passerby A) in a surveillance video, in which case the target image may be an image containing passerby A.
In practical applications, the image of passerby A may be a whole-body image containing the whole body of passerby A, or may be a half-body image containing only the half body of passerby A. In one example, the half-body image of passerby A may be an upper-body image of passerby A, that is, an image containing passerby A's head and neck.
In the above example, the plurality of first images to be identified may be derived from the video frame images in the surveillance video that contain passerby A.
In the following step S220, the target image and the first images are images in which the target object is the main subject; the target object may be a pedestrian, and the pedestrian may appear as a whole body or a half body.
The plurality of first images in the embodiments of the present disclosure may be derived from video data captured by a plurality of monitoring devices.
In one embodiment, acquiring a pedestrian image may include: acquiring a target image and a video to be detected, wherein the video to be detected comprises video data shot by a plurality of monitoring devices; performing target detection on the video to be detected to obtain a plurality of first images to be identified, wherein the plurality of first images are images of a detection frame selection range in video frame images of the video to be detected.
It should be further noted that, in the embodiment of the present disclosure, the plurality of first images to be identified may include a non-pedestrian image, and in an example, the non-pedestrian image may be processed according to a processing flow of the pedestrian image of the whole body. In some scenarios, the plurality of first images to be identified may also be referred to as search library images, and the target image may also be referred to as a pedestrian monitoring image to be searched.
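As an illustrative sketch of this acquisition step (not part of the claimed method), the following Python snippet assumes frames are read with OpenCV and uses a hypothetical detect_persons function, not named in this disclosure, that returns (x, y, w, h) detection boxes for a frame; the crops inside those boxes become the first images to be identified.

```python
import cv2  # assumption: frames are read with OpenCV

def collect_first_images(video_paths, detect_persons):
    """Run person detection on every frame of the videos to be detected and
    return the cropped detection-box regions as the first images to be identified."""
    first_images = []
    for path in video_paths:
        cap = cv2.VideoCapture(path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # detect_persons is a placeholder for any person detector that
            # yields (x, y, w, h) boxes for the current frame
            for (x, y, w, h) in detect_persons(frame):
                first_images.append(frame[y:y + h, x:x + w])
        cap.release()
    return first_images
```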
In S220, based on a preset image feature extraction method, feature vectors of the pedestrian image are extracted, and feature vectors of the target image and feature vectors of each first image are obtained.
Wherein, when the pedestrian image category is a whole-body image, the feature vector of the pedestrian image comprises the whole-body feature of the pedestrian image and the half-body feature of the pedestrian image; in the case where the pedestrian image category is a half-body image, the feature vector of the pedestrian image includes the half-body feature of the pedestrian image.
In some embodiments, when extracting the feature vector of the pedestrian image, the category of the pedestrian image may be determined first, and different feature extraction modes are adopted for pedestrian images of different categories.
As an example, different feature extraction modes are adopted for different types of pedestrian images, and different feature extraction models are adopted for different types of pedestrian images to extract features, so that when the feature extraction models are trained, different types of image training sets can be adopted for targeted training, and further the trained feature extraction models can pay more attention to the feature information of the category to which the trained feature extraction models belong. For example, in the case where the category includes a whole-body image and a half-body image, the feature extraction model (e.g., a pedestrian re-recognition whole-body model) trained with the whole-body image training set focuses more on the feature information of the whole-body image, and can better extract the features of the whole-body pedestrian image, and accordingly, the feature extraction model (e.g., a pedestrian re-recognition half-body model) trained with the half-body image training set can better extract the features of the half-body image.
In the above example, the feature vector of the pedestrian image classified as the whole-body image includes the whole-body feature of the pedestrian image and the half-body feature of the pedestrian image, and the whole-body feature may be extracted by using the pedestrian re-recognition whole-body model first and then the half-body feature may be extracted by using the pedestrian re-recognition half-body model. In one embodiment, before the pedestrian re-recognition body model is adopted to extract the body features, the pedestrian image of the whole body can be processed into the body image, and then the body image is input into the pedestrian re-recognition body model to extract the features.
S230, selecting corresponding features from the feature vectors of the target image and the feature vectors of the first images based on the category of the target image and the category of each first image, calculating the cosine distance between the target image and each first image, and sorting according to the cosine distance to obtain a pedestrian re-recognition result.
In some embodiments, selecting corresponding features from the feature vector of the target image and the feature vector of each first image based on their categories, and calculating the cosine distance between the target image and each first image, means selecting matching features for the calculation, so that whole-body features are compared with whole-body features and half-body features with half-body features.
Selecting corresponding features based on the category of the target image and the category of each first image may work as follows: when the feature vector of the target image contains only the half-body feature, the half-body feature is selected from the feature vector of each first image and the cosine distance is calculated from it; when the feature vector of the target image contains both the whole-body feature and the half-body feature, the whole-body feature is used to calculate the cosine distance with whole-body first images, and the half-body feature is used to calculate the cosine distance with half-body first images.
In some embodiments, selecting corresponding features in the feature vector of the target image and the feature vector of the first image based on the class of the target image and the class of each first image, calculating the cosine distance between the target image and each first image may include the steps of:
when the category of the target image is a whole-body image and the category of the first image is a whole-body image, calculating the cosine distance between the target image and the first image based on the whole-body characteristics of the first image and the whole-body characteristics of the target image;
when the category of the target image is a whole-body image and the category of the first image is a half-body image, calculating the cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
When the category of the target image is a half-body image and the category of the first image is a whole-body image, calculating the cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
when the category of the target image is a half-body image and the category of the first image is a half-body image, a cosine distance between the target image and the first image is calculated based on the half-body characteristics of the first image and the half-body characteristics of the target image.
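As a minimal sketch of this case-by-case selection, assuming each image's features are kept in a dict with an optional "whole" entry and a "half" entry, and that cosine_distance is whatever distance the formula later in this description defines (both names are illustrative assumptions):

```python
def select_and_compare(target_feats, first_feats, cosine_distance):
    """Whole-body features are compared only when both images are whole-body;
    in every other case half-body features are compared with half-body features."""
    if "whole" in target_feats and "whole" in first_feats:
        return cosine_distance(target_feats["whole"], first_feats["whole"])
    return cosine_distance(target_feats["half"], first_feats["half"])
```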
The pedestrian re-recognition method provided by the embodiment of the disclosure can fully utilize the whole body characteristic and half body characteristic information of the pedestrian image, calculate the cosine distance of the corresponding category attribute image characteristic in the similarity calculation process, finally sort all cosine distance values, improve the applicability of the model in complex scenes, and enable the sorting result to be more accurate.
In some embodiments, as shown in fig. 3, the preset image feature extraction method provided in the embodiments of the present disclosure includes steps S310 to S330.
In S310, it is determined whether or not the pedestrian image is a whole-body image.
In S320, in the case where the pedestrian image is a whole-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition whole-body model to obtain a whole-body feature of the pedestrian image; and based on the pedestrian image, segmenting to obtain a half-body image corresponding to the pedestrian image, and inputting the half-body image corresponding to the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image.
In S330, when the pedestrian image is a half-body image, the pedestrian image is input into the pre-trained pedestrian re-recognition half-body model, and the half-body characteristics of the pedestrian image are obtained.
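A hedged sketch of this dispatch (S310 to S330); classify_image, whole_body_model, half_body_model and crop_half_body stand in for the classifier, the two pre-trained ReID models and the segmentation step described below, and are illustrative names rather than ones used in this disclosure:

```python
def extract_feature_vector(image, classify_image, whole_body_model,
                           half_body_model, crop_half_body):
    """Return the feature vector of a pedestrian image as a dict: whole-body
    images get a whole-body and a half-body feature, half-body images get
    only a half-body feature."""
    category = classify_image(image)  # "whole", "half" or "non-pedestrian"
    if category == "half":
        return {"half": half_body_model(image)}
    # whole-body (and, per this disclosure, non-pedestrian) images follow the
    # whole-body processing flow
    half_image = crop_half_body(image)
    return {"whole": whole_body_model(image),
            "half": half_body_model(half_image)}
```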
In some embodiments, the step S310 may be implemented by inputting the pedestrian image into a classifier, calculating the probability that the pedestrian image belongs to a half-body image, the probability that it belongs to a whole-body image and the probability that it belongs to a non-pedestrian image, and comparing the three probabilities; the pedestrian image is determined to be a whole-body image when the probability that it belongs to the whole-body image is the largest or the probability that it belongs to the non-pedestrian image is the largest.
The classifier in the above embodiment may be trained using a pedestrian image dataset. The pedestrian image dataset may include a whole-body image and a half-body image. In one example, the partial bust image in the pedestrian image dataset may be segmented based on the whole-body image. When training the classifier, 9/10 of the whole-body image and the half-body image can be extracted as a classifier training set and 1/10 as a classifier test set respectively.
In some embodiments, the segmentation in S320 based on the pedestrian image to obtain a half-body image corresponding to the pedestrian image includes calculating a height H' of the half-body image based on the following formula:
H′ = s1 * H + s2 * n (1)
where s1 is the scale factor of the image height scale, s2 is the scale factor of the floating value, n is a random number, and H is the height of the original pedestrian image;
and cutting the pedestrian image based on the calculated height of the half-body image to obtain the half-body image corresponding to the pedestrian image.
In one embodiment, s1 may range from 0.4 to 0.7; as an example, the effect is better when the value of s1 is 0.6. The scale factor s2 of the floating value may be 0.05, and the random number n obeys a normal distribution.
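A small sketch of the cropping rule of formula (1), assuming an H x W x C image array and using the example values s1 = 0.6 and s2 = 0.05 with a normally distributed random number:

```python
import numpy as np

def crop_half_body(image, s1=0.6, s2=0.05, rng=None):
    """Cut the upper part of a pedestrian image according to H' = s1*H + s2*n."""
    rng = np.random.default_rng() if rng is None else rng
    H = image.shape[0]                    # height of the original pedestrian image
    n = rng.normal()                      # random number drawn from a normal distribution
    new_h = int(round(s1 * H + s2 * n))   # height H' of the half-body image
    new_h = max(1, min(new_h, H))         # keep the crop inside the image
    return image[:new_h, :, :]            # keep only the top new_h rows
```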
In some embodiments, when training the pedestrian re-recognition whole-body model and the pedestrian re-recognition half-body model, the whole-body image training set and the half-body image training set can be obtained in the segmentation mode, then the whole-body image training set is applied to train the pedestrian re-recognition whole-body model, and the half-body image training set is applied to train the pedestrian re-recognition half-body model.
In some embodiments, in order to give the models higher accuracy, enable the pedestrian re-recognition whole-body model to mine finer-grained features, and enable the pedestrian re-recognition half-body model to acquire richer semantic information, a whole-body and half-body mutual distillation mode may also be adopted to distill the logit output layers of the pedestrian re-recognition whole-body model and the pedestrian re-recognition half-body model.
The logit output layer distillation uses reverse KL divergence to transfer training-sample class information between the two models. The output features of the cls fully connected layers of the whole-body model and the half-body model are denoted G_kl and P_kl respectively. The output features are divided by the distillation temperature T to obtain soft label values, where the distillation temperature is 2, and the multi-label classification probability values are obtained through softmax logistic regression. The KL divergence between the whole-body model and the half-body model is then obtained from the KL divergence formula. Meanwhile, in order to keep each model's attention on different characteristics, a threshold β is introduced, which may take the value 0.002, and the absolute value of the difference between the KL divergence and the threshold is taken as the logit-layer distillation loss. Logit output layer distillation makes the logit distributions of the two models more compact, so that the models converge faster during training.
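The logit-layer distillation described above can be sketched as follows; this is a hedged PyTorch approximation of the described loss |KL - β| with T = 2 and β = 0.002, not the exact training code of this disclosure, and the direction of the KL divergence is an assumption:

```python
import torch.nn.functional as F

def logit_distillation_loss(g_kl, p_kl, T=2.0, beta=0.002):
    """g_kl, p_kl: outputs of the cls fully connected layers of the whole-body
    and half-body models for the same batch. Soft labels are obtained by
    dividing by the distillation temperature and applying softmax; the loss is
    the absolute difference between their KL divergence and the threshold."""
    p_whole = F.softmax(g_kl / T, dim=1)   # soft labels of the whole-body model
    p_half = F.softmax(p_kl / T, dim=1)    # soft labels of the half-body model
    # KL divergence between the two soft-label distributions (one direction shown)
    kl = (p_half * (p_half.log() - p_whole.log())).sum(dim=1).mean()
    return (kl - beta).abs()
```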
In some embodiments, as shown in fig. 4, for the whole-body/half-body attribute classification procedure of the classifier in the present disclosure, an image is input, features are extracted through the backbone network, and each image feature is converted by a fully connected layer into a 1×3-channel output. The index of the maximum value among the three channels is then obtained from the output values: an index value of 0 represents that the image category is whole body; an index value of 1 represents that the image category is half body; an index value of 2 represents that the image category is a non-pedestrian image.
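A minimal sketch of such a classifier head, assuming an arbitrary backbone that outputs one feature vector per image (the backbone itself is not specified here):

```python
import torch.nn as nn

class PedestrianAttributeClassifier(nn.Module):
    """Backbone features -> fully connected layer -> 3 logits per image:
    index 0 = whole body, index 1 = half body, index 2 = non-pedestrian."""

    def __init__(self, backbone, feat_dim=2048):
        super().__init__()
        self.backbone = backbone          # any feature extractor returning (batch, feat_dim)
        self.fc = nn.Linear(feat_dim, 3)  # 1 x 3 channel output per image

    def forward(self, images):
        feats = self.backbone(images)
        return self.fc(feats)

# usage: category = classifier(batch).argmax(dim=1)  # 0 = whole, 1 = half, 2 = non-pedestrian
```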
The following describes the process of feature extraction and cosine value calculation in the embodiments of the present disclosure with reference to fig. 5, 6 and 7.
Fig. 5 illustrates the feature extraction flow for the plurality of first images to be identified (which may also be referred to as search library images) in an embodiment of the present disclosure; in this example, a 1×2048-dimensional feature vector is extracted for each image.
A plurality of first images to be identified are acquired and the category of each image is judged with the classifier. When the pedestrian in the image is a whole body, the trained pedestrian re-identification whole-body model is used to extract the whole-body feature of the image, denoted G_i; at the same time, the upper half is cut out along the height direction of the image and used as the input of the pedestrian re-identification half-body model, and the upper-half feature of the image (namely the half-body feature of the half-body image corresponding to the pedestrian image in the previous step) is extracted and denoted G_j. When the pedestrian in the image is a half body, the pedestrian re-identification half-body model is used directly to extract the image feature, denoted P_i. The input picture size of the pedestrian re-identification whole-body model and the pedestrian re-identification half-body model may be 384×128.
Fig. 6 shows the feature extraction flow for the target image (which may also be referred to as the pedestrian monitoring image to be searched) in an embodiment of the present disclosure. The flow is similar to the above: when the target image is a whole-body image, a whole-body feature G and a half-body feature G_p are obtained; when the target image is a half-body image, a half-body feature P is obtained.
Fig. 7 shows the process of feature selection and cosine distance calculation in S230, which includes the following steps:
when the target image is a whole body and the first image is a whole body, the cosine distance between feature G and feature G_i is calculated;
when the target image is a whole body and the first image is a half body, the cosine distance between feature G_p and feature P_i is calculated;
when the target image is a half body and the first image is a whole body, the cosine distance between feature P and feature G_j is calculated;
when the target image is a half body and the first image is a half body, the cosine distance between feature P and feature P_i is calculated.
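Putting the four cases together, a hedged sketch of the retrieval step: each first image is scored with the matching feature pair and the gallery is then ranked, here by descending cosine similarity on the assumption that a higher value means a better match; cosine_similarity is the function sketched after the formula below.

```python
def rank_gallery(target_feats, gallery_feats, cosine_similarity):
    """Score every first image against the target with the matching feature
    pair (G vs G_i when both are whole body; G_p vs P_i, P vs G_j or P vs P_i
    otherwise) and return gallery indices ranked by descending similarity."""
    def score(feats):
        if "whole" in target_feats and "whole" in feats:
            return cosine_similarity(target_feats["whole"], feats["whole"])
        return cosine_similarity(target_feats["half"], feats["half"])
    return sorted(range(len(gallery_feats)),
                  key=lambda i: score(gallery_feats[i]), reverse=True)
```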
In some embodiments, the cosine distance calculation in the embodiments of the present disclosure may use the following formula:
cos(m, n) = Σ_i (m_i * n_i) / ( sqrt(Σ_i m_i^2) * sqrt(Σ_i n_i^2) )
the feature vectors may be normalized before the cosine distance similarity is calculated. Wherein m and n are 2048-dimensional eigenvectors, m i And n j Is the eigenvalue of the eigenvector.
The embodiment of the disclosure can fully utilize the whole-body features and the local half-body features of pedestrian images. This reduces the error caused by the difference in feature information between whole-body and half-body features, and also reduces the error in the model's attention to features extracted from different positions of the picture. The classification model is used to judge the attribute of a pedestrian image, the whole-body model and the half-body model are used respectively to extract image features, the feature similarity is calculated between images of corresponding category attributes, and finally all similarity values are ranked.
In the presently disclosed embodiments, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The term "and/or" in this disclosure is merely one association relationship describing the associated object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results.
In some embodiments, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Based on the same inventive concept, a pedestrian re-recognition device is also provided in the embodiments of the present disclosure, as described in the following embodiments. Since the principle of solving the problem of the embodiment of the device is similar to that of the embodiment of the method, the implementation of the embodiment of the device can be referred to the implementation of the embodiment of the method, and the repetition is omitted.
Fig. 8 shows a schematic diagram of a pedestrian re-recognition device in an embodiment of the disclosure, and as shown in fig. 8, the pedestrian re-recognition device 800 includes:
an image acquisition module 810 for acquiring a pedestrian image including at least a target image and a plurality of first images to be identified;
the feature extraction module 820 is configured to extract feature vectors of the pedestrian image based on a preset image feature extraction method, so as to obtain feature vectors of the target image and feature vectors of each first image; wherein, when the pedestrian image category is a whole-body image, the feature vector of the pedestrian image comprises the whole-body feature of the pedestrian image and the half-body feature of the pedestrian image; in the case where the pedestrian image category is a half-body image, the feature vector of the pedestrian image includes the half-body feature of the pedestrian image;
the distance calculating module 830 is configured to select a corresponding feature from a feature vector of the target image and a feature vector of the first image based on a category of the target image and a category of each first image, calculate a cosine distance between the target image and each first image, and obtain a pedestrian re-recognition result according to the cosine distance ranking.
In some embodiments, a preset image feature extraction method includes:
Judging whether the pedestrian image is a whole-body image or not;
under the condition that the pedestrian image is a whole-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition whole-body model to obtain the whole-body characteristics of the pedestrian image; based on the pedestrian image, segmenting to obtain a half-body image corresponding to the pedestrian image, and inputting the half-body image corresponding to the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image;
and under the condition that the pedestrian image is a half-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image.
In some embodiments, based on the pedestrian image, segmentation obtains a half-body image corresponding to the pedestrian image, including:
calculating the height of the half-body image based on the formula (1);
and cutting the pedestrian image based on the calculated height of the half-body image to obtain the half-body image corresponding to the pedestrian image.
In some embodiments, when training the pedestrian re-recognition whole-body model and the pedestrian re-recognition half-body model, a whole-body and half-body mutual distillation mode is adopted to distill the logit output layer of the pedestrian re-recognition whole-body model and the logit output layer of the pedestrian re-recognition half-body model.
In some embodiments, determining whether the category of the pedestrian image is a whole-body image includes:
inputting the pedestrian image into a classifier, calculating the probability that the pedestrian image belongs to the half-body image, the probability that the pedestrian image belongs to the whole-body image and the probability that the pedestrian image belongs to the non-pedestrian image, and comparing the sizes of the probability that the pedestrian image belongs to the half-body image, the probability that the pedestrian image belongs to the whole-body image and the probability that the pedestrian image belongs to the non-pedestrian image;
and determining the pedestrian image as the whole-body image under the condition that the probability that the pedestrian image belongs to the whole-body image is maximum or the probability that the pedestrian image belongs to the non-pedestrian image is maximum.
In some embodiments, distance calculation module 830 includes:
a first calculation unit configured to calculate a cosine distance between the target image and the first image based on the whole-body feature of the first image and the whole-body feature of the target image when the category of the target image is the whole-body image and the category of the first image is the whole-body image;
a second calculation unit configured to calculate a cosine distance between the target image and the first image based on the body feature of the first image and the body feature of the target image when the category of the target image is the whole-body image and the category of the first image is the body image;
A third calculation unit configured to calculate a cosine distance between the target image and the first image based on the body feature of the first image and the body feature of the target image when the category of the target image is the body image and the category of the first image is the whole body image;
and a fourth calculation unit for calculating a cosine distance between the target image and the first image based on the half-body feature of the first image and the half-body feature of the target image when the category of the target image is the half-body image and the category of the first image is the half-body image.
In some embodiments, the image acquisition module 810 includes:
the first acquisition unit is used for acquiring a target image and a video to be detected, wherein the video to be detected comprises video data shot by a plurality of monitoring devices;
the target detection unit is used for carrying out target detection on the video to be detected to obtain a plurality of first images to be identified, wherein the plurality of first images are images of a detection frame selection range in video frame images of the video to be detected.
The terms "first," "second," and the like in this disclosure are used solely to distinguish one from another device, module, or unit, and are not intended to limit the order or interdependence of functions performed by such devices, modules, or units.
With regard to the pedestrian re-recognition apparatus in the above-described embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment concerning the pedestrian re-recognition method, and will not be described in detail here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory.
Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
An electronic device provided by an embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
Fig. 9 shows a schematic architecture diagram of an electronic device 900 according to the present disclosure. As shown in fig. 9, the electronic device 900 includes, but is not limited to: at least one processor 910, at least one memory 920.
Memory 920 for storing instructions.
In some embodiments, memory 920 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 9201 and/or cache memory 9202, and may further include Read Only Memory (ROM) 9203.
In some embodiments, memory 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
In some embodiments, memory 920 may store an operating system. The operating system may be a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS or OS X like operating systems.
In some embodiments, memory 920 may also have data stored therein.
As one example, processor 910 may read data stored in memory 920, which may be stored at the same memory address as the instructions, or which may be stored at a different memory address than the instructions.
A processor 910 for invoking instructions stored in memory 920 to perform steps according to various exemplary embodiments of the present disclosure described in the above "exemplary methods" section of the present specification. For example, the processor 910 may perform the steps of the method embodiments described above.
It should be noted that the processor 910 may be a general-purpose processor or a special-purpose processor. Processor 910 may include one or more processing cores, and processor 910 performs various functional applications and data processing by executing instructions.
In some embodiments, the processor 910 may include a central processing unit (central processing unit, CPU) and/or a baseband processor.
In some embodiments, processor 910 may determine an instruction based on a priority identification and/or functional class information carried in each control instruction.
In the present disclosure, the processor 910 and the memory 920 may be separately provided or may be integrated.
As one example, processor 910 and memory 920 may be integrated on a single board or System On Chip (SOC).
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. The electronic device 900 may also include a bus 930.
The bus 930 may be any one or more of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 940 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 900, and/or any devices (e.g., routers, modems, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950.
Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960.
As shown in fig. 9, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930.
It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It is to be understood that the illustrated structure of the presently disclosed embodiments does not constitute a particular limitation of the electronic device 900. In other embodiments of the present disclosure, electronic device 900 may include more or fewer components than shown in FIG. 9, or may combine certain components, or split certain components, or a different arrangement of components. The components shown in fig. 9 may be implemented in hardware, software, or a combination of software and hardware.
The present disclosure also provides a computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the pedestrian re-recognition method described in the above method embodiments.
A computer-readable storage medium in an embodiment of the present disclosure carries computer instructions that can be transmitted or propagated for use by, or in connection with, an instruction execution system, apparatus, or device.
As one example, the computer-readable storage medium is a non-volatile storage medium.
In some embodiments, more specific examples of the computer readable storage medium in the present disclosure may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, a U disk, a removable hard disk, or any suitable combination of the foregoing.
In an embodiment of the present disclosure, a computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with computer instructions (readable program code) carried therein.
Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing.
In some examples, the computing instructions contained on the computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The disclosed embodiments also provide a computer program product storing instructions that, when executed by a computer, cause the computer to implement the pedestrian re-recognition method described in the above method embodiments.
The instructions may be program code. In particular implementations, the program code can be written in any combination of one or more programming languages.
The programming languages include object oriented programming languages such as Java, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages.
The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
The embodiment of the disclosure also provides a chip comprising at least one processor and an interface;
the interface is configured to provide program instructions or data to the at least one processor;
the at least one processor is configured to execute the program instructions to implement the pedestrian re-recognition method described in the method embodiment.
In some embodiments, the chip may also include a memory for holding program instructions and data, the memory being located either within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that all or a portion of the steps implementing the above embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein.
This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A pedestrian re-recognition method, characterized by comprising:
acquiring a pedestrian image, wherein the pedestrian image at least comprises a target image and a plurality of first images to be identified;
extracting feature vectors of the pedestrian image based on a preset image feature extraction method to obtain feature vectors of the target image and feature vectors of each first image; wherein, in the case that the pedestrian image category is a whole-body image, the feature vector of the pedestrian image includes a whole-body feature of the pedestrian image and a half-body feature of the pedestrian image; in the case that the pedestrian image category is a half-body image, the feature vector of the pedestrian image includes a half-body feature of the pedestrian image;
based on the category of the target image and the category of each first image, selecting corresponding features from the feature vector of the target image and the feature vector of the first image, calculating the cosine distance between the target image and each first image, and sorting according to the cosine distance to obtain a pedestrian re-recognition result.
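By way of illustration only, the following is a minimal Python sketch of the ranking step described in claim 1, assuming the category-matched features have already been selected as NumPy vectors; all function and variable names here are placeholders rather than terms defined by the disclosure:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity between two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_gallery(target_feat: np.ndarray, gallery_feats: list) -> list:
    """Return gallery indices sorted by ascending cosine distance to the target image."""
    dists = [cosine_distance(target_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: dists[i])

# Toy usage: the first gallery feature is closest to the query, so index 0 ranks first.
query = np.array([1.0, 0.0, 0.0])
gallery = [np.array([0.9, 0.1, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.5, 0.5, 0.0])]
print(rank_gallery(query, gallery))  # [0, 2, 1]
```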
2. The method according to claim 1, wherein the preset image feature extraction method includes:
judging whether the pedestrian image is a whole-body image or not;
inputting the pedestrian image into a pre-trained pedestrian re-recognition whole-body model to obtain whole-body characteristics of the pedestrian image under the condition that the pedestrian image is a whole-body image; based on the pedestrian image, segmenting to obtain a half-body image corresponding to the pedestrian image, and inputting the half-body image corresponding to the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image;
and under the condition that the pedestrian image is a half-body image, inputting the pedestrian image into a pre-trained pedestrian re-recognition half-body model to obtain the half-body characteristics of the pedestrian image.
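For readability, a hedged Python sketch of the branching in claim 2 is given below; whole_model, half_model and crop_half stand in for the pre-trained pedestrian re-recognition whole-body model, the half-body model and the segmentation step, and are assumptions of this sketch rather than interfaces fixed by the disclosure:

```python
from typing import Callable, Dict
import numpy as np

def extract_features(img: np.ndarray,
                     is_whole_body: bool,
                     whole_model: Callable[[np.ndarray], np.ndarray],
                     half_model: Callable[[np.ndarray], np.ndarray],
                     crop_half: Callable[[np.ndarray], np.ndarray]) -> Dict[str, np.ndarray]:
    """Whole-body images yield both feature types; half-body images yield only half-body features."""
    feats = {}
    if is_whole_body:
        feats["whole"] = whole_model(img)            # whole-body feature from the whole-body model
        feats["half"] = half_model(crop_half(img))   # half-body feature from the segmented upper part
    else:
        feats["half"] = half_model(img)              # a half-body image goes straight to the half-body model
    return feats

# Toy usage with dummy "models" that just average the pixels.
dummy_model = lambda x: np.array([float(x.mean())])
print(extract_features(np.ones((8, 4, 3)), True, dummy_model, dummy_model, lambda x: x[:4]))
```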
3. The method according to claim 2, wherein the segmenting, based on the pedestrian image, obtains a half-body image corresponding to the pedestrian image, includes:
the height H' of the half-body image is calculated based on the following formula:
H′ = s1 * H + s2 * n
wherein s1 represents the scale factor of the image height scale, s2 represents the scale factor of the floating value scale, n is a random number, and H represents the height of the original pedestrian image;
and cutting the pedestrian image based on the calculated height of the half-body image to obtain the half-body image corresponding to the pedestrian image.
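A minimal sketch of the segmentation in claim 3 follows, assuming an H×W×C image array that is cropped from the top; the default values of s1 and s2 and the range of the random number n are illustrative assumptions, since the claim does not fix them:

```python
import numpy as np

def crop_half_body(img: np.ndarray, s1: float = 0.5, s2: float = 0.1, seed=None) -> np.ndarray:
    """Crop a half-body image of height H' = s1*H + s2*n from the top of the pedestrian image."""
    rng = np.random.default_rng(seed)
    h = img.shape[0]
    n = rng.uniform(-h, h)               # the random number n; its range is an assumption of this sketch
    h_prime = int(round(s1 * h + s2 * n))
    h_prime = max(1, min(h, h_prime))    # keep the crop height inside the image bounds
    return img[:h_prime]                 # keep the upper part (head and torso)

img = np.zeros((256, 128, 3), dtype=np.uint8)
print(crop_half_body(img).shape)         # e.g. (137, 128, 3); varies with the random n
```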
4. The method according to claim 2, wherein the method further comprises:
when training the pedestrian re-recognition whole-body model and the pedestrian re-recognition half-body model, a whole-body and half-body mutual distillation mode is adopted to distill the pedestrian re-recognition whole-body model logic output layer and the pedestrian re-recognition half-body model logic output layer.
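The logit-level mutual distillation of claim 4 can be sketched as a symmetric soft-label loss; the PyTorch formulation below, including the temperature and the detach-as-teacher choice, is one common realization and is not asserted to be the exact training loss of the disclosure:

```python
import torch
import torch.nn.functional as F

def mutual_distillation_loss(whole_logits: torch.Tensor,
                             half_logits: torch.Tensor,
                             temperature: float = 4.0) -> torch.Tensor:
    """Symmetric KL divergence between the logit outputs of the whole-body and half-body models."""
    t = temperature
    log_p_whole = F.log_softmax(whole_logits / t, dim=1)
    log_p_half = F.log_softmax(half_logits / t, dim=1)
    # Each model learns from the other's softened predictions (detached as the "teacher").
    loss_half = F.kl_div(log_p_half, F.softmax(whole_logits.detach() / t, dim=1), reduction="batchmean")
    loss_whole = F.kl_div(log_p_whole, F.softmax(half_logits.detach() / t, dim=1), reduction="batchmean")
    return (loss_half + loss_whole) * (t * t)

logits_a = torch.randn(8, 751)   # e.g. 751 identity classes, as in common re-ID training sets
logits_b = torch.randn(8, 751)
print(mutual_distillation_loss(logits_a, logits_b).item())
```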
5. The method of claim 2, wherein determining whether the category of the pedestrian image is a whole-body image comprises:
inputting the pedestrian image into a classifier, calculating the probability that the pedestrian image belongs to a half-body image, the probability that the pedestrian image belongs to a whole-body image and the probability that the pedestrian image belongs to a non-pedestrian image, and comparing the sizes of the probability that the pedestrian image belongs to the half-body image, the probability that the pedestrian image belongs to the whole-body image and the probability that the pedestrian image belongs to the non-pedestrian image;
and determining that the pedestrian image is a whole-body image under the condition that the probability that the pedestrian image belongs to the whole-body image is maximum or the probability that the pedestrian image belongs to a non-pedestrian image is maximum.
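An illustrative sketch of the three-way decision in claim 5, assuming the classifier outputs raw scores for the half-body, whole-body and non-pedestrian categories; the treat-as-whole-body handling of the non-pedestrian case follows the claim wording:

```python
import numpy as np

def is_whole_body(scores: np.ndarray) -> bool:
    """scores: raw classifier scores for [half-body, whole-body, non-pedestrian]."""
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                  # softmax over the three categories
    winner = int(np.argmax(probs))        # 0: half-body, 1: whole-body, 2: non-pedestrian
    return winner in (1, 2)               # per claim 5, the non-pedestrian case is handled as whole-body

print(is_whole_body(np.array([0.2, 1.5, 0.1])))   # True  (whole-body most probable)
print(is_whole_body(np.array([2.0, 0.3, 0.1])))   # False (half-body most probable)
```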
6. The method according to any one of claims 1-5, wherein selecting corresponding features from the feature vector of the target image and the feature vector of the first image based on the category of the target image and the category of each first image, and calculating the cosine distance between the target image and each first image comprises:
when the category of the target image is a whole-body image and the category of the first image is a whole-body image, calculating a cosine distance between the target image and the first image based on whole-body features of the first image and whole-body features of the target image;
when the category of the target image is a whole-body image and the category of the first image is a half-body image, calculating a cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
when the category of the target image is a half-body image and the category of the first image is a whole-body image, calculating a cosine distance between the target image and the first image based on the half-body characteristics of the first image and the half-body characteristics of the target image;
when the category of the target image is a half-body image and the category of the first image is a half-body image, a cosine distance between the target image and the first image is calculated based on the half-body characteristics of the first image and the half-body characteristics of the target image.
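The case analysis of claim 6 reduces to a simple rule: whole-body features are compared only when both images are whole-body, otherwise half-body features are compared. A sketch using the hypothetical feature dictionaries from the earlier examples:

```python
import numpy as np

def pair_distance(target: dict, first: dict) -> float:
    """Select comparable features by category and return their cosine distance."""
    if target["is_whole"] and first["is_whole"]:
        a, b = target["whole"], first["whole"]    # whole-body vs whole-body
    else:
        a, b = target["half"], first["half"]      # any case involving a half-body image
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

t = {"is_whole": True, "whole": np.array([1.0, 0.0]), "half": np.array([0.5, 0.5])}
g = {"is_whole": False, "half": np.array([0.5, 0.5])}
print(pair_distance(t, g))   # ~0.0: compared on half-body features, which are identical here
```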
7. The method of claim 1, wherein the acquiring the pedestrian image comprises:
acquiring a target image and a video to be detected, wherein the video to be detected comprises video data shot by a plurality of monitoring devices;
and carrying out target detection on the video to be detected to obtain a plurality of first images to be identified, wherein the plurality of first images are images of a detection frame selection range in video frame images of the video to be detected.
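A hedged sketch of the gallery construction in claim 7: frames are decoded from each surveillance video and a pedestrian detector, left abstract here as detect_pedestrians, supplies bounding boxes whose crops become the first images to be identified; the frame sampling stride is an assumption of this sketch:

```python
from typing import Callable, Iterable
import numpy as np
import cv2  # OpenCV, assumed available for video decoding

def build_gallery(video_paths: Iterable[str],
                  detect_pedestrians: Callable[[np.ndarray], list],
                  frame_stride: int = 25) -> list:
    """Collect detection-box crops from every video as the first images to be identified."""
    crops = []
    for path in video_paths:
        cap = cv2.VideoCapture(path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % frame_stride == 0:                    # sample frames to keep the gallery manageable
                for (x, y, w, h) in detect_pedestrians(frame):
                    crops.append(frame[y:y + h, x:x + w])  # image inside the detection box
            idx += 1
        cap.release()
    return crops
```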
8. A pedestrian re-recognition device, characterized by comprising:
the image acquisition module is used for acquiring a pedestrian image, and the pedestrian image at least comprises a target image and a plurality of first images to be identified;
the feature extraction module is used for extracting the feature vector of the pedestrian image based on a preset image feature extraction method to obtain the feature vector of the target image and the feature vector of each first image; wherein, in the case that the pedestrian image category is a whole-body image, the feature vector of the pedestrian image includes a whole-body feature of the pedestrian image and a half-body feature of the pedestrian image; in the case that the pedestrian image category is a half-body image, the feature vector of the pedestrian image includes a half-body feature of the pedestrian image;
the distance calculation module is used for selecting corresponding features from the feature vector of the target image and the feature vector of the first image based on the category of the target image and the category of each first image, calculating the cosine distance between the target image and each first image, and sorting according to the cosine distance to obtain a pedestrian re-recognition result.
9. An electronic device, comprising:
a memory for storing instructions;
a processor for invoking instructions stored in said memory to implement the pedestrian re-recognition method of any one of claims 1-7.
10. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the pedestrian re-recognition method of any one of claims 1-7.
CN202310125013.0A 2023-02-07 2023-02-07 Pedestrian re-identification method, device, equipment and medium Active CN116052220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310125013.0A CN116052220B (en) 2023-02-07 2023-02-07 Pedestrian re-identification method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310125013.0A CN116052220B (en) 2023-02-07 2023-02-07 Pedestrian re-identification method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116052220A true CN116052220A (en) 2023-05-02
CN116052220B CN116052220B (en) 2023-11-24

Family

ID=86122204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310125013.0A Active CN116052220B (en) 2023-02-07 2023-02-07 Pedestrian re-identification method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116052220B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548139A (en) * 2016-10-21 2017-03-29 华中科技大学 A kind of pedestrian recognition methodss again
CN109214366A (en) * 2018-10-24 2019-01-15 北京旷视科技有限公司 Localized target recognition methods, apparatus and system again
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again
CN110543841A (en) * 2019-08-21 2019-12-06 中科视语(北京)科技有限公司 Pedestrian re-identification method, system, electronic device and medium
CN110807434A (en) * 2019-11-06 2020-02-18 威海若维信息科技有限公司 Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
CN111401113A (en) * 2019-01-02 2020-07-10 南京大学 Pedestrian re-identification method based on human body posture estimation
CN111444758A (en) * 2019-12-26 2020-07-24 珠海大横琴科技发展有限公司 Pedestrian re-identification method and device based on spatio-temporal information
CN111476070A (en) * 2019-01-24 2020-07-31 深圳市商汤科技有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US20200302572A1 (en) * 2019-03-20 2020-09-24 Kabushiki Kaisha Toshiba Information processing device, information processing system, information processing method, and program
CN112446261A (en) * 2019-09-02 2021-03-05 株式会社Ntt都科摩 Pedestrian re-identification equipment and method
CN113269117A (en) * 2021-06-04 2021-08-17 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113822142A (en) * 2021-07-28 2021-12-21 腾讯科技(深圳)有限公司 Role recognition method and device, computer equipment and storage medium
CN113887279A (en) * 2021-08-24 2022-01-04 深圳市捷顺科技实业股份有限公司 Pedestrian re-identification method with half-length being blocked and related device
CN114066877A (en) * 2021-11-25 2022-02-18 深圳市商汤科技有限公司 Image screening method and device, electronic equipment and storage medium
CN114639165A (en) * 2022-03-16 2022-06-17 平安科技(深圳)有限公司 Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence
US20220222928A1 (en) * 2019-05-13 2022-07-14 Nippon Telegraph And Telephone Corporation Learning method, learning program, and learning device

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548139A (en) * 2016-10-21 2017-03-29 华中科技大学 A kind of pedestrian recognition methodss again
CN109214366A (en) * 2018-10-24 2019-01-15 北京旷视科技有限公司 Localized target recognition methods, apparatus and system again
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again
CN111401113A (en) * 2019-01-02 2020-07-10 南京大学 Pedestrian re-identification method based on human body posture estimation
CN111476070A (en) * 2019-01-24 2020-07-31 深圳市商汤科技有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
US20200302572A1 (en) * 2019-03-20 2020-09-24 Kabushiki Kaisha Toshiba Information processing device, information processing system, information processing method, and program
US20220222928A1 (en) * 2019-05-13 2022-07-14 Nippon Telegraph And Telephone Corporation Learning method, learning program, and learning device
CN110543841A (en) * 2019-08-21 2019-12-06 中科视语(北京)科技有限公司 Pedestrian re-identification method, system, electronic device and medium
CN112446261A (en) * 2019-09-02 2021-03-05 株式会社Ntt都科摩 Pedestrian re-identification equipment and method
CN110807434A (en) * 2019-11-06 2020-02-18 威海若维信息科技有限公司 Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
CN111444758A (en) * 2019-12-26 2020-07-24 珠海大横琴科技发展有限公司 Pedestrian re-identification method and device based on spatio-temporal information
CN113269117A (en) * 2021-06-04 2021-08-17 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113822142A (en) * 2021-07-28 2021-12-21 腾讯科技(深圳)有限公司 Role recognition method and device, computer equipment and storage medium
CN113887279A (en) * 2021-08-24 2022-01-04 深圳市捷顺科技实业股份有限公司 Pedestrian re-identification method with half-length being blocked and related device
CN114066877A (en) * 2021-11-25 2022-02-18 深圳市商汤科技有限公司 Image screening method and device, electronic equipment and storage medium
CN114639165A (en) * 2022-03-16 2022-06-17 平安科技(深圳)有限公司 Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EHSAN YAGHOUBI et al.: "Person re-identification: Implicitly defining the receptive fields of deep learning classification frameworks", PATTERN RECOGNITION LETTERS, vol. 145, pages 23-29 *
张范: "Research on partial pedestrian re-identification algorithms focusing on shared regions", China Master's Theses Full-text Database, Information Science and Technology, no. 2023, pages 138-1106 *
罗浩: "Research on deep learning based pedestrian re-identification algorithms: from non-occlusion to occlusion", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 2021, pages 138-188 *

Also Published As

Publication number Publication date
CN116052220B (en) 2023-11-24

Similar Documents

Publication Publication Date Title
CN111104867B (en) Recognition model training and vehicle re-recognition method and device based on part segmentation
WO2019001481A1 (en) Vehicle appearance feature identification and vehicle search method and apparatus, storage medium, and electronic device
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
WO2019100348A1 (en) Image retrieval method and device, and image library generation method and device
CN112101114B (en) Video target detection method, device, equipment and storage medium
CN114898266B (en) Training method, image processing device, electronic equipment and storage medium
CN117671508B (en) SAR image-based high-steep side slope landslide detection method and system
Najibi et al. Towards the success rate of one: Real-time unconstrained salient object detection
CN114596546A (en) Vehicle weight recognition method and device, computer and readable storage medium
CN113239883A (en) Method and device for training classification model, electronic equipment and storage medium
CN112115996A (en) Image data processing method, device, equipment and storage medium
CN116052220B (en) Pedestrian re-identification method, device, equipment and medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
US20220375202A1 (en) Hierarchical sampling for object identification
CN114332716B (en) Clustering method and device for scenes in video, electronic equipment and storage medium
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN112784691B (en) Target detection model training method, target detection method and device
CN112380970B (en) Video target detection method based on local area search
Sanin et al. K-tangent spaces on Riemannian manifolds for improved pedestrian detection
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN114462479A (en) Model training method, model searching method, model, device and medium
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN114299435A (en) Scene clustering method and device in video and related equipment
CN112651996A (en) Target detection tracking method and device, electronic equipment and storage medium
CN113408356A (en) Pedestrian re-identification method, device and equipment based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant