CN113705329A - Re-recognition method, training method of target re-recognition network and related equipment

Info

Publication number
CN113705329A
CN113705329A
Authority
CN
China
Prior art keywords
target
image
similarity
recognized
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110768934.XA
Other languages
Chinese (zh)
Inventor
张兴明
马定鑫
李平生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110768934.XA priority Critical patent/CN113705329A/en
Priority to PCT/CN2021/128517 priority patent/WO2023279604A1/en
Publication of CN113705329A publication Critical patent/CN113705329A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a moving target re-identification method, a training method of a target re-identification network, an electronic device and a computer-readable storage medium. The re-recognition method comprises performing the following processing with a target re-recognition model: respectively extracting features of a static image and a moving image of a target to be recognized and of a plurality of candidate targets to obtain image features, wherein the moving image represents the motion information of each pixel point of the static image; calculating the similarity between the image features of the target to be recognized and the image features of the plurality of candidate targets; and determining a re-recognition result of the target to be recognized from the plurality of candidate targets based on the similarity. In this way, the accuracy of the re-recognition result can be improved.

Description

Re-recognition method, training method of target re-recognition network and related equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to a method for re-identifying a moving object, a method for training a target re-identification network, an electronic device, and a computer-readable storage medium.
Background
In order to meet the public's increasing security requirements, intelligent monitoring technologies are emerging. Intelligent monitoring applies to various moving objects, such as human beings and various animals (e.g., cows and pigs), and plays a key role in the field of public safety. As an important branch of intelligent monitoring, target re-identification is also receiving increasing attention from researchers.
Target re-identification is a technique that uses computer vision to determine whether a particular target is present in an image or video sequence. However, the accuracy of the re-recognition results obtained by current target re-identification methods still needs to be improved.
Disclosure of Invention
The application provides a moving target re-identification method, a target re-identification network training method, electronic equipment and a computer readable storage medium, which can improve the accuracy of a moving target re-identification result.
In order to solve the technical problem, the application adopts a technical solution that: a method for re-identifying a moving target is provided. The re-recognition method comprises performing the following processing with a target re-recognition model: respectively extracting features of a static image and a moving image of a target to be recognized and of a plurality of candidate targets to obtain image features, wherein the moving image represents the motion information of each pixel point of the static image; calculating the similarity between the image features of the target to be recognized and the image features of the plurality of candidate targets; and determining a re-recognition result of the target to be recognized from the plurality of candidate targets based on the similarity.
In order to solve the technical problem, the application adopts a technical scheme that: a training method for a target re-recognition network is provided. The training method comprises the following steps: performing feature extraction on a training static image and a training moving image of a moving target by using a target re-identification model to obtain training image features, wherein the training moving image is used for representing motion information of each pixel point of the training static image; classifying by using a target re-recognition model based on the training image characteristics to obtain a classification result; and adjusting parameters of the target re-identification model based on the classification result.
In order to solve the above technical problem, the present application adopts another technical solution: an electronic device is provided, which comprises a processor and a memory connected with the processor, wherein the memory stores program instructions; the processor is configured to execute the program instructions stored in the memory to implement the above method.
In order to solve the above technical problem, the present application adopts another technical solution: there is provided a computer readable storage medium storing program instructions that when executed are capable of implementing the above method.
In this way, feature extraction is performed on both the static image and the moving image of the same target. Compared with performing feature extraction on the static image alone, the extracted image features contain not only texture features but also motion features, so the representation capability of the image features is stronger and the re-recognition result of the target to be recognized determined based on these image features is more accurate.
Drawings
FIG. 1 is a schematic flow chart of a moving object re-identification method according to a first embodiment of the present application;
FIG. 2 is a flowchart illustrating a second embodiment of a moving object re-identification method according to the present application;
FIG. 3 is a schematic diagram of the detailed process of S21 in FIG. 2;
FIG. 4 is a schematic view of the detailed process of S22 in FIG. 2;
FIG. 5 is a flowchart illustrating a third embodiment of a moving object re-identification method according to the present application;
FIG. 6 is a flowchart illustrating a fourth embodiment of a moving object re-identification method according to the present application;
FIG. 7 is a schematic flowchart of a fifth embodiment of a moving object re-identification method according to the present application;
FIG. 8 is a schematic diagram of a structure of a target re-identification network of the present application;
FIG. 9 is a diagram illustrating the detection results of key points in RGB;
FIG. 10 is a schematic diagram of an occlusion situation of a partial feature map;
FIG. 11 is a flowchart illustrating a first embodiment of a method for training a re-recognition target network according to the present application;
FIG. 12 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Fig. 1 is a flowchart illustrating a first embodiment of a moving object re-identification method according to the present application. It is noted that the present embodiment is not limited to the flow sequence shown in fig. 1 if substantially the same result is obtained. As shown in fig. 1, the present embodiment may include:
s11: and respectively extracting the characteristics of the static image and the moving image of the target to be recognized and the plurality of candidate targets to obtain image characteristics.
The motion image is used for representing motion information of each pixel point of the static image.
The method provided by the application can be realized by using the target re-identification model. The still image is a single image in a video sequence acquired by the camera device, and the color space to which the still image belongs may be RGB, HSV, SILTP, or the like. The motion information of each pixel point of the static image is the motion information between the static image relative to the previous static image in the video sequence. The moving image may be an optical flow image or another image obtained by processing the optical flow image.
The database comprises a plurality of static images of candidate targets acquired in a history mode, and the image features of the static images of the candidate targets can be extracted in advance or can be extracted synchronously with the image features of the static images of the targets to be identified.
The still image contains texture information and the moving image contains motion information, whereby the image features encompass both texture features and motion features.
S12: and calculating the similarity of the image characteristics of the target to be recognized and the image characteristics of the candidate targets.
The similarity may be a cosine distance, a euclidean distance, a hamming distance, etc. between image features, and is not particularly limited herein.
S13: and determining a re-recognition result of the target to be recognized from the plurality of candidate targets based on the similarity.
The static images of the multiple candidate targets may be arranged in order from greater similarity to lesser similarity, and the Top-1 candidate target, which best matches the target to be recognized, may be used as the re-identification result, thereby completing re-recognition of the moving target.
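For illustration only, the following minimal sketch shows how the similarity computation of S12 and the ranking of S13 could be realized, assuming cosine similarity between feature vectors; the function name rank_candidates is hypothetical and not part of the original disclosure.

```python
# Illustrative sketch only: compare a query feature vector against gallery
# feature vectors with cosine similarity and rank the candidates.
import torch
import torch.nn.functional as F

def rank_candidates(query_feat: torch.Tensor, gallery_feats: torch.Tensor):
    """query_feat: (D,); gallery_feats: (N, D). Returns (ranking, similarities)."""
    sims = F.cosine_similarity(query_feat.unsqueeze(0), gallery_feats, dim=1)  # (N,)
    order = torch.argsort(sims, descending=True)
    # order[0] is the Top-1 candidate, i.e. the re-recognition result.
    return order, sims
```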
By implementing this embodiment, feature extraction is performed on both the static image and the moving image of the same target. Compared with performing feature extraction on the static image alone, the extracted image features contain not only texture features but also motion features, so the representation capability of the image features is stronger and the re-recognition result of the target to be recognized, subsequently determined based on these image features, is more accurate.
In a specific embodiment, the image feature referred to in S11 may include an overall feature map. Therefore, in the step, feature extraction can be performed on the splicing result of the static image and the moving image to obtain an overall feature map. Taking a static image as an RGB image and a moving image as an optical flow image as an example, the static image may be subjected to horizontal optical flow extraction and vertical optical flow extraction respectively to obtain a first moving image and a second moving image, wherein the optical flow extraction method includes, but is not limited to, an HS optical flow method; and inputting the splicing result of the static image, the first moving image and the second moving image into a pre-trained five-channel feature extraction network to obtain an overall feature map.
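A minimal sketch of assembling such a five-channel input is given below, assuming OpenCV is available; the Farneback optical-flow estimator is used here only as a stand-in for the HS optical flow method named in the text, and the function name build_five_channel_input is hypothetical.

```python
# Illustrative sketch only: stack the RGB still image with its horizontal (U)
# and vertical (V) optical flow, computed against the previous frame, into a
# single five-channel input. Farneback flow is used here merely as a stand-in
# for the HS optical flow method mentioned in the text.
import cv2
import numpy as np

def build_five_channel_input(prev_bgr: np.ndarray, curr_bgr: np.ndarray) -> np.ndarray:
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)   # (H, W, 2)
    u, v = flow[..., 0], flow[..., 1]          # first and second moving images
    rgb = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    # Stitch along the channel axis: 3 texture channels + 2 motion channels.
    return np.dstack([rgb, u[..., None], v[..., None]])              # (H, W, 5)
```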
As a specific example, the five-channel feature extraction network may be Resnet-50 with the global average pooling layer and the fully connected layer removed, in which the first convolutional layer has 5 input channels and the stride of the last convolutional stage is 1, so as to enlarge the resolution of the overall feature map. When the resolution of the stitching result of the RGB image and the moving images is H × W, the resolution of the feature map obtained through the five-channel feature extraction network is H/16 × W/16.
It can be understood that, compared with a method of extracting features of a plurality of static images in different color spaces by using a feature extraction network, the method of extracting features of RGB images and moving images by using the feature extraction network reduces the number of channels of image features, thereby reducing the amount of computation of the feature extraction network and making the feature extraction network lighter and faster.
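The following sketch shows one plausible way to build such a five-channel Resnet-50 backbone with torchvision, under the assumptions stated above (first convolution widened to 5 input channels, stride of the last stage set to 1, global pooling and fully connected layers dropped); it is illustrative rather than the patented implementation, and the function name build_five_channel_backbone is hypothetical.

```python
# Illustrative sketch only: a five-channel Resnet-50 backbone built with
# torchvision, with the first convolution widened to 5 input channels, the
# stride of the last stage set to 1, and the global pooling / FC layers removed.
import torch.nn as nn
from torchvision import models

def build_five_channel_backbone() -> nn.Module:
    resnet = models.resnet50(weights=None)
    # First convolution takes 5 channels (RGB + U + V) instead of 3.
    resnet.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Last stage stride 1, so a (5, H, W) input yields a (2048, H/16, W/16) map.
    resnet.layer4[0].conv2.stride = (1, 1)
    resnet.layer4[0].downsample[0].stride = (1, 1)
    # Keep only the convolutional stages (drop global average pooling and FC).
    return nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
                         resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4)
```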
In another embodiment, the image features may include keypoint feature vectors derived based on an overall feature map. In this case, the first embodiment can be further expanded to obtain the second embodiment. The method comprises the following specific steps:
fig. 2 is a flowchart illustrating a second embodiment of a moving object re-identification method according to the present application. It is noted that, if the result is substantially the same, the flow sequence shown in fig. 2 is not limited in this embodiment. The present embodiment is a further extension of S11, and as shown in fig. 2, the present embodiment may include:
s21: performing feature extraction on the splicing result of the static image and the moving image to obtain an integral feature map; and extracting key points belonging to the target to be recognized and the candidate target based on the still image and the moving image.
The confidence level (key point confidence level) of each pixel point in the static image as a key point can be obtained, the confidence level (foreground point confidence level) of each pixel point as a foreground point is obtained based on the moving image, and then the key point is determined based on the confidence level of each pixel point and/or the confidence level of the foreground point. The keypoints can be pixel points of which the corresponding keypoint confidence degrees are greater than a first threshold value and/or the foreground point confidence degrees are greater than a second threshold value.
It can be understood that if only the condition that the keypoint confidence is greater than the first threshold is used to determine keypoints, a pixel point located on an occluder whose texture is similar to that of the moving target is likely to be mistaken for a keypoint. Likewise, if only the condition that the foreground-point confidence is greater than the second threshold is used, a pixel point located on a fast-moving occluder is likely to be mistaken for a keypoint. Therefore, requiring both that the keypoint confidence be greater than the first threshold and that the foreground-point confidence be greater than the second threshold as the judgment condition yields higher accuracy.
With reference to fig. 3, in the case that the keypoint is a pixel point whose confidence is greater than the first threshold and/or whose confidence is greater than the second threshold, S21 may be further expanded to the following sub-steps:
s211: performing attitude evaluation on the static image to obtain the confidence coefficient of the key point of each pixel point; and carrying out normalization processing on the motion information of the motion image to obtain the confidence coefficient of the foreground point of each pixel point.
The method can be used for carrying out key point detection on the static image through the attitude estimation network so as to obtain the key point confidence of each pixel point.
The motion information of the moving image is obtained from the motion information of the first moving image and the motion information of the second moving image. For example, if the motion information of the first moving image is U and the motion information of the second moving image is V, then the motion information of the moving image is UV = √(U² + V²).
The motion information of the moving image reflects the motion rate of each pixel point, and the foreground-point confidence of a pixel point is the result of normalizing this motion information. The higher the foreground-point confidence of a pixel point, the higher its motion rate and the more likely it is a foreground point; conversely, the lower the foreground-point confidence, the lower the motion rate and the more likely it is a background point.
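A minimal sketch of this step, assuming min-max normalization of the motion magnitude UV = √(U² + V²); the function name foreground_confidence is hypothetical.

```python
# Illustrative sketch only: per-pixel foreground-point confidence obtained by
# min-max normalizing the motion magnitude UV = sqrt(U^2 + V^2).
import numpy as np

def foreground_confidence(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    uv = np.sqrt(u ** 2 + v ** 2)                           # motion rate per pixel
    return (uv - uv.min()) / (uv.max() - uv.min() + 1e-12)  # normalized to [0, 1]
```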
S212: and screening out pixel points with the key point confidence degree larger than or equal to a first threshold value and the foreground point confidence degree larger than or equal to a second threshold value as key points.
It can be understood that the pixel points whose keypoint confidence is less than the first threshold and/or whose foreground-point confidence is less than the second threshold are the occluded pixel points in the static image.
If the keypoint confidence of a pixel point is less than the first threshold and its foreground-point confidence is less than the second threshold, the pixel point is judged to be occluded by a static occluder with dissimilar texture, such as a fence. If the keypoint confidence is greater than or equal to the first threshold and the foreground-point confidence is less than the second threshold, the pixel point is judged to be occluded by a static occluder with similar texture, such as a billboard that depicts the target. If the keypoint confidence is less than the first threshold and the foreground-point confidence is greater than or equal to the second threshold, the pixel point is judged to be occluded by a dynamic occluder with dissimilar texture, such as a moving automobile.
S22: a keypoint feature vector is determined based on the keypoints and the global feature map.
Referring to fig. 4 in combination, S22 may include the following sub-steps:
s221: and setting the responsivity of the key points to be equal to the confidence coefficient of the key points, and setting the responsivity of other pixel points except the key points to be zero so as to obtain a responsivity image.
The process of obtaining a responsivity image may be embodied as the following equation:
R(x, y) = C_xy, if C_xy ≥ η1 and UV_xy ≥ η2; R(x, y) = 0, otherwise,
wherein x represents the abscissa of the pixel point, y represents the ordinate of the pixel point, R(x, y) represents the responsivity at (x, y), C_xy represents the keypoint confidence, UV_xy represents the foreground-point confidence, η1 represents the first threshold, and η2 represents the second threshold.
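A minimal sketch of computing the responsivity image according to this equation, in NumPy; the function name responsivity_image is hypothetical.

```python
# Illustrative sketch only: responsivity image following the equation above.
import numpy as np

def responsivity_image(c: np.ndarray, uv: np.ndarray, eta1: float, eta2: float) -> np.ndarray:
    """c: keypoint confidences; uv: foreground-point confidences; both (H, W)."""
    keypoint_mask = (c >= eta1) & (uv >= eta2)
    return np.where(keypoint_mask, c, 0.0)   # keypoints keep their confidence, others are 0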
S222: and performing point multiplication on the responsivity image and the integral characteristic image to obtain the key point characteristics.
The point multiplication is to multiply the responsivity image and the corresponding characteristic value in the overall characteristic diagram.
S223: performing maximum pooling on the key point characteristics to obtain a maximum pooling result; and performing average pooling on the overall feature map to obtain an average pooling result.
S224: and splicing the maximum pooling result and the average pooling result to obtain the characteristic vector of the key point.
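A minimal sketch of S222 to S224 follows, assuming the responsivity image has already been resized to the resolution of the overall feature map; the function name keypoint_feature_vector is hypothetical.

```python
# Illustrative sketch only: keypoint feature vector from the overall feature map
# and the responsivity image (assumed resized to the feature-map resolution).
import torch
import torch.nn.functional as F

def keypoint_feature_vector(feat_map: torch.Tensor, resp: torch.Tensor) -> torch.Tensor:
    """feat_map: (C, h, w) overall feature map; resp: (h, w) responsivity image."""
    keypoint_feat = feat_map * resp.unsqueeze(0)                                  # point multiplication
    max_pooled = F.adaptive_max_pool2d(keypoint_feat.unsqueeze(0), 1).flatten(1)  # (1, C)
    avg_pooled = F.adaptive_avg_pool2d(feat_map.unsqueeze(0), 1).flatten(1)       # (1, C)
    return torch.cat([max_pooled, avg_pooled], dim=1).squeeze(0)                  # (2C,)
```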
In addition, on the basis of the second embodiment described above, S12 may be further expanded to S23, and S13 may be expanded to S24. Thus, the following third embodiment is obtained:
fig. 5 is a flowchart illustrating a third embodiment of a moving object re-identification method according to the present application. It is noted that, if the result is substantially the same, the flow sequence shown in fig. 5 is not limited in this embodiment. As shown in fig. 5, the present embodiment may include:
s23: and calculating first similarity between the key point feature vector of the target to be identified and the key point feature vectors of the candidate targets.
The corresponding features between the key point feature vector of the target to be identified and the key point feature vectors of the candidate targets can be aligned one by one, so that the calculated first similarity is favorable for obtaining a subsequent re-identification result.
S24: and determining a re-recognition result of the target to be recognized based on at least the first similarity.
The re-recognition result of the target to be recognized may be determined based on only the first similarity, or may be determined based on a sum result of the first similarity and a second similarity mentioned in the later embodiments. The accuracy of the re-recognition result determined by the latter is higher.
In yet another embodiment, the image feature may include a local feature map obtained by dividing the entire feature map. In this case, the first embodiment can be further expanded to obtain a fourth embodiment. The method comprises the following specific steps:
fig. 6 is a flowchart illustrating a fourth embodiment of a moving object re-identification method according to the present application. It is noted that the flow sequence shown in fig. 6 is not limited in this embodiment if substantially the same result is obtained. As shown in fig. 6, the present embodiment may include:
s31: and performing feature extraction on the splicing result of the static image and the moving image to obtain an integral feature map.
S32: and dividing the overall characteristic graph of the target to be identified and the candidate target into a plurality of local characteristic graphs corresponding to each other along a preset direction, and extracting local characteristic vectors.
The preset direction may be a horizontal direction or a vertical direction. For example, the global feature map is divided into 6 local feature maps along a preset direction, and corresponding local feature vectors are extracted.
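A minimal sketch of this division, assuming 6 horizontal stripes and the average pooling plus 1 × 1 convolution described in the later example; the class name LocalBranch is hypothetical.

```python
# Illustrative sketch only: split the overall feature map into horizontal
# stripes and reduce each stripe to a local feature vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBranch(nn.Module):
    def __init__(self, in_channels: int = 2048, out_channels: int = 256, num_parts: int = 6):
        super().__init__()
        self.num_parts = num_parts
        self.reduce = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, kernel_size=1) for _ in range(num_parts)]
        )

    def forward(self, feat_map: torch.Tensor):                           # (B, C, h, w)
        stripes = torch.chunk(feat_map, self.num_parts, dim=2)           # split along height
        return [self.reduce[i](F.adaptive_avg_pool2d(s, 1)).flatten(1)   # (B, out_channels)
                for i, s in enumerate(stripes)]
```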
On the basis of the fourth embodiment, S12 can be further expanded to obtain S33-S34, and S13 can be further expanded to obtain S35. Thus, the following fifth embodiment is obtained:
fig. 7 is a schematic flowchart of a fifth embodiment of a moving object re-identification method according to the present application. It is noted that the flow sequence shown in fig. 7 is not limited in this embodiment if substantially the same result is obtained. As shown in fig. 7, the present embodiment may include:
s33: and determining the shielding condition of the local feature map based on the confidence degrees of the foreground points and the confidence degrees of the key points of the pixel points of the static images of the target to be recognized and the candidate target.
In a specific embodiment, if the local feature map contains a pixel point whose keypoint confidence is less than the first threshold and/or whose foreground-point confidence is less than the second threshold (i.e., an occluded keypoint), the local feature map is judged to be occluded; otherwise, the local feature map is judged not to be occluded.
In another embodiment, if the local feature map contains a pixel point whose keypoint confidence is greater than or equal to the first threshold and whose foreground-point confidence is greater than or equal to the second threshold (i.e., it contains a keypoint), the local feature map is judged not to be occluded; otherwise, the local feature map is judged to be occluded.
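A minimal sketch of the second criterion (a stripe is unoccluded if it contains at least one keypoint), assuming the confidences are given at the resolution of the static image; the function name stripe_is_occluded is hypothetical.

```python
# Illustrative sketch only: per-stripe occlusion flags using the second
# criterion (a stripe is unoccluded if it contains at least one keypoint).
import numpy as np

def stripe_is_occluded(c: np.ndarray, uv: np.ndarray, eta1: float, eta2: float,
                       num_parts: int = 6) -> list:
    """c, uv: keypoint / foreground-point confidences at static-image resolution."""
    has_keypoint = (c >= eta1) & (uv >= eta2)                  # (H, W)
    stripes = np.array_split(has_keypoint, num_parts, axis=0)  # horizontal stripes
    return [not s.any() for s in stripes]                      # True means occluded
```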
S34: and under the condition that the corresponding local feature maps are not shielded, calculating a second similarity between the local feature vectors of the corresponding local feature maps, and summing.
For example, if 3 of 6 local feature maps are occluded, a second similarity between the local feature vectors of the 3 local feature maps that are not occluded is calculated.
S35: and determining a re-recognition result of the target to be recognized at least based on the summation result of the second similarity.
The re-recognition result of the target to be recognized may be determined based on only the sum result of the second similarities, or may be determined based on the sum result of the second similarities and the first similarities mentioned in the foregoing embodiments. The accuracy of the re-recognition result determined by the latter is higher.
In S24/S35, if the re-recognition result of the target to be recognized is determined based on both the summation result of the second similarities and the first similarity, the summation result of the second similarities and the first similarity may be summed again, and that sum divided by the number of second similarities to obtain a third similarity; the re-recognition result of the target to be recognized is then determined based on the third similarity. The specific formula for this process is as follows:
Similarity_all = (Σ_i I_i · Similarity_i + Similarity_keypoint) / Σ_i I_i,
wherein Similarity_all represents the third similarity, Similarity_i represents the second similarity between the i-th pair of local feature vectors, I_i is 0 if the i-th local feature map is occluded and 1 otherwise, and Similarity_keypoint represents the first similarity.
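A minimal sketch of combining the similarities according to this formula, assuming cosine similarity between local feature vectors and that per-stripe occlusion flags are available for both images; the function name third_similarity is hypothetical.

```python
# Illustrative sketch only: third similarity combining the keypoint (first)
# similarity with the local (second) similarities of stripes that are
# unoccluded in both images, divided by the number of second similarities.
import torch
import torch.nn.functional as F

def third_similarity(query_locals, gallery_locals, occluded_q, occluded_g,
                     sim_keypoint: torch.Tensor) -> torch.Tensor:
    total, count = sim_keypoint, 0
    for fq, fg, oq, og in zip(query_locals, gallery_locals, occluded_q, occluded_g):
        if not oq and not og:                                   # I_i = 1
            total = total + F.cosine_similarity(fq, fg, dim=0)  # second similarity
            count += 1
    return total / max(count, 1)                                # Similarity_all
```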
The re-recognition method provided in the present application is described in detail below by way of an example.
Example 1:
referring to fig. 8, fig. 8 is a schematic diagram of a structure of the object re-identification network of the present application. As shown in fig. 8, a still image of an object (pedestrian) is represented by RGB, a first moving image (horizontal optical flow image) is represented by U, a second moving image (vertical optical flow image) is represented by V, and a stitching result of RGB, U, and V is represented by UVRGB. The target re-identification model comprises a feature extraction backbone network (namely the aforementioned five-channel feature extraction network) and a key point feature extraction network.
UVRGB is input into the feature extraction backbone network to obtain the overall feature map; the overall feature map is divided into 6 local feature maps along the horizontal direction, and average pooling and 1 × 1 convolution are applied to each of the 6 local feature maps to obtain the corresponding local feature vectors.
Inputting RGB, U and V into a key point feature extraction network, carrying out posture estimation on RGB to obtain the confidence coefficient of key points of each pixel point, obtaining a motion image UV by utilizing U and V, and carrying out normalization processing on motion information of the motion image UV to obtain the confidence coefficient of foreground points of each pixel point.
Pixel points whose keypoint confidence is greater than or equal to the first threshold and whose foreground-point confidence is greater than or equal to the second threshold are taken as keypoints. Fig. 9 is a schematic diagram of the keypoint detection result in RGB, where A and B both denote keypoints of the pedestrian: A denotes detected keypoints, and B denotes keypoints that are occluded by an occluder and therefore not detected.
And setting the responsivity of the key points to be equal to the confidence coefficient of the key points, and setting the responsivity of other pixel points except the key points to be zero so as to obtain a responsivity image. And performing point multiplication and maximum pooling on the responsivity image and the overall characteristic image to obtain a maximum pooling result, performing average pooling on the overall characteristic image to obtain an average pooling result, and splicing the maximum pooling result and the average pooling result to obtain the characteristic vector of the key point. And calculating first similarity between the key point feature vectors corresponding to the target to be identified and the candidate target.
A local feature map whose corresponding region contains pixel points with keypoint confidence less than the first threshold and/or foreground-point confidence less than the second threshold is judged to be an occluded local feature map; the other local feature maps are judged not to be occluded. The sum of the second similarities is then calculated between the local feature vectors of the unoccluded local feature maps corresponding to the target to be recognized and the candidate target. Fig. 10 is a schematic diagram of the occlusion situation of the local feature maps, where C is a static image of the target to be recognized, C' is the overall feature map of the target to be recognized divided into 6 local feature maps, D is a static image of the candidate target, and D' is the overall feature map of the candidate target. In C, only the corresponding regions of the upper 3 local feature maps are unoccluded, while in D all 6 local feature maps are unoccluded, so the upper three local feature maps are the ones unoccluded in both C and D. The sum of the second similarities between the three corresponding local feature maps in C' and D' is calculated.
The sum of the second similarities and the first similarity are then summed and divided by the number of second similarities to obtain the third similarity.
And determining a re-recognition result of the target to be recognized based on the third similarity.
In addition, before the target re-recognition network is put into use, the target re-recognition network needs to be trained.
Fig. 11 is a flowchart illustrating a first embodiment of a method for training a target re-recognition network according to the present application. It should be noted that, if the result is substantially the same, the flow sequence shown in fig. 11 is not limited in this embodiment. As shown in fig. 11, the present embodiment may include:
s41: and performing feature extraction on the training static image and the training moving image of the moving target by using the target re-recognition model to obtain the features of the training images.
The training moving image is used for representing the motion information of each pixel point of the training static image.
S42: and classifying the target re-recognition model based on the training image characteristics to obtain a classification result.
The classification result may be used to represent the category to which the training image features belong, i.e., identity information of the moving object.
S43: and adjusting parameters of the target re-identification model based on the classification result.
The classification result may be used to calculate a loss (e.g., cross entropy loss) of the target re-recognition model, and parameters of the target re-recognition model may be adjusted based on the cross entropy loss and a stochastic gradient descent method.
The process of obtaining the training image features during training is the same as in the application process; reference is made to the foregoing embodiments, and details are not repeated here. In addition, during training the target re-identification model is further provided with a classification network (a fully connected layer) for classifying the training image features.
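A minimal sketch of one training iteration, assuming a fully connected classification head, cross-entropy loss and stochastic gradient descent as described above; the function name train_step and the shown sizes are hypothetical.

```python
# Illustrative sketch only: one training iteration with a fully connected
# classification head, cross-entropy loss and stochastic gradient descent.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(backbone: nn.Module, classifier: nn.Linear,
               optimizer: torch.optim.Optimizer,
               batch: torch.Tensor, identity_labels: torch.Tensor) -> float:
    feat_map = backbone(batch)                    # (B, C, h, w) training image features
    features = feat_map.mean(dim=(2, 3))          # pooled to (B, C) for classification
    logits = classifier(features)                 # classification result (identities)
    loss = F.cross_entropy(logits, identity_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example setup (hypothetical sizes):
# classifier = nn.Linear(2048, num_identities)
# optimizer = torch.optim.SGD(list(backbone.parameters()) + list(classifier.parameters()), lr=0.01)
```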
Fig. 12 is a schematic structural diagram of an embodiment of an electronic device according to the present application. As shown in fig. 12, the electronic device may include a processor 51, a memory 52 coupled to the processor 51.
Wherein the memory 52 stores program instructions for implementing the method of any of the above embodiments; the processor 51 is operative to execute program instructions stored by the memory 52 to implement the steps of the above-described method embodiments. The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The general purpose processor may be a micro-processor or the processor 51 may be any conventional processor or the like.
FIG. 13 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application. As shown in fig. 13, the computer readable storage medium 60 of the embodiment of the present application stores program instructions 61, and the program instructions 61 implement the method provided by the above-mentioned embodiment of the present application when executed. The program instructions 61 may form a program file stored in the computer readable storage medium 60 in the form of a software product, so as to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned computer-readable storage medium 60 includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings or applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims (10)

1. A moving target re-identification method is characterized by comprising the following steps of using a target re-identification model:
respectively extracting the characteristics of a static image and a moving image of a target to be identified and a plurality of candidate targets to obtain image characteristics, wherein the moving image is used for representing the motion information of each pixel point of the static image;
calculating the similarity between the image features of the target to be recognized and the image features of the candidate targets;
and determining a re-recognition result of the target to be recognized from the plurality of candidate targets based on the similarity.
2. The method according to claim 1, wherein the step of performing feature extraction on the still image and the moving image of the object to be recognized and the plurality of candidate objects, respectively, comprises:
performing feature extraction on the splicing result of the static image and the moving image to obtain an overall feature map;
extracting key points belonging to the target to be recognized and the candidate target based on the static image and the moving image;
determining a keypoint feature vector based on the keypoints and the global feature map;
the step of calculating the similarity between the image features of the target to be recognized and the image features of the candidate targets comprises the following steps:
calculating first similarity between the key point feature vector of the target to be identified and the key point feature vectors of the candidate targets;
the step of determining re-recognition results of the target to be recognized from the plurality of candidate targets based on the similarity includes:
and determining a re-recognition result of the target to be recognized based on at least the first similarity.
3. The method of claim 2, wherein the static image is an RGB image;
the step of extracting the characteristics of the splicing result of the static image and the motion image comprises the following steps:
performing horizontal optical flow extraction and vertical optical flow extraction on the still image to obtain a first moving image and a second moving image, respectively;
and inputting the splicing result of the static image, the first moving image and the second moving image into a pre-trained five-channel feature extraction network to obtain the overall feature map.
4. The method according to claim 2, wherein the step of extracting the keypoints belonging to the target to be recognized and the candidate target based on the still image and the moving image comprises:
performing attitude evaluation on the static image to obtain the confidence coefficient of the key point of each pixel point;
carrying out normalization processing on the motion information of the motion image to obtain the confidence coefficient of the foreground point of each pixel point;
and screening out pixel points of which the confidence degrees of the key points are greater than or equal to a first threshold value and the confidence degrees of the foreground points are greater than or equal to a second threshold value as the key points.
5. The method of claim 4, wherein the step of determining a keypoint feature vector based on the keypoint and the global feature map comprises:
setting the responsivity of the key points to be equal to the key point confidence coefficient of the key points, and setting the responsivity of other pixel points except the key points to be zero so as to obtain a responsivity image;
performing point multiplication on the responsivity image and the overall characteristic image to obtain key point characteristics;
performing maximum pooling on the key point features to obtain a maximum pooling result;
performing average pooling on the overall feature map to obtain an average pooling result;
and splicing the maximum pooling result and the average pooling result to obtain the feature vector of the key point.
6. The method according to claim 4, wherein the step of feature extracting the still image and the moving image of the object to be recognized and the plurality of candidate objects, respectively, further comprises:
dividing the overall characteristic graph of the target to be recognized and the candidate target into a plurality of local characteristic graphs corresponding to each other along a preset direction, and extracting local characteristic vectors;
the step of calculating the similarity between the image features of the target to be recognized and the image features of the candidate targets includes:
determining the shielding condition of the local feature map based on the foreground point confidence coefficient and the key point confidence coefficient of each pixel point of the static image of the target to be recognized and the candidate target;
under the condition that the corresponding local feature maps are not shielded, calculating a second similarity between the local feature vectors of the corresponding local feature maps, and summing;
the step of determining re-recognition results of the target to be recognized from the plurality of candidate targets based on the similarity includes:
and determining a re-recognition result of the target to be recognized at least based on the summation result of the second similarity.
7. The method according to claim 6, wherein the step of determining the re-recognition result of the target to be recognized from the plurality of candidate targets based on the similarity comprises:
performing secondary summation on the summation result of the second similarity and the first similarity, and dividing the secondary summation result by the number of the second similarities to obtain a third similarity;
and determining a re-recognition result of the target to be recognized based on the third similarity.
8. A training method of a target re-recognition model is characterized by comprising the following steps:
performing feature extraction on a training static image and a training moving image of a moving target by using the target re-identification model to obtain training image features, wherein the training moving image is used for representing motion information of each pixel point of the training static image;
classifying based on the training image characteristics by using the target re-identification model to obtain a classification result;
and adjusting parameters of the target re-identification model based on the classification result.
9. An electronic device comprising a processor, a memory coupled to the processor, wherein,
the memory stores program instructions;
the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1-8.
10. A computer-readable storage medium, characterized in that the storage medium stores program instructions that, when executed, implement the method of any of claims 1-8.
CN202110768934.XA 2021-07-07 2021-07-07 Re-recognition method, training method of target re-recognition network and related equipment Pending CN113705329A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110768934.XA CN113705329A (en) 2021-07-07 2021-07-07 Re-recognition method, training method of target re-recognition network and related equipment
PCT/CN2021/128517 WO2023279604A1 (en) 2021-07-07 2021-11-03 Re-identification method, training method for target re-identification network and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110768934.XA CN113705329A (en) 2021-07-07 2021-07-07 Re-recognition method, training method of target re-recognition network and related equipment

Publications (1)

Publication Number Publication Date
CN113705329A true CN113705329A (en) 2021-11-26

Family

ID=78648798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110768934.XA Pending CN113705329A (en) 2021-07-07 2021-07-07 Re-recognition method, training method of target re-recognition network and related equipment

Country Status (2)

Country Link
CN (1) CN113705329A (en)
WO (1) WO2023279604A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023169369A1 (en) * 2022-03-11 2023-09-14 浪潮(北京)电子信息产业有限公司 Pedestrian re-identification method, system, apparatus and device, and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8774499B2 (en) * 2011-02-28 2014-07-08 Seiko Epson Corporation Embedded optical flow features
CN109492624A (en) * 2018-12-29 2019-03-19 北京灵汐科技有限公司 The training method and its device of a kind of face identification method, Feature Selection Model
CN110942006B (en) * 2019-11-21 2023-04-18 中国科学院深圳先进技术研究院 Motion gesture recognition method, motion gesture recognition apparatus, terminal device, and medium
CN111784735A (en) * 2020-04-15 2020-10-16 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023169369A1 (en) * 2022-03-11 2023-09-14 浪潮(北京)电子信息产业有限公司 Pedestrian re-identification method, system, apparatus and device, and medium

Also Published As

Publication number Publication date
WO2023279604A1 (en) 2023-01-12

Similar Documents

Publication Publication Date Title
Lee et al. Object detection with sliding window in images including multiple similar objects
US8452096B2 (en) Identifying descriptor for person or object in an image
US6721454B1 (en) Method for automatic extraction of semantically significant events from video
CN111814857B (en) Target re-identification method, network training method thereof and related device
Liu et al. A contrario comparison of local descriptors for change detection in very high spatial resolution satellite images of urban areas
CN108764096B (en) Pedestrian re-identification system and method
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN111814690B (en) Target re-identification method, device and computer readable storage medium
CN115240130A (en) Pedestrian multi-target tracking method and device and computer readable storage medium
CN111598067B (en) Re-recognition training method, re-recognition method and storage device in video
Li et al. Multi-view vehicle detection based on fusion part model with active learning
CN112784754A (en) Vehicle re-identification method, device, equipment and storage medium
CN113705329A (en) Re-recognition method, training method of target re-recognition network and related equipment
Bi et al. A hierarchical salient-region based algorithm for ship detection in remote sensing images
CN111382606A (en) Tumble detection method, tumble detection device and electronic equipment
Ahmad et al. SSH: Salient structures histogram for content based image retrieval
CN111104857A (en) Identity recognition method and system based on gait energy diagram
CN116258881A (en) Image clustering method, device, terminal and computer readable storage medium
Das et al. Bag of feature approach for vehicle classification in heterogeneous traffic
CN114220078A (en) Target re-identification method and device and computer readable storage medium
Wang et al. Human action categorization using conditional random field
CN114445916A (en) Living body detection method, terminal device and storage medium
CN114219828A (en) Target association method and device based on video and readable storage medium
CN111814624A (en) Pedestrian gait recognition training method in video, gait recognition method and storage device
CN111625672B (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination