WO2023173544A1 - Person re-identification method and apparatus based on artificial intelligence, and device and storage medium

Info

Publication number: WO2023173544A1
Application number: PCT/CN2022/090156
Authority: WO
Inventors: 郑喜民; 朱翌; 舒畅; 陈又新
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-03-16
Filing date: 2022-04-29
Publication date: 2023-09-21
Also published as: CN114639165B; CN114639165A

Abstract

The present application relates to the technical field of artificial intelligence. Disclosed are a person re-identification method and apparatus based on artificial intelligence, and a device and a storage medium. The method comprises: inputting a target image into a preset feature extraction model, so as to obtain a feature vector to be analyzed that is output by each feature output module; inputting, into a preset classification prediction module, each feature vector to be analyzed, so as to perform classification probability prediction, and obtaining a classification probability prediction result; according to a target feature vector and a preset number of similar images, determining sets of similar human body images from a preset human body image library, wherein the target feature vector is any feature vector to be analyzed; for each human body image in each set of similar human body images, performing weighted summation on each classification probability prediction result and the weight of each classification prediction module, so as to obtain a soft voting score; and determining a person re-identification result according to the soft voting scores. Attention is paid to both low-level features, such as the color and texture of clothes, and high-level global semantic information, thereby improving the accuracy of person re-identification.

Description

Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence

This application claims priority to Chinese patent application No. 202210256790.4, filed with the China Patent Office on March 16, 2022 and titled "Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence", the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence.

Background technique

Pedestrian re-identification, also known as person re-identification, uses computer vision to determine whether a specific pedestrian appears in an image or video sequence. It is widely regarded as a sub-problem of image retrieval: given a surveillance image of a pedestrian, the task is to retrieve from a human body image database any human body images that show the same pedestrian.

Generally, a pedestrian re-identification task first trains a feature extraction network: a loss value is computed from the cosine metric distance between the extracted features, and the optimizer repeatedly updates the network parameters based on that loss value so that the network learns.

In a feature extraction network for pedestrian re-identification, features close to the input layer tend to contain more local, detailed semantic information, while features close to the output layer tend to contain higher-level global semantic information. The inventors found that existing pedestrian re-identification methods often match against the human body images in the human body image database using only the feature vector of the last layer of the feature extraction network, without considering low-level features. Because small objects carry little pixel information, that information is easily lost during down-sampling in the feature extraction network, resulting in low accuracy of pedestrian re-identification.

Technical problem

The main purpose of this application is to provide a pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence, aiming to solve the technical problem that existing pedestrian re-identification methods match against the human body images in the human body image database using only the feature vector of the last layer of the feature extraction network, without considering low-level features, which results in low accuracy of pedestrian re-identification.

Technical solutions

In order to achieve the above-mentioned object of the invention, this application proposes a pedestrian re-identification method based on artificial intelligence, which method includes:

Get the target image;

Input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules respectively;

Input each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction, and obtain a classification probability prediction result;

Determine a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

For each human body image in each of the similar human body image sets, perform a weighted sum of each of the classification probability prediction results and the weight of each of the classification prediction modules to obtain a soft voting score;

According to each of the soft voting scores, the pedestrian re-identification result is determined.

This application also proposes a pedestrian re-identification device based on artificial intelligence, which includes:

Image acquisition module, used to acquire target images;

A to-be-analyzed feature vector determination module, used to input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules respectively;

The classification probability prediction result determination module is used to input each of the feature vectors to be analyzed into a preset classification prediction module to perform classification probability prediction and obtain the classification probability prediction result;

A similar human body image set determination module, configured to determine a similar human body image set from a preset human body image library based on the target feature vector and a preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

A soft voting score determination module, configured to perform a weighted sum of each of the classification probability prediction results and the weights of each of the classification prediction modules for each human body image in each of the similar human body image sets to obtain a soft voting score;

The pedestrian re-identification result determination module is used to determine the pedestrian re-identification result according to each of the soft voting scores.

This application also proposes a computer device, including a memory and a processor. The memory stores a computer program. When the processor executes the computer program, it implements an artificial intelligence-based pedestrian re-identification method, wherein the method includes the following steps:

Get the target image;

This application also proposes a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements an artificial intelligence-based pedestrian re-identification method, wherein the method includes the following steps:

Get the target image;

Beneficial effects

The artificial intelligence-based pedestrian re-identification method, device, equipment and storage medium of this application improves the accuracy of pedestrian re-identification.

Description of the drawings

Figure 1 is a schematic flow chart of a pedestrian re-identification method based on artificial intelligence according to an embodiment of the present application;

Figure 2 is a schematic structural block diagram of an artificial intelligence-based pedestrian re-identification device according to an embodiment of the present application;

Figure 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.

Best Mode of Carrying Out the Invention

In order to solve the above problem, an embodiment of the present application provides a pedestrian re-identification method based on artificial intelligence. Please refer to Figure 1 for details. The method includes:

S1: Get the target image;

S2: Input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules respectively;

S3: Input each feature vector to be analyzed into the preset classification prediction module to perform classification probability prediction, and obtain the classification probability prediction result;

S4: Determine a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, where the target feature vector is any one of the feature vectors to be analyzed;

S5: For each human body image in each of the similar human body image sets, perform a weighted sum of each classification probability prediction result and the weight of each classification prediction module to obtain a soft voting score;

S6: Determine the pedestrian re-identification result based on each of the soft voting scores.

In this embodiment, the feature pyramid of the feature extraction model is connected to each feature output module respectively, so feature vectors to be analyzed with different levels of semantic information are obtained. This gives attention to both low-level features, such as clothing color and texture, and high-level global semantic information, which improves the accuracy of pedestrian re-identification. In addition, the soft voting score is determined only within the similar human body image sets determined from the target feature vector and the preset number of similar images, which further improves the accuracy of pedestrian re-identification.

For S1, the target image can be obtained from a database, from user input, or from a third-party application.

The target image is an image that needs to be retrieved in the human body image database. The target image is an image taken of a human body.

For S2, the target image is input into a preset feature extraction model, and each feature output module of the feature extraction model outputs a feature vector to be analyzed.

The feature pyramid includes a bottom-up feature extraction link, a top-down feature fusion link, and horizontal connections. The horizontal connections are the connections between the bottom-up feature extraction link and the top-down feature fusion link.

Each feature fusion layer of the top-down feature fusion link of the feature pyramid is linked to one of the feature output modules, and the feature fusion layer corresponds to the feature output module one-to-one.

The bottom-up feature extraction link is the forward pass of the network. During the forward pass, the size of the feature map changes after some network layers and stays the same after others. Network layers whose feature maps have the same size are grouped into one feature extraction layer.

The top-down feature fusion link uses upsampling to amplify high-level feature maps in equal proportions.

A horizontal connection fuses the feature vector obtained by upsampling with the feature vector of the same size output by the bottom-up feature extraction link, after the latter has undergone channel expansion with a 1*1 convolution kernel, to obtain the fused feature vector. The fused feature vector is output to the feature output module.

For S3, each feature vector to be analyzed is input into a preset classification prediction module for classification probability prediction, and the predicted probability vector is used as the classification probability prediction result.

The number of vector elements in the classification probability prediction result is the same as the number of human body images in the human body image database. That is to say, the vector elements in the classification probability prediction result correspond to the human body images in the human body image database.

The value of the vector element in the classification probability prediction result is the probability that the human body in the target image is similar to the human body in the human body image corresponding to the vector element.

It can be understood that the number of classification probability prediction results is the same as the number of feature vectors to be analyzed.

The classification prediction module is a fully connected layer using softmax activation function.
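As a minimal sketch of such a module, the following plain Python code applies a fully connected layer followed by softmax to one feature vector to be analyzed. The function names classify and softmax, the toy weights and the 3-image gallery are illustrative assumptions, not values from this application.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights, biases):
    """Fully connected layer followed by softmax.

    feature: list of D floats (one feature vector to be analyzed)
    weights: list of G rows with D floats each (one row per library image)
    biases:  list of G floats
    """
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Hypothetical toy values: a 2-dimensional feature and a 3-image library.
probs = classify([1.0, 2.0],
                 [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]],
                 [0.0, 0.0, 0.0])
```

The returned list has one probability per human body image in the library, and the probabilities sum to 1.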

For S4, any one of the feature vectors to be analyzed is used as the target feature vector; according to the target feature vector, the most similar human body images are determined from the preset human body image library as a similar human body image set, and the number of images in the similar human body image set is equal to the number of similar images.

The human body image library includes: image identification, human body images, and feature vectors corresponding to the human body images. Human body images are images taken of the human body.

For S5, the similar human body image sets are merged and deduplicated to obtain a target human body image set; any human body image in the target human body image set is used as the image to be voted on; the vector element value corresponding to the image to be voted on in each classification probability prediction result is multiplied by the weight of the corresponding classification prediction module and the products are summed, and the resulting sum is used as the soft voting score corresponding to the image to be voted on.

For example, suppose the number of classification probability prediction results is 4. The vector element value corresponding to the image to be voted on in the first classification probability prediction result (output by the first classification prediction module) is multiplied by the weight of the first classification prediction module to obtain the first score. The vector element value corresponding to the image to be voted on in the second classification probability prediction result (output by the second classification prediction module) is multiplied by the weight of the second classification prediction module to obtain the second score. The vector element value corresponding to the image to be voted on in the third classification probability prediction result (output by the third classification prediction module) is multiplied by the weight of the third classification prediction module to obtain the third score. The vector element value corresponding to the image to be voted on in the fourth classification probability prediction result (output by the fourth classification prediction module) is multiplied by the weight of the fourth classification prediction module to obtain the fourth score. The first score, the second score, the third score and the fourth score are then added, and the sum is used as the soft voting score corresponding to the image to be voted on.
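The weighted sum in this example can be sketched as follows. The function name soft_voting_scores, the probability values and the module weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def soft_voting_scores(prob_results, module_weights, candidate_ids):
    """Weighted sum of per-module probabilities for each candidate image.

    prob_results:   one probability vector per classification prediction
                    module, indexed by position in the human body image library
    module_weights: one weight per classification prediction module
    candidate_ids:  library positions in the target human body image set
    """
    return {
        image_id: sum(w * probs[image_id]
                      for probs, w in zip(prob_results, module_weights))
        for image_id in candidate_ids
    }

# Hypothetical values: 4 modules, a library of 3 images, candidates 0 and 2.
prob_results = [
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.2, 0.1, 0.7],
    [0.4, 0.4, 0.2],
]
module_weights = [0.4, 0.3, 0.2, 0.1]
scores = soft_voting_scores(prob_results, module_weights, [0, 2])
```

For image 0 the four scores are 0.24, 0.15, 0.04 and 0.04, giving a soft voting score of about 0.47; for image 2 the sum is about 0.29.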

For S6, the human body image corresponding to the soft voting score with the largest value in each of the soft voting scores is used as the hit image of the pedestrian re-identification result.

In one embodiment, the above-mentioned step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module includes:

S21: Input the target image into the first feature extraction layer of the feature pyramid to obtain the first feature initial vector;

S22: Input the i-1th feature initial vector into the i-th feature extraction layer of the feature pyramid to obtain the i-th feature initial vector, where i is greater than 1 and less than n+1, n is an integer greater than 2;

S23: Input the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector;

S24: Input the k-1th fusion feature vector and the n-k+1th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector, where k is greater than 1 and less than n;

S25: Input the m-th fused feature vector into the m-th feature output module for feature output, and obtain the m-th feature vector to be analyzed, where m is greater than 0 and less than n.

This embodiment implements feature extraction using the bottom-up feature extraction link, the top-down feature fusion link and the horizontal connections, and outputs one feature vector to be analyzed for each fusion feature vector. The resulting feature vectors to be analyzed carry different levels of semantic information, so that attention is paid to both low-level features, such as clothing color and texture, and high-level global semantic information, thereby improving the accuracy of pedestrian re-identification.

The 1st to nth feature extraction layers of the feature pyramid can use a ResNet50 network (residual network).

For S21, the target image is input into the first feature extraction layer of the feature pyramid, and the feature vector output by the first feature extraction layer is used as the first feature initial vector.

For S22, input the i-1th feature initial vector into the i-th feature extraction layer of the feature pyramid, and use the feature vector output by the i-th feature extraction layer as the i-th feature initial vector, where i is greater than 1 and less than n+1, and n is an integer greater than 2.

That is, the number of feature initial vectors is n.

For S23, the nth feature initial vector is input into the first feature fusion layer of the feature pyramid for channel expansion, and the feature vector obtained by channel expansion is used as the first fusion feature vector.

For S24, the k-1th fusion feature vector and the n-k+1th feature initial vector are input into the k-th feature fusion layer of the feature pyramid for element addition to achieve feature fusion. The feature vector obtained by feature fusion is used as the k-th fused feature vector, and k is greater than 1 and less than n, that is, the number of fused feature vectors is n-1.

For S25, input the m-th fused feature vector into the m-th feature output module for pooling, and use the pooled feature vector as the m-th feature vector to be analyzed, where m is greater than 0 and less than n. In other words, the number of feature vectors to be analyzed is n-1.
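The index bookkeeping in S23 and S24 can be checked with a short sketch. The function name pyramid_wiring and the tuple layout are illustrative assumptions; the code only enumerates which inputs feed each feature fusion layer for a given n.

```python
def pyramid_wiring(n):
    """List which inputs feed each feature fusion layer of an n-layer pyramid.

    Returns (k, fused_input, initial_input) tuples, where fused_input is
    None for the first feature fusion layer.
    """
    if n <= 2:
        raise ValueError("n must be an integer greater than 2")
    wiring = [(1, None, n)]                   # S23: layer 1 takes initial vector n
    for k in range(2, n):                     # S24: 1 < k < n
        wiring.append((k, k - 1, n - k + 1))  # fused k-1 with initial n-k+1
    return wiring

wiring = pyramid_wiring(5)
```

With n = 5 there are 4 feature fusion layers, matching the n-1 fused feature vectors and n-1 feature vectors to be analyzed stated above; the last layer (k = 4) fuses the 3rd fusion feature vector with the 2nd feature initial vector.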

In one embodiment, the above-mentioned step of inputting the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fused feature vector includes:

S231: Use the channel expansion convolution kernel of the first feature fusion layer to perform channel expansion on the nth feature initial vector to obtain the first fusion feature vector, wherein the channel expansion convolution kernel is a 1*1 convolution kernel;

The above-mentioned step of inputting the k-1th fusion feature vector and the n-k+1th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector includes:

S241: Use the channel expansion convolution kernel of the kth feature fusion layer to perform channel expansion on the n-k+1th feature initial vector to obtain the kth channel expansion feature;

S242: Input the k-1th fusion feature vector into the nearest neighbor interpolation processing sub-layer of the kth feature fusion layer for equal proportion amplification, and obtain the kth equal proportion amplification feature;

S243: Fuse the k-th channel expansion feature and the k-th equal-proportion amplification feature to obtain the k-th fusion feature vector.

This embodiment uses channel expansion so that fusion is performed on features with the same number of channels, which provides a basis for obtaining feature vectors to be analyzed with different levels of semantic information.

For S231, the channel expansion convolution kernel of the first feature fusion layer is used to convolve the nth feature initial vector to achieve channel expansion, and the feature vector obtained by convolution is used as the first fusion feature vector.

For S241, the channel expansion convolution kernel of the kth feature fusion layer is used to convolve the n-k+1th feature initial vector to achieve channel expansion, and the feature vector obtained by convolution is used as the kth channel expansion feature.

For S242, the nearest neighbor interpolation processing sub-layer of the k-th feature fusion layer is used to perform nearest neighbor interpolation on the k-1th fusion feature vector to achieve equal-proportion amplification, and the feature vector obtained by equal-proportion amplification is used as the kth equal-proportion amplification feature.

Optionally, the amplification ratio of the nearest neighbor interpolation processing sub-layer is 2.

For S243, element-wise addition is performed on the k-th channel expansion feature and the k-th equal-proportion amplification feature, and the data obtained by the element-wise addition is used as the k-th fusion feature vector.

Optionally, a 1*1 convolution kernel is used to fuse the k-th channel expansion feature and the k-th equal-proportion amplification feature, and the data obtained by the fusion is used as the k-th fusion feature vector.
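Steps S241 to S243 with element-wise addition can be sketched in plain Python as follows. The sketch assumes feature maps stored as nested lists in channel, height, width order; the function names conv1x1, upsample_nearest and fuse, and all numeric values, are illustrative only.

```python
def conv1x1(feature, kernel):
    """1*1 convolution: mix channels independently at every pixel.

    feature: [C_in][H][W] nested lists; kernel: [C_out][C_in] weights.
    """
    height, width = len(feature[0]), len(feature[0][0])
    return [[[sum(kernel[o][c] * feature[c][y][x] for c in range(len(feature)))
              for x in range(width)]
             for y in range(height)]
            for o in range(len(kernel))]

def upsample_nearest(feature, scale=2):
    """Nearest neighbor interpolation: repeat each pixel scale times per axis."""
    return [[[row[x // scale] for x in range(len(row) * scale)]
             for row in channel for _ in range(scale)]
            for channel in feature]

def fuse(prev_fused, initial, kernel, scale=2):
    """Channel expansion, equal-proportion amplification, element-wise add."""
    expanded = conv1x1(initial, kernel)             # S241
    enlarged = upsample_nearest(prev_fused, scale)  # S242
    return [[[e + u for e, u in zip(e_row, u_row)]
             for e_row, u_row in zip(e_ch, u_ch)]
            for e_ch, u_ch in zip(expanded, enlarged)]  # S243

# Hypothetical inputs: a 2-channel 1x1 fused map and a 1-channel 2x2 initial map.
prev_fused = [[[1.0]], [[2.0]]]
initial = [[[1.0, 2.0], [3.0, 4.0]]]
kernel = [[1.0], [10.0]]   # expands 1 channel to 2 channels
fused = fuse(prev_fused, initial, kernel)
```

Here the 1*1 kernel brings the initial map to the same channel count as the previous fused map, and the amplification ratio of 2 brings the previous fused map to the same spatial size, so the two can be added element by element.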

In one embodiment, the above-mentioned step of inputting the m-th fused feature vector into the m-th feature output module for feature output to obtain the m-th feature vector to be analyzed includes:

S251: Use the aliasing effect elimination layer of the mth feature output module to eliminate the aliasing effect in the mth fusion feature vector to obtain the mth aliasing effect eliminated feature vector, wherein the aliasing effect elimination layer is a 3*3 convolution kernel;

S252: Input the m-th aliasing effect eliminated feature vector into the pooling layer of the m-th feature output module for pooling processing to obtain the m-th feature vector to be analyzed.

Because upsampling in the top-down feature fusion link of the feature pyramid introduces an aliasing effect, this embodiment applies a 3*3 convolution kernel to the fused feature vector, thereby improving the accuracy of the feature vector to be analyzed.

For S251, the aliasing effect elimination layer of the mth feature output module is used to convolve the mth fusion feature vector, and the convolved data is used as the mth aliasing effect eliminated feature vector.

It can be understood that convolution kernels of other sizes can also be used for convolution to eliminate aliasing effects, which is not limited here.

For S252, the m-th aliasing effect eliminated feature vector is input into the pooling layer of the m-th feature output module for max pooling, and the data obtained by max pooling is used as the m-th feature vector to be analyzed.
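A feature output module along these lines can be sketched as below. Several details are assumptions made only for illustration and are not stated in this application: zero padding with stride 1, one 3*3 kernel per channel, no kernel flip (as is usual in neural network libraries), and max pooling taken over the whole feature map of each channel. The function names are hypothetical.

```python
def conv3x3_same(channel, kernel):
    """3*3 convolution over one channel with zero padding and stride 1."""
    height, width = len(channel), len(channel[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += kernel[dy + 1][dx + 1] * channel[yy][xx]
            out[y][x] = total
    return out

def feature_output(fused, kernels):
    """S251 then S252: convolve each channel, then max-pool it to one value."""
    smoothed = [conv3x3_same(ch, k) for ch, k in zip(fused, kernels)]
    return [max(max(row) for row in ch) for ch in smoothed]

# Hypothetical kernels: one that leaves the map unchanged, one that sums neighbors.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box_sum = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
fused_map = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
vector_to_analyze = feature_output(fused_map, [identity, box_sum])
```

The result is one value per channel, so the length of the feature vector to be analyzed equals the channel count of the fused feature vector under these assumptions.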

In one embodiment, the above-mentioned step of determining a set of similar human body images from a preset human body image library based on the target feature vector and the preset number of similar images includes:

S41: Calculate the similarity between the target feature vector and the feature vector corresponding to each human body image in the human body image library to obtain the first similarity;

S42: From the first similarities, select the first similarities with the largest values, as many as the number of similar images, as an initial similarity set;

S43: Calculate the average value of the feature vectors of each human body image corresponding to the initial similarity set and the target feature vector to obtain an adjusted feature vector;

S44: Perform similarity calculation on the adjusted feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a second similarity;

S45: From the second similarities, select the second similarities with the largest values, as many as the number of similar images, to obtain a target similarity set;

S46: Use each of the human body images corresponding to the target similarity set as the similar human body image set.

This embodiment first finds the human body images most similar to the target feature vector, as many as the number of similar images, then averages the feature vectors of the found human body images together with the target feature vector to obtain the adjusted feature vector, and then finds the human body images most similar to the adjusted feature vector, again as many as the number of similar images, and uses the found human body images as the similar human body image set. Because the most similar human body images have high confidence and little noise, averaging their feature vectors with the target feature vector and searching again with the adjusted feature vector improves the overall recall rate.

For S41, perform cosine similarity calculation on the target feature vector and the feature vector corresponding to each human body image in the human body image library, and use each calculated cosine similarity as a first similarity.

For S42, find a plurality of first similarities with the largest values from the first similarities, and use the found first similarities as an initial similarity set, where the number of first similarities in the initial similarity set is equal to the number of similar images.

For S43, the feature vectors of the human body images corresponding to the initial similarity set and the target feature vector are collected into a set to be calculated; the average of the feature vectors in the set to be calculated is computed, and the averaged data is used as the adjusted feature vector.

For S44, perform cosine similarity calculation on the adjusted feature vector and the feature vector corresponding to each human body image in the human body image library, and use each calculated cosine similarity as a second similarity.

For S45, find a plurality of second similarities with the largest values from the second similarities, and use the found second similarities as a target similarity set, where the number of second similarities in the target similarity set is equal to the number of similar images.

For S46, the human body images corresponding to the target similarity set are the human body images in the human body image library most similar to the features represented by the target feature vector; therefore, these human body images are used as the similar human body image set.
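Steps S41 to S46 can be sketched as a two-pass retrieval. The function names cosine, top_k and similar_image_set are hypothetical, the library is reduced to a list of non-zero feature vectors, and the numeric values are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, library, k):
    """Indices of the k library vectors with the largest cosine similarity."""
    ranked = sorted(range(len(library)),
                    key=lambda i: cosine(query, library[i]), reverse=True)
    return ranked[:k]

def similar_image_set(target, library, k):
    """Search, average the hits with the target, then search again."""
    initial = top_k(target, library, k)                       # S41, S42
    pool = [library[i] for i in initial] + [target]           # S43
    adjusted = [sum(column) / len(pool) for column in zip(*pool)]
    return top_k(adjusted, library, k)                        # S44 to S46

# Hypothetical 2-dimensional feature vectors for a library of 4 images.
library = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
target = [1.0, 0.05]
result = similar_image_set(target, library, 2)
```

In this toy case both passes select library images 0 and 1; on real data the second pass can surface images that the first pass ranked just below the cutoff.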

In one embodiment, the above-mentioned step of determining the pedestrian re-identification result based on each of the soft voting scores includes:

S61: Find the soft voting score with the largest value from each of the soft voting scores, and obtain the target score;

S62: Determine whether the target score is greater than the preset score threshold;

S63: If yes, determine that the pedestrian re-identification has succeeded, and use the human body image corresponding to the target score as the hit image of the pedestrian re-identification result;

S64: If not, determine that the pedestrian re-identification has failed.

In this embodiment, the human body image corresponding to the soft voting score greater than the preset score threshold is used as the hit image of the pedestrian re-identification result, thereby improving the accuracy of the determined pedestrian re-identification result.

For S61, the soft voting score with the largest value is found from each of the soft voting scores, and the found soft voting score is used as the target score.

For S63, if yes, that is, the target score is greater than the preset score threshold, the soft voting score with the largest value exceeds the preset score threshold, so the pedestrian re-identification is determined to have succeeded, and the human body image corresponding to the target score is used as the hit image of the pedestrian re-identification result.

For S64, if not, that is, the target score is less than or equal to the preset score threshold, the soft voting score with the largest value does not exceed the preset score threshold, so the pedestrian re-identification is determined to have failed.
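The decision in S61 to S64 amounts to a maximum followed by a strict threshold comparison, as in this minimal sketch. The function name decide, the image identifiers and the threshold values are hypothetical.

```python
def decide(scores, threshold):
    """Return (success, hit image id or None) from soft voting scores.

    scores: dict mapping an image identifier to its soft voting score.
    """
    best_id = max(scores, key=scores.get)   # S61: largest soft voting score
    if scores[best_id] > threshold:         # S62: strictly greater than threshold
        return True, best_id                # S63: success with a hit image
    return False, None                      # S64: failure

# Hypothetical scores for two candidate images.
outcome_hit = decide({"img_a": 0.47, "img_b": 0.29}, 0.4)
outcome_miss = decide({"img_a": 0.47, "img_b": 0.29}, 0.47)
```

The second call fails because a target score equal to the threshold is treated as not exceeding it.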

In one embodiment, before the above step of inputting the target image into a preset feature extraction model and obtaining the feature vector to be analyzed output by each feature output module, the step further includes:

S71: Obtain an initial model and a training sample set, where the initial model includes an initial feature pyramid, multiple feature output initial modules and multiple classification prediction initial modules; the initial feature pyramid is connected to each of the feature output initial modules; each feature output initial module is connected to a classification prediction initial module; and the feature output initial modules and the classification prediction initial modules correspond one to one;

S72: Train the initial model using the weight-based weak classifier integration method and the training sample set, and use the trained initial model as the target model, where the initial feature pyramid of the target model serves as the feature pyramid, the feature output initial modules of the target model serve as the feature output modules, and the classification prediction initial modules of the target model serve as the classification prediction modules.

This embodiment uses a weight-based weak classifier integration method to train the initial model, so that in each round of training the weight of classification prediction initial modules with a low error rate is increased and the weight of classification prediction initial modules with a high error rate is decreased, which makes the classification prediction initial modules perform better on misclassified data.

For S71, training samples include: sample images and classification probability calibration values. The sample images are images taken of the human body. The classification probability calibration value is an accurate calibration result of whether the human body in the sample image and the human body in each human body image in the human body image library are the same person.

For S72, the initial model is trained using the weight-based weak classifier integration method and the training sample set, and the trained initial model is used as the target model, so that in each round of training the weight of a classification prediction initial module with a low error rate is increased and the weight of a classification prediction initial module with a high error rate is reduced.

If the number of classification prediction initial modules is n-1, then (n-1)/2 is rounded down to obtain x. For the training sample set, the prediction accuracy of each classification prediction initial module is calculated. The classification prediction initial modules are sorted in reverse (descending) order of prediction accuracy to obtain the reverse-ordered classification prediction initial module set. The first weight update formula is used to update the weights of the 1st to x-th classification prediction initial modules of the reverse-ordered set, and the second weight update formula is used to update the weights of the (x+1)-th to (n-1)-th classification prediction initial modules of the reverse-ordered set.

The first weight update formula is $Q_{y_1} = q_{y_1} \cdot a^{x - y_1 + 1}$ and the second weight update formula is $Q_{y_2} = q_{y_2} / a^{x - y_2 + 1}$, where $a$ is a hyperparameter; $Q_{y_1}$ is the updated weight of the $y_1$-th classification prediction initial module of the reverse-ordered set, $y_1$ is greater than 0 and less than x+1, and $q_{y_1}$ is the current weight of that module; $Q_{y_2}$ is the updated weight of the $y_2$-th classification prediction initial module of the reverse-ordered set, $y_2$ is greater than x and less than n, and $q_{y_2}$ is the current weight of that module.

The weight of the $y_1$-th classification prediction initial module in the reverse-ordered set is updated to the value calculated by $Q_{y_1}$.

The weight of the $y_2$-th classification prediction initial module in the reverse-ordered set is updated to the value calculated by $Q_{y_2}$.

Optionally, a is set to 1.1.
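As a minimal sketch of one round of this weight update, the following Python function evaluates the two formulas exactly as stated. It assumes that the superscript in both formulas is the whole expression x - y + 1 and that ties in accuracy keep their original order; the function name `update_weights` and its argument layout are hypothetical.

```python
from typing import List


def update_weights(weights: List[float], accuracies: List[float],
                   a: float = 1.1) -> List[float]:
    """Apply the first and second weight update formulas once."""
    count = len(weights)   # this is n-1 in the text
    x = count // 2         # (n-1)/2 rounded down
    # Rank module indices by prediction accuracy, highest first.
    order = sorted(range(count), key=lambda i: accuracies[i], reverse=True)
    updated = list(weights)
    for rank, idx in enumerate(order, start=1):  # rank plays the role of y1 or y2
        factor = a ** (x - rank + 1)
        if rank <= x:
            updated[idx] = weights[idx] * factor  # first formula, ranks 1..x
        else:
            updated[idx] = weights[idx] / factor  # second formula, ranks x+1..n-1
    return updated
```

Under this reading, the exponent x - y2 + 1 is zero or negative for every module in the second group, so the sketch only reproduces the formulas as written.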

Referring to Figure 2, this application also proposes a pedestrian re-identification device based on artificial intelligence. The device includes:

Image acquisition module 100, used to acquire target images;

The feature vector to be analyzed determination module 200 is used to input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules;

The classification probability prediction result determination module 300 is used to input each of the feature vectors to be analyzed into a preset classification prediction module to perform classification probability prediction and obtain the classification probability prediction result;

The similar human body image set determination module 400 is used to determine a similar human body image set from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

The soft voting score determination module 500 is configured to perform a weighted sum of each classification probability prediction result and the weight of each classification prediction module for each human body image in each of the similar human body image sets to obtain a soft voting score;

The pedestrian re-identification result determination module 600 is used to determine the pedestrian re-identification result according to each of the soft voting scores.
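The weighted summation performed by the soft voting score determination module 500 can be sketched as follows. The data layout is an assumption made for illustration: `probs[m]` maps an image identifier to the classification probability produced by the m-th classification prediction module, `weights[m]` is that module's weight, and the score is computed for every image that appears in at least one similar human body image set, with a missing probability treated as 0.

```python
from typing import Dict, List, Set


def soft_voting_scores(probs: List[Dict[str, float]],
                       weights: List[float],
                       similar_sets: List[Set[str]]) -> Dict[str, float]:
    """Weighted sum of per-module probabilities for each candidate image."""
    # Candidate images: the union of all similar human body image sets.
    candidates: Set[str] = set().union(*similar_sets)
    return {
        image_id: sum(w * p.get(image_id, 0.0) for w, p in zip(weights, probs))
        for image_id in candidates
    }
```

The resulting dictionary can be passed directly to a threshold check on its largest value.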

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as that used by the artificial intelligence-based pedestrian re-identification method. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements a pedestrian re-identification method based on artificial intelligence. The artificial intelligence-based pedestrian re-identification method includes: obtaining a target image; inputting the target image into a preset feature extraction model to obtain a feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules; inputting each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction to obtain a classification probability prediction result; determining a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed; for each human body image in each of the similar human body image sets, performing a weighted sum of each classification probability prediction result and the weight of each classification prediction module to obtain a soft voting score; and determining the pedestrian re-identification result according to each soft voting score.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile, and a computer program is stored thereon. When the computer program is executed by a processor, a pedestrian re-identification method based on artificial intelligence is implemented, including the steps of: obtaining a target image; inputting the target image into a preset feature extraction model to obtain a feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules; inputting each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction to obtain a classification probability prediction result; determining a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed; for each human body image in each of the similar human body image sets, performing a weighted sum of each of the classification probability prediction results and the weight of each of the classification prediction modules to obtain a soft voting score; and determining the pedestrian re-identification result according to each of the soft voting scores.

The artificial intelligence-based pedestrian re-identification method implemented above connects the feature pyramid of the feature extraction model to each feature output module and obtains feature vectors to be analyzed that carry different levels of semantic information, so that attention is paid both to low-level features such as clothing color and texture and to high-level global semantic information, which improves the accuracy of pedestrian re-identification; and by determining the soft voting scores within the similar human body image sets determined from the target feature vector and the preset number of similar images, the accuracy of pedestrian re-identification is further improved.
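The way a similar human body image set is determined from a target feature vector and the preset number of similar images, as recited in the claims (first similarity, initial similarity set, adjusted feature vector, second similarity, target similarity set), can be sketched in Python as follows. The use of cosine similarity, the dictionary-based image library and all function names are assumptions made for illustration; the application does not fix the similarity measure in this passage.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], library: Dict[str, List[float]], k: int) -> List[str]:
    """Identifiers of the k library images most similar to the query."""
    ranked = sorted(library, key=lambda i: cosine(query, library[i]), reverse=True)
    return ranked[:k]


def similar_image_set(target: Sequence[float],
                      library: Dict[str, List[float]], k: int) -> List[str]:
    # First similarity: the k best matches form the initial similarity set.
    initial = top_k(target, library, k)
    # Adjusted feature vector: mean of the matched vectors and the target vector.
    vectors = [library[i] for i in initial] + [list(target)]
    adjusted = [sum(col) / len(vectors) for col in zip(*vectors)]
    # Second similarity: the k best matches of the adjusted vector.
    return top_k(adjusted, library, k)
```

Averaging the query with its first-round matches before searching again is a form of query expansion, which makes the second search less sensitive to noise in a single target feature vector.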

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When executed, the computer program may include the processes of the above method embodiments. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

It should be noted that, in this document, the terms "comprising", "including" or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, device, article or method that includes a series of elements not only includes those elements, but also includes other elements not expressly listed or inherent in the process, device, article or method. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of additional identical elements in a process, device, article or method that includes that element.

The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

1. A pedestrian re-identification method based on artificial intelligence, wherein the method includes:

Get the target image;

Input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules;

Input each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction, and obtain a classification probability prediction result;

Determine a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

For each human body image in each of the similar human body image sets, perform a weighted sum of each of the classification probability prediction results and the weight of each of the classification prediction modules to obtain a soft voting score;

According to each of the soft voting scores, the pedestrian re-identification result is determined.
2. The pedestrian re-identification method based on artificial intelligence according to claim 1, wherein the step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module includes:

Input the target image into the first feature extraction layer of the feature pyramid to obtain the first feature initial vector;

Input the (i-1)-th feature initial vector into the i-th feature extraction layer of the feature pyramid to obtain the i-th feature initial vector, where i is greater than 1 and less than n+1, and n is an integer greater than 2;

Input the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector;

Input the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector, where k is greater than 1 and less than n;

The m-th fused feature vector is input into the m-th feature output module for feature output, and the m-th feature vector to be analyzed is obtained, where m is greater than 0 and less than n.
3. The pedestrian re-identification method based on artificial intelligence according to claim 2, wherein the step of inputting the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector includes:

Using the channel expansion convolution kernel of the first feature fusion layer, channel expansion is performed on the n-th feature initial vector to obtain the first fusion feature vector, wherein the channel expansion convolution kernel is a 1*1 convolution kernel;

The step of inputting the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector includes:

Using the channel expansion convolution kernel of the kth feature fusion layer, channel expansion is performed on the n-k+1th feature initial vector to obtain the kth channel expansion feature;

Input the k-1th fusion feature vector into the nearest neighbor interpolation processing sub-layer of the kth feature fusion layer for equal proportion amplification, and obtain the kth equal proportion amplification feature;

The kth channel expansion feature and the kth equal-scale amplification feature are fused to obtain the kth fusion feature vector.
4. The pedestrian re-identification method based on artificial intelligence according to claim 2, wherein the step of inputting the m-th fusion feature vector into the m-th feature output module for feature output to obtain the m-th feature vector to be analyzed includes:

Using the aliasing effect elimination layer of the m-th feature output module, aliasing effect elimination is performed on the m-th fusion feature vector to obtain the m-th feature vector with eliminated aliasing effect, wherein the aliasing effect elimination layer is a 3*3 convolution kernel;

The mth feature vector with eliminated aliasing effect is input into the pooling layer of the mth feature output module for pooling processing to obtain the mth feature vector to be analyzed.
5. The pedestrian re-identification method based on artificial intelligence according to claim 1, wherein the step of determining a set of similar human body images from a preset human body image library according to the target feature vector and a preset number of similar images includes:

Perform similarity calculation on the target feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a first similarity;

From the first similarities, select those with the largest values, equal in number to the number of similar images, as an initial similarity set;

Calculate the average value of the feature vectors of each human body image corresponding to the initial similarity set and the target feature vector to obtain an adjusted feature vector;

Perform similarity calculation on the adjusted feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a second similarity;

From the second similarities, select those with the largest values, equal in number to the number of similar images, to obtain a target similarity set;

Each of the human body images corresponding to the target similarity set is used as the similar human body image set.
6. The pedestrian re-identification method based on artificial intelligence according to claim 1, wherein the step of determining the pedestrian re-identification result according to each of the soft voting scores includes:

Find the soft voting score with the largest value from each of the soft voting scores to obtain a target score;

Determine whether the target score is greater than a preset score threshold;

If so, determine that the recognition result of the pedestrian re-identification result is successful, and use the human body image corresponding to the target score as the hit image of the pedestrian re-identification result;

If not, determine that the recognition result of the pedestrian re-identification result is failure.
7. The pedestrian re-identification method based on artificial intelligence according to claim 1, wherein before the step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, the method further includes:

Obtain an initial model and a training sample set, wherein the initial model includes an initial feature pyramid, multiple feature output initial modules and multiple classification prediction initial modules; the initial feature pyramid is connected to each of the feature output initial modules, each feature output initial module is connected to a classification prediction initial module, and the feature output initial modules and the classification prediction initial modules correspond one to one;

The initial model is trained using a weight-based weak classifier integration method and the training sample set, and the trained initial model is used as the target model, wherein the initial feature pyramid of the target model serves as the feature pyramid, the feature output initial module of the target model serves as the feature output module, and the classification prediction initial module of the target model serves as the classification prediction module.
8. A pedestrian re-identification device based on artificial intelligence, wherein the device includes:

Image acquisition module, used to acquire target images;

A feature vector to be analyzed determination module, used to input the target image into a preset feature extraction model to obtain a feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules;

The classification probability prediction result determination module is used to input each of the feature vectors to be analyzed into a preset classification prediction module to perform classification probability prediction and obtain the classification probability prediction result;

A similar human body image set determination module, configured to determine a similar human body image set from a preset human body image library based on the target feature vector and a preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

A soft voting score determination module, configured to perform a weighted sum of each of the classification probability prediction results and the weights of each of the classification prediction modules for each human body image in each of the similar human body image sets to obtain a soft voting score;

The pedestrian re-identification result determination module is used to determine the pedestrian re-identification result according to each of the soft voting scores.
9. A computer device, including a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, an artificial intelligence-based pedestrian re-identification method is implemented, the artificial intelligence-based pedestrian re-identification method including the following steps:

Get the target image;

Input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules;

Input each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction, and obtain a classification probability prediction result;

Determine a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

For each human body image in each of the similar human body image sets, perform a weighted sum of each of the classification probability prediction results and the weight of each of the classification prediction modules to obtain a soft voting score;

According to each of the soft voting scores, the pedestrian re-identification result is determined.
10. The computer device according to claim 9, wherein the step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module includes:

Input the target image into the first feature extraction layer of the feature pyramid to obtain the first feature initial vector;

Input the (i-1)-th feature initial vector into the i-th feature extraction layer of the feature pyramid to obtain the i-th feature initial vector, where i is greater than 1 and less than n+1, and n is an integer greater than 2;

Input the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector;

Input the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector, where k is greater than 1 and less than n;

The m-th fused feature vector is input into the m-th feature output module for feature output, and the m-th feature vector to be analyzed is obtained, where m is greater than 0 and less than n.
11. The computer device according to claim 10, wherein the step of inputting the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector includes:

Using the channel expansion convolution kernel of the first feature fusion layer, channel expansion is performed on the n-th feature initial vector to obtain the first fusion feature vector, wherein the channel expansion convolution kernel is a 1*1 convolution kernel;

The step of inputting the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector includes:

Using the channel expansion convolution kernel of the kth feature fusion layer, channel expansion is performed on the n-k+1th feature initial vector to obtain the kth channel expansion feature;

Input the k-1th fusion feature vector into the nearest neighbor interpolation processing sub-layer of the kth feature fusion layer for equal proportion amplification, and obtain the kth equal proportion amplification feature;

The kth channel expansion feature and the kth equal-scale amplification feature are fused to obtain the kth fusion feature vector.
12. The computer device according to claim 10, wherein the step of inputting the m-th fusion feature vector into the m-th feature output module for feature output to obtain the m-th feature vector to be analyzed includes:

Using the aliasing effect elimination layer of the m-th feature output module, aliasing effect elimination is performed on the m-th fusion feature vector to obtain the m-th feature vector with eliminated aliasing effect, wherein the aliasing effect elimination layer is a 3*3 convolution kernel;

The mth feature vector with eliminated aliasing effect is input into the pooling layer of the mth feature output module for pooling processing to obtain the mth feature vector to be analyzed.
13. The computer device according to claim 9, wherein the step of determining a set of similar human body images from a preset human body image library according to the target feature vector and a preset number of similar images includes:

Perform similarity calculation on the target feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a first similarity;

From the first similarities, select those with the largest values, equal in number to the number of similar images, as an initial similarity set;

Calculate the average value of the feature vectors of each human body image corresponding to the initial similarity set and the target feature vector to obtain an adjusted feature vector;

Perform similarity calculation on the adjusted feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a second similarity;

From the second similarities, select those with the largest values, equal in number to the number of similar images, to obtain a target similarity set;

Each of the human body images corresponding to the target similarity set is used as the similar human body image set.
14. The computer device according to claim 9, wherein the step of determining the pedestrian re-identification result according to each of the soft voting scores includes:

Find the soft voting score with the largest value from each of the soft voting scores to obtain a target score;

Determine whether the target score is greater than a preset score threshold;

If so, determine that the recognition result of the pedestrian re-identification result is successful, and use the human body image corresponding to the target score as the hit image of the pedestrian re-identification result;

If not, determine that the recognition result of the pedestrian re-identification result is failure.
15. The computer device according to claim 9, wherein before the step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, the method further includes:

Obtain an initial model and a training sample set, wherein the initial model includes an initial feature pyramid, multiple feature output initial modules and multiple classification prediction initial modules; the initial feature pyramid is connected to each of the feature output initial modules, each feature output initial module is connected to a classification prediction initial module, and the feature output initial modules and the classification prediction initial modules correspond one to one;

The initial model is trained using a weight-based weak classifier integration method and the training sample set, and the trained initial model is used as the target model, wherein the initial feature pyramid of the target model serves as the feature pyramid, the feature output initial module of the target model serves as the feature output module, and the classification prediction initial module of the target model serves as the classification prediction module.
16. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, an artificial intelligence-based pedestrian re-identification method is implemented, the artificial intelligence-based pedestrian re-identification method including the following steps:

Get the target image;

Input the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module, wherein the feature extraction model includes a feature pyramid and a plurality of the feature output modules, and the feature pyramid is connected to each of the feature output modules;

Input each of the feature vectors to be analyzed into a preset classification prediction module for classification probability prediction, and obtain a classification probability prediction result;

Determine a set of similar human body images from the preset human body image library according to the target feature vector and the preset number of similar images, wherein the target feature vector is any one of the feature vectors to be analyzed;

For each human body image in each of the similar human body image sets, perform a weighted sum of each of the classification probability prediction results and the weight of each of the classification prediction modules to obtain a soft voting score;

According to each of the soft voting scores, the pedestrian re-identification result is determined.
17. The computer-readable storage medium according to claim 16, wherein the step of inputting the target image into a preset feature extraction model to obtain the feature vector to be analyzed output by each feature output module includes:

Input the target image into the first feature extraction layer of the feature pyramid to obtain the first feature initial vector;

Input the (i-1)-th feature initial vector into the i-th feature extraction layer of the feature pyramid to obtain the i-th feature initial vector, where i is greater than 1 and less than n+1, and n is an integer greater than 2;

Input the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector;

Input the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector, where k is greater than 1 and less than n;

The m-th fused feature vector is input into the m-th feature output module for feature output, and the m-th feature vector to be analyzed is obtained, where m is greater than 0 and less than n.
18. The computer-readable storage medium according to claim 17, wherein the step of inputting the n-th feature initial vector into the first feature fusion layer of the feature pyramid for feature processing to obtain the first fusion feature vector includes:

Using the channel expansion convolution kernel of the first feature fusion layer, channel expansion is performed on the n-th feature initial vector to obtain the first fusion feature vector, wherein the channel expansion convolution kernel is a 1*1 convolution kernel;

The step of inputting the (k-1)-th fusion feature vector and the (n-k+1)-th feature initial vector into the k-th feature fusion layer of the feature pyramid for feature fusion to obtain the k-th fusion feature vector includes:

Using the channel expansion convolution kernel of the k-th feature fusion layer, channel expansion is performed on the (n-k+1)-th feature initial vector to obtain the k-th channel expansion feature;

Input the (k-1)-th fusion feature vector into the nearest neighbor interpolation processing sub-layer of the k-th feature fusion layer for equal-proportion amplification to obtain the k-th equal-proportion amplification feature;

The k-th channel expansion feature and the k-th equal-proportion amplification feature are fused to obtain the k-th fusion feature vector.
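One fusion level can be sketched in NumPy as follows. The claim does not state how the two features are fused; element-wise addition is assumed here, as is the `(channels, height, width)` array layout. All function names are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """Apply a 1*1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def upsample_nearest(x, scale=2):
    """Nearest neighbor interpolation: repeat each pixel `scale` times per axis."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def fuse_level(prev_fused, initial, w):
    """Fuse the (k-1)-th fusion feature with the (n-k+1)-th initial feature."""
    # Channel expansion of the lateral (initial) feature.
    lateral = conv1x1(initial, w)
    # Equal-proportion amplification of the coarser fusion feature.
    scale = lateral.shape[1] // prev_fused.shape[1]
    enlarged = upsample_nearest(prev_fused, scale)
    if enlarged.shape != lateral.shape:
        raise ValueError("shapes do not match after amplification")
    # Assumed fusion rule: element-wise addition.
    return lateral + enlarged
```

The sketch derives the amplification factor from the two spatial sizes, so it assumes that the finer map is an integer multiple of the coarser one.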
The computer-readable storage medium according to claim 16, wherein the step of determining a set of similar human body images from a preset human body image library according to the target feature vector and a preset number of similar images includes:

Perform similarity calculation on the target feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a first similarity;

From the first similarities, select the largest first similarities, equal in number to the number of similar images, as an initial similarity set;

Calculate the average value of the feature vectors of each human body image corresponding to the initial similarity set and the target feature vector to obtain an adjusted feature vector;

Perform similarity calculation on the adjusted feature vector and the feature vector corresponding to each human body image in the human body image library to obtain a second similarity;

From the second similarities, select the largest second similarities, equal in number to the number of similar images, to obtain a target similarity set;

Use the human body images corresponding to the target similarity set as the similar human body image set.
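The two-pass retrieval described in these steps can be sketched as follows. The claim only says "similarity calculation"; cosine similarity is assumed here, all vectors are assumed to be non-zero, and the function names are hypothetical.

```python
import numpy as np

def cosine_similarities(query, gallery):
    """Cosine similarity between one query vector and each gallery row."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def top_k_indices(sims, k):
    """Indices of the k largest similarities, largest first."""
    return np.argsort(-sims, kind="stable")[:k]

def similar_image_set(target, gallery, k):
    """Return gallery indices forming the similar human body image set."""
    target = np.asarray(target, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    # First pass: initial similarity set from the raw target vector.
    first = cosine_similarities(target, gallery)
    initial = top_k_indices(first, k)
    # Adjusted vector: mean of the initial hits and the target itself.
    adjusted = np.vstack([gallery[initial], target[None, :]]).mean(axis=0)
    # Second pass: target similarity set from the adjusted vector.
    second = cosine_similarities(adjusted, gallery)
    return top_k_indices(second, k)
```

Averaging the target with its first-pass hits shifts the query toward the cluster of likely matches before the second search.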
The computer-readable storage medium according to claim 16, wherein the step of determining the pedestrian re-identification result according to each of the soft voting scores includes:

Find the soft voting score with the largest value from each of the soft voting scores to obtain a target score;

Determine whether the target score is greater than a preset score threshold;

If so, determine that the recognition result of the pedestrian re-identification result is success, and use the human body image corresponding to the target score as the hit image of the pedestrian re-identification result;

If not, determine that the recognition result of the pedestrian re-identification result is failure.
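The threshold decision above can be sketched in a few lines. The return format, a `(success, hit_index)` tuple, and the function name are assumptions made for illustration.

```python
def reid_decision(scores, threshold):
    """Pick the largest soft voting score and compare it with the threshold.

    Returns (True, index_of_hit_image) on success, (False, None) on failure.
    """
    if len(scores) == 0:
        return (False, None)
    # Target score: the soft voting score with the largest value.
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Success only if the target score is strictly greater than the threshold.
    if scores[best] > threshold:
        return (True, best)
    return (False, None)
```

The comparison is strict, following the wording "greater than a preset score threshold".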