CN113435325A - Image re-recognition method and device, electronic equipment and storage medium

Image re-recognition method and device, electronic equipment and storage medium

Info

Publication number
CN113435325A
CN113435325A
Authority
CN
China
Prior art keywords
image
layer structure
convolutional layer
feature extraction
dual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110714602.3A
Other languages
Chinese (zh)
Inventor
薛全华
戴磊
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110714602.3A priority Critical patent/CN113435325A/en
Publication of CN113435325A publication Critical patent/CN113435325A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of artificial intelligence, and provides an image re-identification method, an image re-identification apparatus, an electronic device and a storage medium. The image re-identification method comprises the following steps: acquiring an image to be identified; extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model; fusing the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features; and classifying the fusion features to generate the re-identification feature of the image to be identified for image re-identification. Based on the preset feature extraction model, the method can extract features at multiple scales from the image to be recognized for image re-recognition, which strengthens the robustness of image re-recognition and improves its recognition performance.

Description

Image re-recognition method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of intelligent decision-making technologies, and in particular, to an image re-recognition method and apparatus, an electronic device, and a storage medium.
Background
Pedestrian re-identification (hereinafter ReID) is a technique for automatically retrieving a target pedestrian in videos from multiple cameras with non-overlapping views, and is mainly applied in scenarios such as intelligent video surveillance and search. However, the recognition performance of pedestrian re-identification is difficult to improve because of problems such as installation differences arising when cameras are mounted, changes in pedestrian pose and camera viewing angle between different cameras, differences in picture quality between different cameras, and occlusion of the cameras. Most existing ReID methods use only the last-layer features of an image and perform pedestrian re-identification by extracting coarse-grained features, so they easily lose information, find it difficult to learn robust feature representations, and deliver poor re-identification performance. Moreover, when a ReID method is used in a specific intelligent video surveillance scenario it is often combined with online pedestrian tracking, and directly applying a holistic ReID feature to the tracking problem yields poor tracking results.
Disclosure of Invention
In view of this, embodiments of the present application provide an image re-recognition method, an image re-recognition apparatus, an electronic device, and a storage medium, which are mainly applied to pedestrian re-identification and can perform image re-recognition from features at multiple scales, thereby reducing information loss and improving the recognition performance of pedestrian re-identification.
A first aspect of an embodiment of the present application provides an image re-identification method, including:
acquiring an image to be identified;
respectively extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model;
performing fusion processing on the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features;
and generating the re-identification feature of the image to be identified for image re-identification by classifying the fusion feature.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the preset feature extraction model includes a coarse-grained feature extraction module and a fine-grained feature extraction module, where:
the coarse-grained feature extraction module comprises a convolutional layer structure and a pooling layer structure and is used for extracting the global re-identification features of the image to be identified;
the fine-grained feature extraction module comprises at least two feature extraction units; each feature extraction unit comprises two branch networks, wherein the first branch network comprises a convolutional layer structure, a batch-normalization layer structure and a pooling layer structure, and the second branch network comprises a convolutional layer structure, a batch-normalization layer structure, a linear rectification function layer structure and a sequence stack structure and is used for extracting the local re-identification features of the image to be identified.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the second branch network in the feature extraction unit is configured to be formed by sequentially connecting a first convolutional layer structure, a first batch-normalization layer structure, a linear rectification function layer structure, a sequence stack structure, a second convolutional layer structure, and a second batch-normalization layer structure.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the sequence stack structure includes a plurality of dual core convolutional layer structure groups arranged in parallel, where each dual core convolutional layer structure group is formed by stacking one or more dual core convolutional layer structures.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the sequence stack structure is configured to:
a first sequence stack structure is obtained by arranging a dual-core convolutional layer structure group formed by a dual-core convolutional layer structure, a dual-core convolutional layer structure group formed by stacking three dual-core convolutional layer structures and a dual-core convolutional layer structure group formed by stacking five dual-core convolutional layer structures in parallel; or
A second sequence stack structure is obtained by arranging a dual-core convolutional layer structure group formed by one dual-core convolutional layer structure, a dual-core convolutional layer structure group formed by stacking two dual-core convolutional layer structures, a dual-core convolutional layer structure group formed by stacking three dual-core convolutional layer structures and a dual-core convolutional layer structure group formed by stacking four dual-core convolutional layer structures in parallel; or
And a third sequence stack structure is obtained by arranging in parallel a binuclear convolutional layer structure group formed by one binuclear convolutional layer structure, a binuclear convolutional layer structure group formed by stacking two binuclear convolutional layer structures, a binuclear convolutional layer structure group formed by stacking three binuclear convolutional layer structures, a binuclear convolutional layer structure group formed by stacking four binuclear convolutional layer structures, and a binuclear convolutional layer structure group formed by stacking five binuclear convolutional layer structures.
With reference to the first or second or third or fourth possible implementation manner of the first aspect, in a fifth possible implementation manner of the first aspect, the fine-grained feature extraction module is configured to include a first feature extraction unit and at least one second feature extraction unit, where a step size parameter of a convolutional layer structure of a first branch network in the first feature extraction unit is configured to be greater than 1, and a step size parameter of a convolutional layer structure of a first branch network in the second feature extraction unit is configured to be 1.
With reference to the first possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, in the coarse-grained feature extraction module, the step size parameters of the convolutional layer structure and the pooling layer structure are both configured to be greater than 1, so that the image to be identified is downsampled when its global re-identification feature is extracted.
With reference to the sixth possible implementation manner of the first aspect, in a seventh possible implementation manner of the first aspect, the convolution kernel of the convolutional layer structure in the coarse-grained feature extraction module is configured to be larger than the convolution kernel of the convolutional layer structure in the fine-grained feature extraction module.
A second aspect of an embodiment of the present application provides an image re-recognition apparatus, including:
the image acquisition module is used for acquiring an image to be identified;
the image feature extraction module is used for respectively extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model;
the image feature fusion module is used for fusing the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features;
and the image feature classification module is used for classifying the fusion features to generate re-identification features of the image to be identified for image re-identification.
A third aspect of embodiments of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method provided by the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the method provided by the first aspect.
The image re-identification method, the image re-identification device, the electronic equipment and the storage medium have the following beneficial effects:
the application provides an image re-identification method, an image re-identification device, electronic equipment and a storage medium, wherein the image re-identification method comprises the following steps: acquiring an image to be identified; respectively extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model; performing fusion processing on the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features; and generating the re-identification feature of the image to be identified for image re-identification by classifying the fusion feature. The method can extract features of multiple scales from the image to be recognized to perform image re-recognition based on a preset feature extraction model, so that the robustness of the image re-recognition is enhanced, and the recognition performance of the image re-recognition is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a basic method of an image re-identification method according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a structure of a feature extraction unit in an image re-identification method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a second branch network in the image re-identification method according to the embodiment of the present application;
fig. 4 is another structural block diagram of a feature extraction unit in the image re-identification method according to the embodiment of the present application;
fig. 5 is a block diagram of a preset feature extraction model in the image re-identification method according to the embodiment of the present application;
fig. 6 is a schematic flowchart of a training method of a preset feature extraction model in the image re-recognition method according to the embodiment of the present application;
fig. 7 is a block diagram of a basic structure of an image re-recognition apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of a basic structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a basic method of an image re-identification method according to an embodiment of the present disclosure. The details are as follows:
step S11: and acquiring an image to be identified.
In this embodiment, the image to be recognized is any image or video frame captured by a monitoring device such as a camera. The image to be recognized contains an object to be re-recognized; for example, in a pedestrian image captured by an image capturing device on a monitored road section, a pedestrian in the image can serve as the object to be re-recognized.
Step S12: respectively extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model.
In this embodiment, a preset feature extraction model performs feature extraction on the acquired image to be identified to obtain discriminative re-identification features of the image. The preset feature extraction model comprises two feature extraction structures, one coarse-grained and one fine-grained; it is a neural network model for rich feature extraction that combines global and local information, and it can extract the global re-identification features and the local re-identification features of the image to be recognized respectively, so that features at multiple scales are taken into account during image re-recognition.
Step S13: performing fusion processing on the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features.
In this embodiment, the global re-recognition features and the local re-recognition features capture the image to be recognized at different scales. Fusing the global and local re-recognition features of the image yields its fusion feature, which retains the higher resolution of the low-level features and therefore contains more positional and detail information. In this embodiment, feature fusion may be performed at the feature-map level based on the preset feature extraction model. A minimal sketch of such fusion is given below.
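Illustratively, the following minimal PyTorch sketch shows feature-map-level fusion. It is a sketch under assumptions rather than this embodiment's implementation: the module name FeatureFusion, the choice of channel-wise concatenation followed by a 1×1 convolution, and the channel counts are illustrative only, since the text only specifies that fusion is performed at the feature-map level.

import torch
import torch.nn as nn

# Minimal sketch of feature-map-level fusion. Assumption: the global and
# local feature maps are concatenated along the channel axis and mixed by
# a 1x1 convolution; the embodiment only states that fusion happens at the
# feature-map level.
class FeatureFusion(nn.Module):
    def __init__(self, global_ch: int, local_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(global_ch + local_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, g: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
        # Both maps are assumed to share spatial size; in practice the coarser
        # map would be pooled or resized to match before fusing.
        return self.fuse(torch.cat([g, l], dim=1))

# Illustrative channel counts only.
fusion = FeatureFusion(global_ch=64, local_ch=384, out_ch=512)
fused = fusion(torch.randn(1, 64, 16, 8), torch.randn(1, 384, 16, 8))
print(fused.shape)  # torch.Size([1, 512, 16, 8])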
Step S14: generating the re-identification feature of the image to be identified for image re-identification by classifying the fusion features.
In this embodiment, after the fusion features are obtained based on the preset feature extraction model, they are classified by a fully connected layer: the distributed features in the fusion representation, which contain more positional and detail information, are mapped and classified to generate the re-recognition features used for re-recognizing the image to be recognized.
This embodiment acquires an image to be identified; extracts global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model; fuses the global and local re-identification features to obtain fusion features; and classifies the fusion features to generate the re-identification feature of the image to be identified for image re-identification. Based on the preset feature extraction model, the method can extract features at multiple scales from the image to be recognized for image re-recognition, which strengthens the robustness of image re-recognition and improves its recognition performance.
In some embodiments of the present application, the preset feature extraction model includes a coarse-grained feature extraction module and a fine-grained feature extraction module. The coarse-grained feature extraction module is used for extracting the global re-identification features of the image to be identified, and the fine-grained feature extraction module is used for extracting its local re-identification features. At least one fine-grained feature extraction module may be configured in the preset feature extraction model, and each fine-grained feature extraction module comprises at least two feature extraction units. Referring to fig. 2, fig. 2 is a block diagram of a feature extraction unit in the image re-identification method according to an embodiment of the present disclosure. As shown in fig. 2, the feature extraction unit includes two branch networks: a first branch network 10 comprising a convolutional layer structure 11, a batch-normalization layer structure 12 and a pooling layer structure 13, and a second branch network 20 comprising a convolutional layer structure 21, a batch-normalization layer structure 22, a linear rectification function layer structure 23 and a sequence stack structure 24. In this embodiment, taking a pedestrian image as an example, the first branch network 10 and the second branch network 20 extract deep, local re-identification features of the pedestrian image at different scales. On this basis, the preset feature extraction model can extract local features at multiple scales from the pedestrian image for pedestrian re-identification, which strengthens robustness and improves the recognition performance of pedestrian re-identification.
In some embodiments of the present application, the fine-grained feature extraction module is configured to include a first feature extraction unit and at least one second feature extraction unit. In this embodiment, the first and second feature extraction units share the same structure: each includes two branch networks, where the first branch network comprises a convolutional layer structure, a batch-normalization layer structure and a pooling layer structure, and the second branch network comprises a convolutional layer structure, a batch-normalization layer structure, a linear rectification function layer structure and a sequence stack structure. They differ only in that the step size (stride) parameter of the convolutional layer structure of the first branch network is configured to be greater than 1 in the first feature extraction unit and 1 in the second feature extraction unit. That is, when the preset feature extraction model extracts features, the image to be recognized is downsampled once in the first feature extraction unit and is not downsampled in the second feature extraction unit. Arranging the fine-grained feature extraction module as a first feature extraction unit followed by at least one second feature extraction unit provides a skip-connection capability similar to a residual network: information from the previous feature extraction unit flows unimpeded into the next one, which improves information flow and alleviates the vanishing-gradient and degradation problems of overly deep networks, while keeping the preset feature extraction model lightweight so that pedestrian re-identification features can be extracted in real time. A minimal sketch of such a unit is given below.
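Illustratively, the following PyTorch sketch shows one such feature extraction unit. Two points are assumptions rather than statements of this embodiment: the two branch outputs are combined by element-wise addition to realize the residual-style skip connection described above, and the stride is applied in both branches so that their output shapes match; the kernel sizes and the placeholder seq_stack module are likewise illustrative.

from typing import Optional

import torch
import torch.nn as nn

# Sketch of a feature extraction unit (FFBlock). stride > 1 corresponds to a
# first feature extraction unit (one downsampling); stride == 1 corresponds
# to a second feature extraction unit.
class FeatureExtractionUnit(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1,
                 seq_stack: Optional[nn.Module] = None):
        super().__init__()
        # First branch network: convolution + batch normalization + average pooling.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),
        )
        # Second branch network: Conv-BN-ReLU -> sequence stack -> Conv-BN.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            seq_stack if seq_stack is not None else nn.Identity(),
            nn.Conv2d(out_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise addition is an assumption; it gives the unimpeded
        # information flow from x_l to x_{l+1} described above.
        return self.branch1(x) + self.branch2(x)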
In some embodiments of the present application, please refer to fig. 3; fig. 3 is a schematic structural diagram of the second branch network in the image re-identification method according to an embodiment of the present application. As shown in fig. 3, the second branch network 20 may be specifically configured to be formed by sequentially connecting a first convolutional layer structure 20-21-1, a first batch-normalization layer structure 20-22-1, a linear rectification function layer structure 20-23, a sequence stack structure 20-24, a second convolutional layer structure 20-21-2, and a second batch-normalization layer structure 20-22-2.
In some embodiments of the present application, please refer to fig. 4; fig. 4 is another structural block diagram of the feature extraction unit in the image re-identification method according to an embodiment of the present application. As shown in fig. 4, xl denotes the input image to be recognized and xl+1 denotes the output pedestrian re-identification feature. In the feature extraction unit, the lower path is the first branch network 10, where CB11 denotes the convolutional layer structure 11 plus the batch-normalization layer structure 12, and AvgPool denotes the pooling layer structure 13; in the first branch network 10 the pooling layer uses an average pooling algorithm, and the size of the feature map of the image to be recognized is reduced by combining convolution with average pooling. The second branch network 20 comprises CBR11, SEQStack and CB11, where CBR11 denotes the first convolutional layer structure 20-21-1 plus the first batch-normalization layer structure 20-22-1 plus the linear rectification function layer structure 20-23, SEQStack denotes the sequence stack structure 20-24, and CB11 denotes the second convolutional layer structure 20-21-2 plus the second batch-normalization layer structure 20-22-2. In this embodiment, the sequence stack structure 20-24 (SEQStack) includes a plurality of dual-core convolutional layer structure groups SEQ arranged in parallel, where each group SEQn is formed by stacking one or more dual-core convolutional layer structures DConv, and n is a positive integer denoting the number of stacked dual-core convolutional layer structures. In this embodiment, the dual-core convolutional layer structure DConv may be configured as a dual-core convolutional layer (double convolution) + batch-normalization layer (BatchNorm) + linear rectification function layer (ReLU), where the dual-core convolutional layer stacks two convolutions with kernel sizes 1×1 and 3×3 respectively. For example, the group formed by one dual-core convolutional layer structure is SEQ1, which contains a single DConv structure. Similarly, the group formed by stacking two dual-core convolutional layer structures is SEQ2, in which two DConv structures are stacked from top to bottom; in general, the group formed by stacking n dual-core convolutional layer structures is SEQn, with n DConv structures stacked from top to bottom. A short sketch of DConv and SEQn follows.
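Illustratively, DConv and a SEQn group can be sketched as follows. One reading is assumed here: "stacked" 1×1 and 3×3 kernels are taken to mean a 1×1 convolution followed by a 3×3 convolution, and a standard (non-depthwise) 3×3 convolution is used, since the text does not specify either point.

import torch.nn as nn

# Sketch of the dual-core convolutional layer structure DConv: 1x1 and 3x3
# convolutions stacked, then batch normalization and ReLU. Reading "stacked"
# as 1x1 followed by 3x3 is an assumption.
def dconv(channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=1, bias=False),             # 1x1 kernel
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),  # 3x3 kernel
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

# SEQn: n dual-core convolutional layer structures stacked top to bottom.
def seq_group(channels: int, n: int) -> nn.Sequential:
    return nn.Sequential(*(dconv(channels) for _ in range(n)))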
In some embodiments of the present application, five dual-core convolutional layer structure groups SEQ are preset, namely SEQ1, SEQ2, SEQ3, SEQ4 and SEQ5, and three different sequence stack structures 20-24 (SEQStack) are constructed from them: a first sequence stack structure, a second sequence stack structure, and a third sequence stack structure. Wherein:
the first sequence stack structure is obtained by arranging in parallel a binuclear convolutional layer structure group SEQ1 formed by stacking one binuclear convolutional layer structure, a binuclear convolutional layer structure group SEQ3 formed by stacking three binuclear convolutional layer structures, and a binuclear convolutional layer structure group SEQ5 formed by stacking five binuclear convolutional layer structures.
The second sequence stack structure is obtained by arranging in parallel a dual-core convolutional layer structure group SEQ1 formed by stacking one dual-core convolutional layer structure, a dual-core convolutional layer structure group SEQ2 formed by stacking two dual-core convolutional layer structures, a dual-core convolutional layer structure group SEQ3 formed by stacking three dual-core convolutional layer structures, and a dual-core convolutional layer structure group SEQ4 formed by stacking four dual-core convolutional layer structures.
The third sequence stack structure is obtained by arranging in parallel a dual-core convolutional layer structure group SEQ1 formed by stacking one dual-core convolutional layer structure, a dual-core convolutional layer structure group SEQ2 formed by stacking two dual-core convolutional layer structures, a dual-core convolutional layer structure group SEQ3 formed by stacking three dual-core convolutional layer structures, a dual-core convolutional layer structure group SEQ4 formed by stacking four dual-core convolutional layer structures, and a dual-core convolutional layer structure group SEQ5 formed by stacking five dual-core convolutional layer structures.
In this embodiment, when extracting features, the preset feature extraction model may select different sequence stack structures according to the size of the target pedestrian in the image to be recognized and the density of pedestrians in the image. As shown in fig. 3, the feature extraction unit adopts the second sequence stack structure, where seq1, seq2, seq3 and seq4 denote the outputs of SEQ1, SEQ2, SEQ3 and SEQ4 respectively.
In this embodiment, a channel gate (ChannelGate) is provided in the sequence stack structure 20-24 (SEQStack). Illustratively, the channel gate is configured as an average pooling layer (AvgPool), a convolutional layer (Conv1), a linear rectification function layer (ReLU), a convolutional layer (Conv2) and a normalization layer (Sigmoid) connected in sequence, where the input dimension of Conv1 equals the output dimension of Conv2 and is generally set to N, and the output dimension of Conv1 equals the input dimension of Conv2 and is generally set to N/16, so as to excite the weights of the more important channels. In this embodiment, if the unit is characterized as the first feature extraction unit, the stride value of the step size parameter in the downsampling branch is set to be greater than 1 and all remaining stride values are set to 1; if the unit is characterized as a second feature extraction unit, all stride values in the feature extraction unit are set to 1. A sketch of the channel gate and the gated sequence stack follows.
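Illustratively, the channel gate and the gated sequence stack can be sketched as follows, reusing the seq_group helper from the earlier sketch. The gate follows the AvgPool, Conv1, ReLU, Conv2, Sigmoid sequence with the N to N/16 to N squeeze exactly as listed; how the parallel SEQ-group outputs are aggregated is not stated in this embodiment, so the gate-weighted element-wise sum used below is an assumption in the spirit of the omni-scale feature learning work cited against this application.

import torch
import torch.nn as nn

# Channel gate: AvgPool -> Conv1 (N -> N/16) -> ReLU -> Conv2 (N/16 -> N)
# -> Sigmoid, producing per-channel weights that excite the more important
# channels.
class ChannelGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)

# SEQStack: parallel SEQ groups whose outputs are combined by a gate-weighted
# element-wise sum (the aggregation operator is an assumption, see above).
# depths=(1, 2, 3, 4) gives the second sequence stack structure; (1, 3, 5)
# and (1, 2, 3, 4, 5) give the first and third structures.
class SEQStack(nn.Module):
    def __init__(self, channels: int, depths=(1, 2, 3, 4)):
        super().__init__()
        self.groups = nn.ModuleList(seq_group(channels, n) for n in depths)
        self.gate = ChannelGate(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return sum(self.gate(g(x)) for g in self.groups)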
In some embodiments of the present application, please refer to fig. 5; fig. 5 is a block diagram of the preset feature extraction model in the image re-recognition method according to an embodiment of the present application. As shown in fig. 5, the preset feature extraction model includes an Input module, a coarse-grained feature extraction module (Input stem), a plurality of fine-grained feature extraction modules (stage1, stage2, stage3), a feature fusion module (Conv4), a classifier (fc), and an Output layer. In this embodiment, the coarse-grained feature extraction module (Input stem) is configured with a convolutional layer structure Conv0 and a pooling layer structure MaxPool. Specifically, Conv0 in the coarse-grained feature extraction module is configured with a 7×7 convolution kernel, a stride value of 2 and 64 channels; MaxPool in the coarse-grained feature extraction module is configured with a 3×3 kernel and a stride value of 2, and pools with a maximum pooling algorithm. The fine-grained feature extraction modules stage1 and stage2 are configured to include a first feature extraction unit (Start FFBlock) and a plurality of second feature extraction units (FFBlock), and stage3 is configured to include a plurality of second feature extraction units (FFBlock). For example, the numbers of feature extraction units in stage1, stage2 and stage3 may be configured as 2, 2 and 2; to improve feature extraction accuracy, they may also be 3, 6 and 3. In this embodiment, the channel counts of the overall structure of the preset feature extraction model shown in fig. 5 may be configured as: Conv0: 64, stage1: 64, stage2: 256, stage3: 384, Conv4: 512. Balancing the speed and size of the preset feature extraction model, the channel counts of the overall structure may also be configured as 32/32/128/192/256 or 16/16/64/96/128. A hedged skeleton of this architecture is given below.
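Illustratively, the overall pipeline of fig. 5 can be wired together from the sketches above. The stem parameters (Conv0 with a 7×7 kernel, stride 2 and 64 channels; MaxPool with a 3×3 kernel and stride 2) and the 64/64/256/384/512 channel plan follow this embodiment; the BatchNorm and ReLU inserted after Conv0, the make_stage helper, the global pooling before the classifier, and num_classes are illustrative assumptions.

import torch
import torch.nn as nn

# Builds one fine-grained stage: a Start FFBlock (stride 2) followed by
# FFBlocks (stride 1); stage3 uses only FFBlocks, hence downsample=False.
# FeatureExtractionUnit and SEQStack are from the sketches above.
def make_stage(in_ch: int, out_ch: int, num_units: int,
               downsample: bool = True) -> nn.Sequential:
    units = [FeatureExtractionUnit(in_ch, out_ch,
                                   stride=2 if downsample else 1,
                                   seq_stack=SEQStack(out_ch))]
    units += [FeatureExtractionUnit(out_ch, out_ch, stride=1,
                                    seq_stack=SEQStack(out_ch))
              for _ in range(num_units - 1)]
    return nn.Sequential(*units)

class ReIDModel(nn.Module):
    def __init__(self, num_classes: int = 751):  # e.g. 751 identities in Market-1501
        super().__init__()
        # Coarse-grained feature extraction module (Input stem).
        self.input_stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Fine-grained feature extraction modules.
        self.stage1 = make_stage(64, 64, num_units=2)
        self.stage2 = make_stage(64, 256, num_units=2)
        self.stage3 = make_stage(256, 384, num_units=2, downsample=False)
        # Feature fusion convolution and classifier.
        self.conv4 = nn.Sequential(
            nn.Conv2d(384, 512, kernel_size=1, bias=False),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x: torch.Tensor):
        x = self.conv4(self.stage3(self.stage2(self.stage1(self.input_stem(x)))))
        emb = self.pool(x).flatten(1)   # re-identification embedding
        return emb, self.fc(emb)        # embedding for matching, logits for training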
In this embodiment, the preset feature extraction model is thus specifically configured with a coarse-grained feature extraction module (Input stem) for extracting shallow, global pedestrian re-identification features of the image to be identified, and fine-grained feature extraction modules (stage1, stage2, stage3) for extracting deep, local pedestrian re-identification features of the image to be identified. On this basis, the preset feature extraction model balances the extraction of local and global features, realizes pedestrian re-identification by extracting features at multiple scales from the image to be identified, reduces the information loss of the image to be identified, and improves the recognition performance of pedestrian re-identification.
In this embodiment, in order to realize the respective feature extraction functions of the coarse-grained feature extraction module (Input stem) and the fine-grained feature extraction modules (stage1, stage2, stage3), the convolution kernel of the convolutional layer structure Conv0 in the coarse-grained feature extraction module may be configured to be larger than the convolution kernels of the convolutional layer structures Conv1 and Conv2 in the fine-grained feature extraction modules. Illustratively, Conv0 in the coarse-grained feature extraction module uses a larger convolution kernel so that the module obtains as large a receptive field as possible when extracting features and concentrates on global information, thereby extracting the shallow, global pedestrian re-identification features of the image to be identified. Conv1 and Conv2 in the fine-grained feature extraction modules use smaller convolution kernels so that the modules adopt a small receptive field when extracting features, focus more on local information, and thereby extract the deep, local pedestrian re-identification features of the image to be identified.
In some embodiments of the present application, the coarse-grained feature extraction module (Input stem) downsamples the image to be recognized once in the convolution (Conv0) and once in the pooling (MaxPool), i.e. twice in total, to obtain the global pedestrian re-identification feature of the image. Downsampling is the process of scaling down the width and height of the feature map, and the reduction ratio is determined by the configured step size parameter (stride value). When the stride value is set to 1, no downsampling is performed; when the stride value is greater than 1, downsampling of the corresponding ratio is performed. For example, when the stride value of a convolutional layer structure is set to 2, the output size is 1/2 of the input size after downsampling, and when the stride value is set to 3, the output size is 1/3 of the input size. In this embodiment, the stride values of both the convolutional layer structure Conv0 and the pooling layer structure MaxPool in the coarse-grained feature extraction module (Input stem) are configured to be 2, so that the image to be identified is downsampled twice and the global pedestrian re-identification feature of the image is obtained through the coarse-grained feature extraction module. The snippet below checks this arithmetic.
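Illustratively, the stride arithmetic can be verified with the stem parameters above; the 256×128 input size below is a typical pedestrian-crop size and is an assumption, not a value from this embodiment.

import torch
import torch.nn as nn

# Two stride-2 stages halve the feature map twice: 256x128 -> 128x64 -> 64x32.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # output/input = 1/2
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),      # output/input = 1/2
)
print(stem(torch.randn(1, 3, 256, 128)).shape)  # torch.Size([1, 64, 64, 32])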
In some embodiments of the present application, please refer to fig. 6, and fig. 6 is a schematic flowchart of a training method of a feature extraction model preset in the image re-recognition method according to the embodiment of the present application. The details are as follows:
step S61: acquiring training sample data;
step S62: inputting the training sample data into the preset feature extraction model, performing pedestrian re-recognition feature self-learning processing and pedestrian tracking processing based on the training sample data by adopting the preset feature extraction model, and acquiring the preset feature extraction model trained to be in a convergence state.
In this embodiment, the preset feature extraction model is trained on the Market-1501 training data and the DukeMTMC-reID dataset, and tested on the Market-1501 test data. Based on these training and test data, the performance of the preset feature extraction model trained to the convergence state is: dataset: Market-1501; Rank-1: 95.72; Rank-5: 98.69; Rank-10: 99.29; mAP: 89.97; metric: 92.85. The Rank-k figures follow the standard ReID evaluation protocol, sketched below.
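For context, here is a minimal sketch of how such Rank-k scores are computed; this is the conventional evaluation step, not code from this application, and the full Market-1501 protocol additionally excludes gallery entries taken by the same camera as the query, which is omitted here.

import numpy as np

# Rank-k: rank the gallery by distance to each query and count the queries
# whose true identity appears among the k nearest gallery entries.
def rank_k_accuracy(dist: np.ndarray, q_ids: np.ndarray,
                    g_ids: np.ndarray, k: int) -> float:
    order = np.argsort(dist, axis=1)  # per query: gallery indices, closest first
    hits = sum(int(q_ids[i] in g_ids[order[i, :k]]) for i in range(dist.shape[0]))
    return hits / dist.shape[0]

# Toy usage: 2 queries, 4 gallery images.
dist = np.array([[0.9, 0.1, 0.5, 0.7],
                 [0.2, 0.8, 0.3, 0.6]])
print(rank_k_accuracy(dist, np.array([1, 2]), np.array([3, 1, 2, 2]), k=1))  # 0.5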
In this embodiment, all modules of the preset feature extraction model, including the input module, the coarse-grained feature extraction module, the fine-grained feature extraction modules, the feature fusion module, the classifier and the output module, are used for feature extraction learning, so as to train the model's ability to extract pedestrian re-identification features; all modules except the classifier are used for tracking learning, so as to train the model's pedestrian tracking ability. The preset feature extraction model obtained by training balances the strengths and weaknesses of single-camera pedestrian tracking and pedestrian re-identification, and performs pedestrian positioning and pedestrian trajectory tracking better. Applying the preset feature extraction model to pedestrian tracking also improves the accuracy of feature extraction.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In some embodiments of the present application, please refer to fig. 7; fig. 7 is a block diagram of the basic structure of an image re-recognition apparatus according to an embodiment of the present application. The apparatus in this embodiment comprises units for performing the steps of the method embodiments above; refer to the method embodiments for details. For convenience of explanation, only the portions related to this embodiment are shown. As shown in fig. 7, the image re-recognition apparatus includes: an image acquisition module 71, an image feature extraction module 72, an image feature fusion module 73, and an image feature classification module 74. The image acquisition module 71 is configured to acquire an image to be identified. The image feature extraction module 72 is configured to extract the global re-identification features and the local re-identification features of the image to be identified based on a preset feature extraction model. The image feature fusion module 73 is configured to fuse the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features. The image feature classification module 74 is configured to classify the fusion features to generate the re-identification features used for image re-identification of the image to be recognized.
In some embodiments of the present application, please refer to fig. 8; fig. 8 is a block diagram of the basic structure of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 8 of this embodiment includes: a processor 81, a memory 82, and a computer program 83, such as a program for an image re-recognition method, stored in the memory 82 and executable on the processor 81. The processor 81 implements the steps in the embodiments of the image re-recognition method described above when executing the computer program 83. Alternatively, the processor 81 implements the functions of the modules in the embodiment corresponding to the image re-recognition apparatus when executing the computer program 83. Please refer to the description of that embodiment; details are not repeated here.
Illustratively, the computer program 83 may be divided into one or more modules (units) that are stored in the memory 82 and executed by the processor 81 to accomplish the present application. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 83 in the electronic device 8. For example, the computer program 83 may be divided into an acquisition module, a processing module and an execution module, each module having the specific functions as described above.
The electronic device may include, but is not limited to, a processor 81 and a memory 82. Those skilled in the art will appreciate that fig. 8 is merely an example of the electronic device 8 and does not constitute a limitation of the electronic device 8, which may include more or fewer components than shown, combine certain components, or use different components; for example, the electronic device may also include input/output devices, network access devices, buses, etc.
The Processor 81 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 82 may be an internal storage unit of the electronic device 8, such as a hard disk or a memory of the electronic device 8. The memory 82 may also be an external storage device of the electronic device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic device 8. Further, the memory 82 may include both an internal storage unit and an external storage device of the electronic device 8. The memory 82 is used to store the computer program and other programs and data required by the electronic device. The memory 82 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments. In this embodiment, the computer-readable storage medium may be nonvolatile or volatile.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An image re-recognition method, comprising:
acquiring an image to be identified;
respectively extracting global re-identification features and local re-identification features of the image to be identified based on a preset feature extraction model;
performing fusion processing on the global re-identification features and the local re-identification features of the image to be identified to obtain fusion features;
and generating the re-identification feature of the image to be identified for image re-identification by classifying the fusion feature.
2. The image re-recognition method according to claim 1, wherein the preset feature extraction model comprises a coarse-grained feature extraction module and a fine-grained feature extraction module, wherein:
the coarse-grained feature extraction module comprises a convolutional layer structure and a pooling layer structure and is used for extracting the global re-identification features of the image to be identified;
the fine-grained feature extraction module comprises at least two feature extraction units; each feature extraction unit comprises two branch networks, wherein the first branch network comprises a convolutional layer structure, a batch-normalization layer structure and a pooling layer structure, and the second branch network comprises a convolutional layer structure, a batch-normalization layer structure, a linear rectification function layer structure and a sequence stack structure and is used for extracting the local re-identification features of the image to be identified.
3. The image re-recognition method of claim 2, wherein the second branch network in the feature extraction unit is configured to be formed by sequentially connecting a first convolutional layer structure, a first batch-normalization layer structure, a linear rectification function layer structure, a sequence stack structure, a second convolutional layer structure, and a second batch-normalization layer structure.
4. The image re-recognition method according to claim 3, wherein the sequence stack structure includes a plurality of dual core convolutional layer structure groups arranged in parallel, wherein each dual core convolutional layer structure group is formed by stacking one or more dual core convolutional layer structures.
5. The image re-recognition method of claim 4, wherein the sequence stack structure is configured to:
a first sequence stack structure is obtained by arranging a dual-core convolutional layer structure group formed by a dual-core convolutional layer structure, a dual-core convolutional layer structure group formed by stacking three dual-core convolutional layer structures and a dual-core convolutional layer structure group formed by stacking five dual-core convolutional layer structures in parallel; or
A second sequence stack structure is obtained by arranging a dual-core convolutional layer structure group formed by one dual-core convolutional layer structure, a dual-core convolutional layer structure group formed by stacking two dual-core convolutional layer structures, a dual-core convolutional layer structure group formed by stacking three dual-core convolutional layer structures and a dual-core convolutional layer structure group formed by stacking four dual-core convolutional layer structures in parallel; or
And a third sequence stack structure is obtained by arranging in parallel a binuclear convolutional layer structure group formed by one binuclear convolutional layer structure, a binuclear convolutional layer structure group formed by stacking two binuclear convolutional layer structures, a binuclear convolutional layer structure group formed by stacking three binuclear convolutional layer structures, a binuclear convolutional layer structure group formed by stacking four binuclear convolutional layer structures, and a binuclear convolutional layer structure group formed by stacking five binuclear convolutional layer structures.
6. The image re-recognition method according to any one of claims 2 to 5, wherein the fine-grained feature extraction module is configured to include a first feature extraction unit and at least one second feature extraction unit, wherein a step size parameter of the convolutional layer structure of the first branch network in the first feature extraction unit is configured to be greater than 1, and a step size parameter of the convolutional layer structure of the first branch network in the second feature extraction unit is configured to be 1.
7. The image re-identification method according to claim 2, wherein in the coarse-grained feature extraction module, the step size parameters of the convolutional layer structure and the pooling layer structure are both configured to be greater than 1, so as to perform downsampling processing on the image to be identified when the re-identification features of the image to be identified are extracted.
8. The image re-recognition method of claim 7, wherein the convolution kernel of the convolutional layer structure in the coarse-grained feature extraction module is configured to be larger than the convolution kernel of the convolutional layer structure in the fine-grained feature extraction module.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202110714602.3A 2021-06-25 2021-06-25 Image re-recognition method and device, electronic equipment and storage medium Pending CN113435325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110714602.3A CN113435325A (en) 2021-06-25 2021-06-25 Image re-recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110714602.3A CN113435325A (en) 2021-06-25 2021-06-25 Image re-recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113435325A 2021-09-24

Family

ID=77755193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110714602.3A Pending CN113435325A (en) 2021-06-25 2021-06-25 Image re-recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113435325A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271870A (en) * 2018-08-21 2019-01-25 平安科技(深圳)有限公司 Pedestrian recognition methods, device, computer equipment and storage medium again
CN110472499A (en) * 2019-07-09 2019-11-19 平安科技(深圳)有限公司 A kind of method and device that pedestrian identifies again
CN111274922A (en) * 2020-01-17 2020-06-12 山东师范大学 Pedestrian re-identification method and system based on multi-level deep learning network
CN111639564A (en) * 2020-05-18 2020-09-08 华中科技大学 Video pedestrian re-identification method based on multi-attention heterogeneous network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou et al., "Omni-Scale Feature Learning for Person Re-Identification", arXiv:1905.00953v6, pp. 1-14 *

Similar Documents

Publication Publication Date Title
Coors et al. Spherenet: Learning spherical representations for detection and classification in omnidirectional images
WO2021042682A1 (en) Method, apparatus and system for recognizing transformer substation foreign mattter, and electronic device and storage medium
Liu et al. Pose-guided R-CNN for jersey number recognition in sports
CN111368943B (en) Method and device for identifying object in image, storage medium and electronic device
US11908160B2 (en) Method and apparatus for context-embedding and region-based object detection
CN110991506B (en) Vehicle brand identification method, device, equipment and storage medium
CN104361348B (en) A kind of flowers recognition methods on intelligent terminal
CN115171165A (en) Pedestrian re-identification method and device with global features and step-type local features fused
CN115937655B (en) Multi-order feature interaction target detection model, construction method, device and application thereof
CN111046910A (en) Image classification, relation network model training and image annotation method and device
CN115082966B (en) Pedestrian re-recognition model training method, pedestrian re-recognition method, device and equipment
CN114782412A (en) Image detection method, and training method and device of target detection model
CN115578590A (en) Image identification method and device based on convolutional neural network model and terminal equipment
Singh et al. Scale normalized image pyramids with autofocus for object detection
Feng et al. A novel saliency detection method for wild animal monitoring images with WMSN
CN112613508A (en) Object identification method, device and equipment
CN113255394A (en) Pedestrian re-identification method and system based on unsupervised learning
CN111292331B (en) Image processing method and device
CN112070181A (en) Image stream-based cooperative detection method and device and storage medium
CN113435325A (en) Image re-recognition method and device, electronic equipment and storage medium
CN111242176A (en) Computer vision task processing method and device and electronic system
CN110610177A (en) Training method of character recognition model, character recognition method and device
CN113255766B (en) Image classification method, device, equipment and storage medium
CN115661481A (en) Intelligent locker edge point detection model training method, equipment and storage medium
CN115767263A (en) Shooting mode identification method and device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination