CN117217282A - Structured pruning method for deep pedestrian search model - Google Patents

Info

Publication number
CN117217282A
Authority
CN
China
Prior art keywords
pruning
search model
pedestrian
channel
pedestrian search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311235935.3A
Other languages
Chinese (zh)
Inventor
陈佳鑫
吴梓萌
王蕴红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202311235935.3A
Publication of CN117217282A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention belongs to the technical field of model compression and relates to a structured pruning method for a deep pedestrian search model, comprising the following steps: S1, preparing an image dataset, preprocessing the data, splitting the dataset, and constructing the pedestrian search model to be pruned; S2, pre-training the pedestrian search model to be pruned until it converges; S3, initializing the pruning modules, setting a target compression scale, and grouping and initializing the coupling layers; S4, pruning the model: computing the channel importance measure and determining the network layers to prune from the position of each convolutional layer and the compression scale; S5, fine-tuning the pruned model: deleting the network structure corresponding to the masked channels and training the pruned model until convergence; S6, feeding the query pedestrian image and the candidate images into the fine-tuned pruned model and taking its output as the final inference result. The invention improves the performance of the compressed deep pedestrian search model and reduces the loss of accuracy.

Description

Structured pruning method for deep pedestrian search model
Technical Field
The invention belongs to the technical field of model compression, and particularly relates to a structured pruning method for a deep pedestrian search model.
Background
Pruning of a pedestrian search model aims to delete redundant parameters, reduce computational complexity, and increase inference speed, while keeping the accuracy loss caused by the reduced parameter count as small as possible.
In recent years, with the rapid development of deep learning, the scale and parameter count of deep neural networks have grown rapidly, and the storage and inference costs of pedestrian search models have grown with them. To address this, researchers have applied the idea of removing redundant computation to the compression of deep pedestrian search models. Pedestrian search is an important computer vision task with wide application in intelligent surveillance, smart cities, smart retail, and similar areas. Its core goal is to locate and identify a queried target pedestrian in surveillance images. The task consists of two subtasks, pedestrian detection and pedestrian re-identification, so a pedestrian search model requires a large number of parameters. Meanwhile, platforms with limited computing resources, such as surveillance terminals, unmanned aerial vehicles, and vehicles, are important application scenarios for the task, and real services demand fast system response. Effectively compressing deep pedestrian search models to reduce resource and time consumption is therefore a key technology for improving their deployability in fields such as security, transportation, and commerce.
Existing work on compressing deep pedestrian search models mostly focuses on designing lightweight network structures or adopting lightweight feature extraction networks. For example, Li et al. (2019) identified detection as the main factor limiting the speed of the pedestrian search task and therefore used the lightweight MobileNet as the backbone network for pedestrian detection. Most of these methods rely on manual network design and are inflexible when faced with rapidly changing industrial application scenarios and different hardware devices. In contrast, pruning methods are mostly independent of the original model structure, and the target model size can be specified directly in terms of floating-point operations, which makes compression considerably more flexible.
Existing pruning methods for deep models are mainly designed for general vision tasks and can be roughly divided into unstructured pruning and structured pruning. Unstructured pruning removes individual weights by setting them to zero. Because the removed weights are irregularly distributed in space, most hardware has difficulty accelerating the resulting model. Structured pruning takes filters, feature map channels, and similar structures as the basic pruning units, so large-scale compression and acceleration can be achieved. A representative family of structured methods is pruning based on importance measures: an index such as channel importance is designed according to each unit's contribution to the model, and units of low importance are selected and deleted. This family includes pruning based on a loss criterion, which reflects the importance of a feature map channel by the change in the loss function after the channel is pruned and approximates the importance index with the feature gradients propagated through the network, and pruning based on an absolute-value criterion, which uses the norm of the weight matrix as the importance measure, sets a threshold, and deletes all units below it. Most of these general pruning algorithms target computer vision tasks such as classification, detection, and segmentation, and perform well on models based on ResNet, Faster R-CNN, and similar architectures.
However, only a small number of works currently attempt model compression specifically for the pedestrian search task. One such work compresses the model for the pedestrian re-identification subtask with unstructured pruning and fine-tunes the compressed model by distillation, but does not consider the pedestrian detection subtask. Compared with general vision tasks, pedestrian search has distinctive task characteristics, and applying a general pruning algorithm directly has limitations, leaving considerable room to improve the compressed pedestrian search model. First, in mainstream end-to-end pedestrian search models, the two subtasks of pedestrian detection and pedestrian re-identification typically share a backbone network. The detection subtask focuses on features common to all pedestrians, whereas the re-identification subtask focuses on identity-specific features, so the two kinds of features are semantically inconsistent. Existing importance-based pruning methods do not account for this difference, do not consistently reflect the contribution of a given channel to the two subtasks, and may mistakenly prune channels that are important to one of the subtasks. Second, existing methods are prone to over-pruning at high compression rates: the number of remaining channels in some network layers becomes too small, so the pruned pedestrian search model fine-tunes poorly. In addition, pedestrian search mostly uses the Online Instance Matching (OIM) loss, which builds positive and negative sample pairs from dynamically maintained proxy features of each identity to assist model training. During pruning, however, as the parameters of the pedestrian search model decrease, the ability of the proxy features to represent fine-grained semantics degrades, which ultimately harms the performance of the pruned model.
In summary, the existing general pruning algorithm is not specially designed for the deep pedestrian search model, so that a large optimization space exists for performance.
Disclosure of Invention
The technical problems to be solved are as follows:
In view of the shortcomings of existing pruning methods for pedestrian search models, the invention aims to provide a structured pruning method for a deep pedestrian search model that addresses three problems of existing importance-based pruning methods: they do not fully reflect the importance of a channel to the multiple subtasks of pedestrian search, they over-prune some network layers, and they degrade the fine-grained semantics of the features.
The technical scheme adopted is as follows:
The invention provides a structured pruning method for a deep pedestrian search model, comprising the following steps:
S1: preparing an image dataset and constructing the pedestrian search model to be pruned: selecting an image dataset, preprocessing it, splitting it, and constructing the pedestrian search model to be pruned based on a convolutional neural network;
S2: pre-training the pedestrian search model to be pruned: keeping the original OIM loss function and hyperparameter settings of the pedestrian search model, and iteratively updating its network parameters until the model converges;
S3: initializing the pruning modules: the pruning modules comprise a subtask-aware channel importance estimation module, a channel number balancing constraint module, and a variable OIM module; setting a target pruning scale; grouping the coupling layers of the convolutional neural network using the computation graph automatically constructed by the deep learning framework, with convolutional layers that take the same feature map as input placed in one group of coupling layers; loading the network parameters of the pre-trained pedestrian search model, replacing the OIM loss function of step S2 with the variable OIM module, and initializing the hyperparameters of each pruning module;
S4: pruning the pedestrian search model to obtain a pruned pedestrian search model: the pruning stage is iterative, and each iteration prunes one group of coupled channels, where coupled channels are the channels at the same position in the layers of a coupling group; to keep activation propagation consistent, the layers of a coupling group share one pruning mask, and coupled channels are pruned until the preset pruning rate is reached; specifically, the input data are forward-propagated, and the channel importance measure of the subtask-aware channel importance estimation module is computed and updated; this measure, together with the channel number balancing constraint module, determines which channels can be pruned; meanwhile, the network parameters of the pedestrian search model are updated using the gradients and the hyperparameters of the pruning modules are updated, yielding the pruned pedestrian search model;
S5: fine-tuning the pruned pedestrian search model: in the fine-tuning stage, only the pruned pedestrian search model and the variable OIM module are retained; the network parameters of the pruned model are loaded, the hyperparameters used before pruning are adjusted, and the pruned model is retrained until convergence;
S6: performing inference with the fine-tuned pruned pedestrian search model: the query image and the candidate images are each fed into the fine-tuned pruned model, the outputs are compared, and among the candidate pedestrians detected in all candidate images, the one whose identity features are most similar to those of the pedestrian in the query image is selected as the inference result of the pedestrian search model.
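The matching in step S6 can be illustrated with a minimal sketch. It assumes that identity features are compared by cosine similarity, which the text does not specify; the names `cosine` and `best_match` and the toy feature vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_feature, gallery):
    """gallery: list of (image_id, box, feature), one entry per detected pedestrian.

    Returns the entry whose feature is most similar to the query feature.
    """
    return max(gallery, key=lambda item: cosine(query_feature, item[2]))


# Toy example: two detections from two candidate images (made-up values).
gallery = [
    ("img1", [0, 0, 10, 20], [1.0, 0.0]),
    ("img2", [5, 5, 15, 30], [0.6, 0.8]),
]
match = best_match([0.5, 0.9], gallery)
```

In a real pipeline the features would come from the re-identification head of the pruned model; here they are plain lists so the ranking logic stays visible.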
Further, in step S1, the selected image dataset includes a real image and pedestrian position and identity marking information; the image preprocessing comprises the operations of adjusting the image proportionally, overturning the image and normalizing the image; dividing the image data set according to the standard given by the selected image data set;
The pedestrian search model to be pruned is an end-to-end two-stage pedestrian search model based on a convolutional neural network, and is specifically selected as a SeqNet network model; the pedestrian search model to be pruned comprises the following three parts:
backbone network: extracting image features from the input image data;
network transition layer: combining or processing the image features extracted through the backbone network;
network head: comprising two task heads, one for the pedestrian detection subtask and one for the pedestrian re-identification subtask, which predict pedestrian locations and identify pedestrian identities, respectively.
Further, in step S2, a stochastic gradient descent optimizer is used to optimize the network parameters.
Further, in step S3, the target compression scale is expressed as the ratio of the floating-point operations (FLOPs) of the pruned pedestrian search model to those of the original pedestrian search model. The FLOPs of each convolutional layer are computed as follows. For a convolutional layer, $c_i$ is the number of channels of the input feature map, and $h_i$ and $w_i$ are its height and width; $c_k$ is the number of channels of each convolution kernel, $h_k$ and $w_k$ are the kernel height and width, and $n_k$ is the number of convolution kernels; $c_o$ is the number of channels of the output feature map, and $h_o$ and $w_o$ are its height and width. When the input batch contains $N$ images, the FLOPs of the convolutional layer can be approximated as:
$$\mathrm{FLOPs} = N \times c_i \times h_k \times w_k \times h_o \times w_o \times c_o$$
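As a quick illustration of the formula above, the following sketch computes the approximate FLOPs of one convolutional layer and the resulting compression scale when input channels are removed. The function name `conv_flops` and the layer sizes are assumptions chosen for illustration.

```python
def conv_flops(n, c_in, h_k, w_k, h_out, w_out, c_out):
    """Approximate FLOPs of one convolutional layer:
    N * c_i * h_k * w_k * h_o * w_o * c_o."""
    return n * c_in * h_k * w_k * h_out * w_out * c_out


# Hypothetical layer: batch of 2, 64 -> 128 channels, 3x3 kernel, 56x56 output.
full = conv_flops(2, 64, 3, 3, 56, 56, 128)

# Removing half of the input channels halves the FLOPs of this layer.
pruned = conv_flops(2, 32, 3, 3, 56, 56, 128)

# Compression scale: FLOPs of the pruned layer over FLOPs of the original.
compression_scale = pruned / full
```

For a whole model, the same ratio would be taken over the sum of the per-layer values.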
further, in step S4, the pruning stage is performed by deleting the input feature map channel; specifically, for a certain convolution layer, the input feature diagram is A epsilon R N×C×H×W Wherein N represents the size of image data of an input batch, C represents the number of channels of the feature map, and H and W represent the height and width of the feature map respectively; m is E R C Mask representing convolved input channel, R c Representing a C-dimensional vector space, the value of all components of the mask m is initially set to 1, and the mask m of the jth channel of the convolutional layer is pruned j Setting to 0; after pruning, the input feature matrix is transformed intoHere, +.is shown that in the second dimension of the channel dimension point multiplication, i.e., in the input feature map, all elements of the pruned channel become 0.
Further, in step S3, the subtask-aware channel importance estimation module computes the importance of each channel for each subtask based on a loss criterion and fuses the values by weighting. For the $n$-th training image, $n = 1, 2, 3, \ldots, N$, the module records the input feature map $A_n$ of each network layer during forward propagation of the convolutional neural network and, during back-propagation, the gradient $\partial L_n / \partial A_n$ of the subtask loss function with respect to $A_n$, where $L_n$ is the subtask loss of the pedestrian search model on the $n$-th training image. The gradient of the subtask loss function with respect to the mask $m$ is obtained from the transformation:
$$\frac{\partial L_n}{\partial m_j} = \sum_{h,w} \left( A_n \odot \frac{\partial L_n}{\partial A_n} \right)_{j,h,w}$$
$s_j$ denotes the change in the subtask loss function of the pedestrian search model when channel $j$ is pruned. By the Taylor formula it is approximated as:
$$s_j = L(m - e_j) - L(m) \approx -e_j^{T} \nabla_m L + \frac{1}{2} e_j^{T} H e_j = -g_j + \frac{1}{2} H_{jj}$$
where $L$ is the subtask loss function of the pedestrian search model; $T$ denotes the matrix transpose and $\nabla$ the gradient; $e_j \in \mathbb{R}^{C}$ is a one-hot vector whose $j$-th component is 1 and whose other components are 0; $g = \nabla_m L$, with component $g_j = \partial L / \partial m_j$ for each channel $j$; $m_j$ is the $j$-th component of the mask vector; and $H = \nabla_m^2 L$ is the Hessian matrix of the loss function of the pedestrian search model with respect to the mask $m$. Because the model is assumed to have converged, the first-order term is close to 0. According to the Fisher information formula, the $(j, j)$ component $H_{jj}$ of the Hessian matrix for channel $j$ can be approximated as:
$$H_{jj} \approx E\left[ \left( \frac{\partial}{\partial m_j} \log P(y \mid x) \right)^{2} \right]$$
where $E$ denotes the mathematical expectation, $x$ an input sample, $y$ the class inferred by the model, and $P(y \mid x)$ the probability that sample $x$ is classified into class $y$;
the change in the subtask loss function of the pedestrian search model can therefore be approximated as:
$$s_j \propto \sum_{n=1}^{N} \left( \frac{\partial L_n}{\partial m_j} \right)^{2}$$
Combined with the shared pruning mask of the coupling layers, the gradients of all convolutional layers in the coupling group are used, and $s_j$ is extended to:
$$s_j \propto \sum_{n=1}^{N} \left( \sum_{l \in CG} \frac{\partial L_n}{\partial m_j^{l}} \right)^{2}$$
where $CG$ denotes the set of all convolutional layers in the coupling group, and $m_j^{l}$ denotes the copy of the pruning mask of the $j$-th channel on convolutional layer $l$.
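The coupled-group importance can be sketched in plain Python. This is a minimal sketch that assumes the usual Fisher-information form, in which the score of channel $j$ is proportional to the sum over training samples of the squared mask gradient, with mask gradients first summed over the layers of the coupling group. The function names and the toy activations and gradients are hypothetical.

```python
def mask_gradient(act, grad):
    """Per-channel mask gradient for one sample and one layer.

    act, grad: nested lists [C][H][W] holding the input feature map and the
    gradient of the loss with respect to it. For a masked input A * m, the
    chain rule gives dL/dm_j as the sum over spatial positions of act * grad.
    """
    return [
        sum(a * g for a_row, g_row in zip(a_ch, g_ch) for a, g in zip(a_row, g_row))
        for a_ch, g_ch in zip(act, grad)
    ]


def coupled_importance(samples):
    """samples: one entry per training sample; each entry is a list of
    (act, grad) pairs, one per layer in the coupling group.
    Returns one importance score per channel."""
    num_channels = len(samples[0][0][0])
    scores = [0.0] * num_channels
    for layers in samples:
        summed = [0.0] * num_channels
        for act, grad in layers:  # sum mask gradients over the coupled layers
            for j, g in enumerate(mask_gradient(act, grad)):
                summed[j] += g
        for j in range(num_channels):  # accumulate squared gradients over samples
            scores[j] += summed[j] ** 2
    return scores


# Toy data: C = 2 channels, H = 1, W = 2, two coupled layers, one sample.
act = [[[1.0, 2.0]], [[0.5, 0.5]]]
grad = [[[1.0, 1.0]], [[2.0, -2.0]]]
scores = coupled_importance([[(act, grad), (act, grad)]])
```

In this toy case the second channel's contributions cancel, so it receives the lowest score and would be the first candidate for pruning.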
Further, in step S3, the channel number balancing constraint module dynamically sets a threshold on the minimum number of remaining channels according to the position of the convolutional layer in the network structure and the pruning scale, so as to alleviate over-pruning. In the initialization stage, the network layers are divided into backbone network layers and other network layers; if a coupling group contains any layer of the backbone network, all layers in that group are assigned to the backbone part. The ratio of the number of remaining channels of each layer to its number of channels in the original pedestrian search model is used as the constrained quantity. The thresholds for the backbone part and the other parts are denoted $TB$ and $TO$, respectively, and serve as hyperparameters of the module; when the ratio of remaining channels of a convolutional layer falls below the threshold, pruning of that layer stops. $PR \in [0, 1]$ denotes the current compression scale of the pedestrian search model. For the other network layers, $TO$ decreases dynamically with $PR$. For the backbone layers, in order to protect the representation capability of the network in the early pruning stage, $TB$ is initialized to a relatively large value and kept unchanged, and is gradually reduced only after all layers in the network have reached their thresholds and can no longer be pruned. The above process is specifically expressed as follows:
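The eligibility check performed by this module can be sketched as follows. The schedule by which $TB$ and $TO$ change is not reproduced here, so the sketch simply takes the current thresholds as inputs; the function name `prunable_groups`, the dictionary keys, and the numbers are assumptions for illustration.

```python
def prunable_groups(groups, tb, to):
    """Return the indices of coupling groups that may still be pruned.

    groups: list of dicts with keys 'remaining' (channels left),
            'original' (channels in the unpruned model) and
            'is_backbone' (True if the group contains a backbone layer).
    tb, to: current thresholds for backbone groups and other groups.
    """
    eligible = []
    for index, group in enumerate(groups):
        threshold = tb if group["is_backbone"] else to
        ratio = group["remaining"] / group["original"]
        # Pruning of a group stops once its ratio falls below the threshold.
        if ratio >= threshold:
            eligible.append(index)
    return eligible


groups = [
    {"remaining": 32, "original": 64, "is_backbone": True},   # ratio 0.5
    {"remaining": 16, "original": 64, "is_backbone": False},  # ratio 0.25
    {"remaining": 60, "original": 64, "is_backbone": True},   # ratio 0.9375
]
eligible = prunable_groups(groups, tb=0.6, to=0.2)
```

With these made-up thresholds, the first backbone group is protected from further pruning while the other two remain candidates.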
Further, in step S3, the variable OIM module replaces the original OIM loss function of the pedestrian search model and uses the discriminative identity features from the pre-training stage to optimize the feature distribution of the compressed model, so as to alleviate the feature degradation of the compressed model. Specifically, when pre-training ends, the parameter file of the pedestrian search model is saved together with the proxy feature $v_{pt}$ of each identity in the feature library of the model's OIM loss function; these pre-stored proxy features are loaded in the pruning stage or fine-tuning stage of the pedestrian search model;
in the pruning or fine-tuning stage, the proxy features $v_{pt}$ are blended into the proxy features of the OIM loss function of the pedestrian search model in varying proportions and then fixed; the fixed proxy features are denoted $v_{fix}$. $LB$ is the number of labeled identities, $Q$ is the number of unlabeled identities, $v_{lb}$ is the proxy feature of the $lb$-th identity in the labeled identity set, and $u_q$ is the $q$-th identity feature in the unlabeled queue. On this basis, for a feature $f$ extracted by the network, when the proxy features are not fixed, the probability $p_a$ that $f$ is assigned to identity $a$ can be expressed as:
$$p_a = \frac{\exp(v_a^{T} f / \tau)}{\sum_{lb=1}^{LB} \exp(v_{lb}^{T} f / \tau) + \sum_{q=1}^{Q} \exp(u_q^{T} f / \tau)}$$
whereas when the proxy features are fixed, the probability $p_a$ that $f$ is assigned to identity $a$ can be expressed as:
$$p_a = \frac{\exp(v_{fix,a}^{T} f / \tau)}{\sum_{lb=1}^{LB} \exp(v_{fix,lb}^{T} f / \tau) + \sum_{q=1}^{Q} \exp(u_q^{T} f / \tau)}$$
where $T$ denotes matrix transposition and $\tau$ is a hyperparameter specified by the original pedestrian search model;
on this basis, the OIM loss function of the pedestrian search model can be expressed as $L = E_x[\log p_a]$, which is used to update the network parameters of the pedestrian search model and to compute the channel importance in the importance estimation module.
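The identity probability used by the OIM loss can be sketched as a temperature-scaled softmax over the labeled proxy features and the unlabeled queue. This follows the commonly used OIM formulation and is an assumption about the exact expression intended here; the function name `oim_probability` and the toy vectors are hypothetical.

```python
import math


def oim_probability(f, proxies, queue, target, tau):
    """Probability that feature f is assigned to labeled identity `target`.

    f: feature vector extracted by the network (list of floats).
    proxies: proxy features of the labeled identities (dynamic or fixed).
    queue: features stored in the unlabeled queue.
    tau: temperature hyperparameter.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    labeled = [math.exp(dot(v, f) / tau) for v in proxies]
    unlabeled = [math.exp(dot(u, f) / tau) for u in queue]
    return labeled[target] / (sum(labeled) + sum(unlabeled))


# Toy example: two labeled identities and one unlabeled queue entry.
p = oim_probability(
    f=[1.0, 0.0],
    proxies=[[1.0, 0.0], [0.0, 1.0]],
    queue=[[0.0, 1.0]],
    target=0,
    tau=1.0,
)
```

Switching between dynamic and fixed proxy features only changes which vectors are passed as `proxies`; the softmax itself stays the same.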
Further, in step S4, the entire pruning stage includes the following steps:
S41, forward-propagating the input data through the convolutional neural network three times: after the first forward pass, the input feature maps of all convolutional layers are stored; after the second forward pass, the loss function of the pedestrian detection subtask is computed and back-propagated, and the change in the detection loss is computed for each channel; after the third forward pass, the loss function of the pedestrian re-identification subtask is computed and back-propagated, and the change in the re-identification loss is computed for each channel;
S42, computing the importance measure $S_j$ of each channel with the subtask-aware channel importance estimation module; after pruning based on the importance measure, updating the network parameters of the model with the gradients accumulated in the latter two forward passes, so that the convergence assumption on the pedestrian search model continues to hold;
Specifically, the importance measure $S_j$ of each channel is obtained from the subtask importance values as follows:
first, the loss functions of the two subtasks, pedestrian detection and pedestrian re-identification, are back-propagated separately to compute, for all channels of the pedestrian search model, the change $s_{det}$ in the pedestrian detection loss and the change $s_{reid}$ in the pedestrian re-identification loss. For a channel $ch$, the convolution computation directly related to it is $F_{ch} = h_k \times w_k \times h_o \times w_o \times c_o$. A channel that belongs to one subtask and does not act on the other subtask has a loss gradient of 0 for that other subtask; its loss change is corrected using the maximum subtask loss change per unit of computation. Taking the pedestrian detection loss change $s_{det}(ch)$ of a channel in the pedestrian re-identification subtask as an example, the specific formula is as follows:
where $CH$ denotes the set of all channels in the network model, and $F_{CH}$ is the convolution computation associated with the channels in $CH$;
then, the loss changes of the channels on the two subtasks are normalized to the same scale. To eliminate the influence of the long-tailed distribution of the values, the normalization reference is the value at the top $\alpha\%$ of each set when sorted in descending order. Specifically, the loss change $s_{reid}$ of the pedestrian re-identification subtask is scaled, and the scaled change $s'_{reid}$ is given by:
on this basis, $\alpha$ and $\beta$ are hyperparameters of the subtask-aware channel importance estimation module. $\beta$ reflects the relative weight of the subtasks; the subtask loss changes are re-weighted and fused, and the fused change is used as the importance index $S_{fuse}$ of the channel for the whole pedestrian search model, computed as follows:
finally, the importance index $S_{fuse}$ is normalized by the memory reduction brought by pruning the channel, giving the importance measure $S_j$ of each channel. The specific formula is as follows:
S43, a pruning operation is performed once every $d$ groups of image data, where $d$ is a preset value; for the first $d-1$ groups only the importance values are accumulated. After the $d$-th accumulation, the channel number balancing constraint module selects the convolutional layers that can still be pruned, and the channel with the lowest importance measure among them is pruned. After each pruning operation, the accumulated importance measures are cleared, the FLOPs of the remaining pedestrian search model are computed to decide whether pruning should stop, and the hyperparameters $TB$ and $TO$ of the channel number balancing constraint module are updated.
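The accumulate-then-prune cycle of step S43 can be sketched as follows. The sketch tracks only the bookkeeping; computing the per-batch scores, checking the FLOPs target, and excluding channels that are already pruned are left to the caller. The function name `pruning_step` and the `(group, channel)` keys are assumptions for illustration.

```python
def pruning_step(accumulated, batch_scores, step, d, eligible):
    """Accumulate channel scores; on every d-th step choose a channel to prune.

    accumulated: dict mapping (group, channel) -> accumulated importance,
                 updated in place.
    batch_scores: dict with the importance values of the current batch.
    step: 1-based index of the current batch.
    d: pruning interval.
    eligible: set of groups still allowed by the channel number constraint.
    Returns the (group, channel) to prune, or None on non-pruning steps.
    """
    for key, value in batch_scores.items():
        accumulated[key] = accumulated.get(key, 0.0) + value
    if step % d != 0:
        return None
    candidates = {k: v for k, v in accumulated.items() if k[0] in eligible}
    chosen = min(candidates, key=candidates.get) if candidates else None
    accumulated.clear()  # reset the accumulated importance after pruning
    return chosen


acc = {}
first = pruning_step(acc, {("a", 0): 1.0, ("a", 1): 0.2, ("b", 0): 0.05}, 1, 2, {"a"})
second = pruning_step(acc, {("a", 0): 0.1, ("a", 1): 0.5, ("b", 0): 0.0}, 2, 2, {"a"})
```

In the example, group `b` holds the smallest accumulated value but is excluded by the constraint, so the lowest-scoring channel of group `a` is selected instead.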
Further, in the pruning stage in step S4, the pruning operation is completed only by zeroing the channel positions corresponding to the convolutional layer masks; and for the fine tuning stage in step S5, deleting the network structure corresponding to the channel according to the mask.
Further, in the pruning stage of step S4, the proxy features are updated dynamically in the early part of the stage, and pre-trained features are introduced in the later part. Specifically, in the early part of pruning, where $PR \in [r, 1]$ and $r$ is a preset value, the same dynamic proxy update strategy as the conventional OIM loss function is used, consistent with the learning strategy of the original OIM loss function in the pedestrian search model, so that the convergence assumption on the model is satisfied as far as possible. Suppose that at $PR = r$ the proxy features have been updated from the initial features $v_{pt}$ to $v_{cur}$. In the stage where the pedestrian search model is compressed to $PR \in [0, r]$, fixed proxy features are used, and the proportion of the pre-trained features is gradually increased until the proxy features are fully restored to $v_{pt}$. The specific update formula is as follows:
$$v_{fix} = \gamma \cdot v_{pt} + (1 - \gamma) \cdot v_{cur}$$
where $\gamma$ is a hyperparameter of the variable OIM module that is changed dynamically to control the proportion of the pre-trained features in the proxy features;
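The blending of pre-trained and current proxy features can be sketched directly from the update formula. How $\gamma$ is scheduled over the pruning stage is not specified here, so the sketch takes it as an input; the function name `fixed_proxy` and the toy vectors are hypothetical.

```python
def fixed_proxy(v_pt, v_cur, gamma):
    """Blend proxy features: gamma * v_pt + (1 - gamma) * v_cur.

    v_pt: proxy feature saved at the end of pre-training.
    v_cur: proxy feature reached when the compression scale equals r.
    gamma: weight of the pre-trained feature, between 0 and 1.
    """
    return [gamma * p + (1.0 - gamma) * c for p, c in zip(v_pt, v_cur)]


v_pt = [1.0, 0.0]
v_cur = [0.0, 1.0]
partial = fixed_proxy(v_pt, v_cur, 0.25)   # mostly the current feature
restored = fixed_proxy(v_pt, v_cur, 1.0)   # fully the pre-trained feature
```

With $\gamma = 0$ the proxy equals $v_{cur}$, and with $\gamma = 1$ it equals $v_{pt}$, which matches the two ends of the later pruning stage described above.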
in the fine-tuning stage of step S5, the proxy feature schedule is the reverse of that in the pruning stage: the proxy features are first kept fixed, and dynamic updating is then resumed. Specifically, at the beginning of the fine-tuning stage, $v_{pt}$, which was held fixed at the end of pruning, is kept as the proxy features to preserve their discriminability; in the later part of the fine-tuning stage, the update strategy of the conventional OIM loss function is used to update the proxy features dynamically, so that the pruned pedestrian search model can be fully optimized.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a pruning method for deep pedestrian search models based on subtask awareness and a balancing constraint on the number of channels. It overcomes the failure of general pruning algorithms to account for the multi-task nature of pedestrian search and improves the inference accuracy of the compressed pedestrian search model.
2. The invention provides a subtask-aware channel importance estimation module. The module estimates importance indexes of each channel on subtasks based on loss criteria, and performs weighted fusion to reserve channels important for a plurality of subtasks.
3. The invention provides a channel quantity balancing constraint module. Based on the network layer position where the channels are located and the pruning scale, minimum channel quantity constraint is dynamically set for each layer, and network structure balance is maintained, so that the characterization performance of the pedestrian search model after pruning is maintained.
4. The invention provides a variable online instance matching (OIM) module. In the proxy features used during the pruning and fine-tuning stages, the module dynamically adjusts the weight of the identity features of the pre-trained pedestrian search model, reducing the negative effect of degraded fine-grained semantic discrimination on pruning and fine-tuning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a structured pruning method for a deep pedestrian search model of the present invention;
FIG. 2 is a schematic flow chart of the pruning modules in the pruning stage or the fine-tuning stage of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 1, the specific steps of this embodiment of the pruning method for the pedestrian search model are as follows:
S1: Prepare the dataset and construct the pedestrian search model to be pruned. An image dataset is selected, the images are preprocessed, and the dataset is divided into a training set and a test set, specifically as follows:
The selected image dataset contains real images together with pedestrian position and identity annotations. To verify both the performance and the robustness of the pedestrian search model, this embodiment uses the public pedestrian search benchmark datasets CUHK-SYSU and PRW. The images and labels are organized in the same way as in the open-source object detection framework MMDetection.
The images are preprocessed. On the CUHK-SYSU and PRW datasets, this embodiment scales each image to [900, 1500], randomly flips it with a probability of 50%, and finally normalizes it.
The datasets are divided according to the standard split provided with each dataset. For CUHK-SYSU, 11206 images of 5532 pedestrians are used for training, 2900 images of 2900 pedestrians are used as query images, and 6978 images form the candidate test set. For PRW, 5704 images of 482 pedestrians are used for training, 2057 images are used as query images, and 6112 images form the candidate test set.
The pedestrian search model to be pruned is constructed. This embodiment selects the end-to-end two-stage SeqNet pedestrian search model for pruning and compression. The model first extracts image features with the first 4 network layers of ResNet50, then obtains candidate boxes through the RPN and crops and pools the feature map according to the candidate boxes. To give priority to pedestrian detection accuracy, the model feeds the cropped feature maps into a Faster R-CNN detection head and then crops and pools the feature map again. Finally, the 5th layer of ResNet50 extracts feature vectors from the detection boxes for identity re-identification and for predicting the pedestrian bounding boxes.
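The preprocessing described in S1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's actual pipeline: it assumes that "[900, 1500]" means the shorter side is scaled towards 900 while the longer side is capped at 1500, and that the random flip is horizontal. The function names `target_size` and `flip_and_normalize`, and the `mean`/`std` arguments, are hypothetical.

```python
import numpy as np

def target_size(height, width, min_side=900, max_side=1500):
    """Return the (height, width) after aspect-preserving scaling.

    Assumption: the shorter side is scaled to min_side unless that
    would push the longer side beyond max_side.
    """
    scale = min(min_side / min(height, width), max_side / max(height, width))
    return round(height * scale), round(width * scale)

def flip_and_normalize(image, mean, std, rng, flip_prob=0.5):
    """Randomly flip an (H, W, 3) float image, then normalize it.

    Assumption: the flip is horizontal (along the width axis).
    """
    if rng.random() < flip_prob:
        image = image[:, ::-1, :]
    return (image - mean) / std
```

Resampling the pixels to the computed size is left to whichever image library the training framework uses.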
S2: Pre-train the pedestrian search model to be pruned. The training settings of the original pedestrian search model are kept, with a ResNet50 network pre-trained on ImageNet as the backbone. The batch size is set to 4, the initial learning rate to 0.0024, and the total number of training epochs to 20. Using a linear warmup strategy, the learning rate is reduced by 10% in the 1st and 16th training epochs. The network parameters are optimized with a stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a weight decay of $5 \times 10^{-4}$, iterating until the pedestrian search model converges.
S3: Initialize the pruning modules, completing the coupled-layer grouping and the initialization of each module's hyperparameters. In this embodiment, the hyperparameters α and β of the subtask-aware channel importance estimation module are set to 0.1 and 0.5, respectively. In the channel number balancing constraint module, the hyperparameter TB is initialized to 0.9. The hyperparameter r of the variable online instance matching module, i.e. the variable OIM module, is set to 0.8; that is, fixed proxy features are used once the pedestrian search model has been compressed to 80% of its size, and from that point γ increases from 0.2 to 1, growing by 0.1 for every 0.05 decrease in PR.
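The γ schedule described in S3 can be sketched as follows. This is a minimal sketch, assuming γ rises in discrete increments of 0.1 each time PR has fallen by a full 0.05 below r. The function name `gamma_for` and the choice to return `None` while PR is still above r (when proxy features are still updated dynamically) are illustrative assumptions.

```python
import math

def gamma_for(pr, r=0.8, gamma_start=0.2, gamma_step=0.1, pr_step=0.05):
    """Return gamma for the current compression scale pr in [0, 1].

    Above r the fixed proxy feature is not used, so None is returned.
    """
    if pr > r:
        return None
    # Number of full pr_step decrements below r; the small epsilon
    # guards against floating-point round-off at exact multiples.
    steps = math.floor((r - pr) / pr_step + 1e-9)
    return min(1.0, gamma_start + gamma_step * steps)
```

With the embodiment's values, γ reaches 1 once PR has dropped to 0.4 and stays there for smaller PR.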
S4: Prune the pedestrian search model. Following the pruning process shown in FIG. 2, this embodiment sets the target FLOPs of the compressed pedestrian search model to 25% and 10% of those of the original pedestrian search model, with a batch size of 1 and a learning rate fixed at 0.0001. The parameters are optimized with a stochastic gradient descent optimizer with a momentum of 0.9 and a weight decay of $5 \times 10^{-4}$. In this embodiment, pruning is performed once for every 10 accumulated importance measurements, and the process is iterated until the computation of the pedestrian search model reaches the threshold.
S5: Fine-tune the pruned pedestrian search model. Following the fine-tuning process shown in FIG. 2, this embodiment sets the batch size to 4 and the initial learning rate to 0.00024, and trains for 20 epochs. Using a linear warmup strategy, the learning rate is reduced by 10% in the 1st and 16th training epochs. The network parameters are optimized with a stochastic gradient descent optimizer with a momentum of 0.9 and a weight decay of $5 \times 10^{-4}$. In the variable OIM module, the first 15 training epochs use fixed proxy features for the pedestrian search model compressed to 25% FLOPs, and the first 5 training epochs use fixed proxy features for the pedestrian search model compressed to 10% FLOPs.
S6: Perform inference with the fine-tuned pedestrian search model. The candidate images are input into the fine-tuned pedestrian search model, which outputs the detected positions and identity features of the candidate pedestrians; the query image is input into the fine-tuned pedestrian search model, which outputs the identity features of the pedestrian to be queried; the identity features of the queried pedestrian are then compared with those of the candidates to obtain the inference result. On the CUHK-SYSU dataset, the pedestrian search model compressed to 25% FLOPs reaches a mean average precision (mAP) of 91.90% and Top-1 and Top-5 accuracies of 92.66% and 97.45%, respectively; the pedestrian search model compressed to 10% FLOPs reaches an mAP of 90.84% and Top-1 and Top-5 accuracies of 91.24% and 96.90%, respectively. At the same compression scale the technique provided by the invention attains higher accuracy; that is, it effectively reduces the parameter count of the pedestrian search model and speeds up inference with only a small loss of accuracy.
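The comparison step of S6 can be sketched as a ranking of candidate identity features against the query feature. This is a minimal sketch: the text does not name the similarity measure, so cosine similarity is an assumption here, and the function name `rank_candidates` is hypothetical.

```python
import numpy as np

def rank_candidates(query_feat, candidate_feats):
    """Rank detected candidates by similarity to the query identity feature.

    query_feat: (D,) feature of the queried pedestrian.
    candidate_feats: (K, D) features of all detected candidate pedestrians.
    Returns candidate indices (most similar first) and their similarities.
    Assumption: cosine similarity is used as the comparison.
    """
    q = query_feat / np.linalg.norm(query_feat)
    c = candidate_feats / np.linalg.norm(candidate_feats, axis=1, keepdims=True)
    sims = c @ q
    order = np.argsort(-sims)
    return order, sims[order]
```

The first index returned corresponds to the candidate reported as the search result.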
Obviously, the above embodiments are given only for clear illustration and do not limit the implementations. Those of ordinary skill in the art can make other variations or modifications on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious variations or modifications derived from the above remain within the scope of protection of the invention.

Claims (10)

1. A structured pruning method for a deep pedestrian search model, characterized by comprising the following steps:
S1: prepare an image dataset and construct the pedestrian search model to be pruned: select an image dataset, preprocess it, divide it, and construct the pedestrian search model to be pruned based on a convolutional neural network;
S2: pre-train the pedestrian search model to be pruned: keep the original OIM loss function and hyperparameter settings of the pedestrian search model, and iteratively update the network parameters of the pedestrian search model until it converges;
S3: initialize the pruning modules: the pruning modules comprise a subtask-aware channel importance estimation module, a channel number balancing constraint module and a variable OIM module; set a target pruning scale; group the coupled layers of the convolutional neural network using the computation graph automatically constructed by the deep learning framework, placing convolutional layers that take the same feature map as input into one group of coupled layers; load the network parameters of the pre-trained pedestrian search model, replace the OIM loss function of step S2 with the variable OIM module, and initialize the hyperparameters of each pruning module;
S4: prune the pedestrian search model to obtain a pruned pedestrian search model: in the pruning stage, the pruning process is iterated and one group of coupled channels is pruned each time, where coupled channels are the channels at the same position in the layers of a coupled group; to keep activation propagation consistent, the coupled layers share one pruning mask, and coupled channels are pruned until the preset pruning rate is reached; the step further comprises forward-propagating the input data and computing and updating the channel importance measure of the subtask-aware channel importance estimation module, which together with the channel number balancing constraint module determines the channels that can be pruned; meanwhile, the network parameters of the pedestrian search model are updated with the gradients and the hyperparameters of the pruning modules are updated, yielding the pruned pedestrian search model;
S5: fine-tune the pruned pedestrian search model: in the fine-tuning stage, only the pruned pedestrian search model and the variable OIM module are kept; the network parameters of the pruned pedestrian search model are loaded, the hyperparameters used before pruning are adjusted, and the pruned pedestrian search model is retrained until convergence;
S6: perform inference with the fine-tuned pedestrian search model: input the query image and the candidate images separately into the fine-tuned pruned pedestrian search model, compare the outputs obtained for the two, and select, among the candidate pedestrians detected in all candidate images, the pedestrian whose identity features are most similar to those of the pedestrian in the query image; this pedestrian is the inference result of the pedestrian search model.
2. The structured pruning method for a deep pedestrian search model of claim 1,
in step S1, the selected image dataset comprises real images together with pedestrian position and identity annotations; image preprocessing comprises scaling the image proportionally, flipping the image and normalizing the image; the image dataset is divided according to the standard split provided with the selected image dataset;
the pedestrian search model to be pruned is an end-to-end two-stage pedestrian search model based on a convolutional neural network, specifically the SeqNet network model; the pedestrian search model to be pruned comprises the following three parts:
backbone network: extracts image features from the input image data;
network transition layer: combines or processes the image features extracted by the backbone network;
network head: comprises two task heads, one for the pedestrian detection subtask and one for the pedestrian re-identification subtask, used to predict pedestrian locations and to recognize pedestrian identities.
3. The structured pruning method for a deep pedestrian search model of claim 2,
in step S2, a stochastic gradient descent optimizer is used to optimize the network parameters.
4. The structured pruning method for a deep pedestrian search model of claim 3,
in step S3, the target compression scale is expressed as the ratio of the floating-point operations (FLOPs) of the pruned pedestrian search model to those of the original pedestrian search model; the FLOPs of each convolutional layer are calculated as follows: for a convolutional layer, $c_i$ is the number of channels of the input feature map, and $h_i$ and $w_i$ are the height and width of the input feature map; $c_k$ is the number of channels of the convolution kernel, $h_k$ and $w_k$ are the height and width of the convolution kernel, and $n_k$ is the number of convolution kernels; $c_o$ is the number of channels of the output feature map, and $h_o$ and $w_o$ are the height and width of the output feature map; when the batch of input image data has size $N$, the FLOPs generated by the convolutional layer are approximately given by:
$$\mathrm{FLOPs} = N \times c_i \times h_k \times w_k \times h_o \times w_o \times c_o$$
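The formula in claim 4 translates directly into code. The sketch below evaluates it for one layer and forms the FLOPs ratio used as the compression scale; the function names `conv_flops` and `compression_scale` are hypothetical, and summing per-layer values over the convolutional layers of the model is an assumption about how the model-level figure is aggregated.

```python
def conv_flops(n, c_in, k_h, k_w, out_h, out_w, c_out):
    """Approximate FLOPs of one convolutional layer for a batch of size n."""
    return n * c_in * k_h * k_w * out_h * out_w * c_out

def compression_scale(pruned_layer_flops, original_layer_flops):
    """Ratio of total FLOPs after pruning to total FLOPs before pruning.

    Assumption: model FLOPs are the sum of the per-layer values.
    """
    return sum(pruned_layer_flops) / sum(original_layer_flops)
```

Because the formula is linear in `c_in`, removing a fraction of a layer's input channels removes the same fraction of that layer's FLOPs.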
5. The structured pruning method for a deep pedestrian search model of claim 4,
in step S4, pruning in the pruning stage is performed by deleting channels of the input feature map; specifically, for a given convolutional layer the input feature map is $A \in \mathbb{R}^{N \times C \times H \times W}$, where $N$ is the size of the input batch of image data, $C$ is the number of channels of the feature map, and $H$ and $W$ are the height and width of the feature map; $m \in \mathbb{R}^{C}$ is the mask of the input channels of the convolution, where $\mathbb{R}^{C}$ denotes the $C$-dimensional vector space; all components of the mask $m$ are initially set to 1, and pruning the $j$-th channel of the convolutional layer sets its mask component $m_j$ to 0; after pruning, the input feature map becomes $A \odot m$, where $\odot$ denotes element-wise multiplication along the channel dimension, i.e. the second dimension of the input feature map, so that all elements of a pruned channel become 0.
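The channel masking of claim 5 can be illustrated with NumPy broadcasting. This is a minimal sketch; the function name `apply_channel_mask` is hypothetical.

```python
import numpy as np

def apply_channel_mask(a, m):
    """Multiply an (N, C, H, W) feature map by a length-C channel mask.

    The mask is reshaped to (1, C, 1, 1) so that it broadcasts along the
    channel (second) dimension; channels with m[j] == 0 become all zeros.
    """
    return a * m[None, :, None, None]
```

For example, with `m = np.array([1.0, 0.0, 1.0])` the second channel of every sample in the batch is zeroed while the other two are left unchanged.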
6. The structured pruning method for a deep pedestrian search model of claim 5,
in step S3, the subtask-aware channel importance estimation module computes, based on a loss criterion, the importance of each channel to each subtask and fuses the values by weighting; for the $n$-th training image, $n = 1, 2, 3, \ldots, N$, during forward propagation of the convolutional neural network the module records the input feature map matrix $A_n$ of each network layer; during back-propagation for the $n$-th training image, the module records the gradient $\partial L_n / \partial A_n$ of the subtask loss function with respect to the input feature map matrix $A_n$ of each network layer and, according to the transformation formula, calculates the gradient $\partial L_n / \partial m$ of the subtask loss function with respect to the mask $m$, where $L_n$ is the subtask loss function of the pedestrian search model for the $n$-th training image;
$s_j$, the change in the subtask loss function of the pedestrian search model for channel $j$, is obtained from the Taylor formula and is approximately expressed as:
where $L$ is the subtask loss function of the pedestrian search model, $T$ denotes matrix transposition, and $\nabla$ denotes the gradient; $e_j \in \mathbb{R}^{C}$ is a one-hot vector whose $j$-th component is 1 and whose other components are 0; $g$ denotes the gradient $\nabla_m L$, whose component for channel $j$ is $g_j = \partial L / \partial m_j$; $m_j$ is the component of the mask vector for channel $j$, i.e. its $j$-th component; $H$ is the Hessian matrix of the loss function of the pedestrian search model with respect to the mask $m$; according to the Fisher information formula, the $(j, j)$ component $H_{jj}$ of the Hessian matrix for channel $j$ is approximately expressed as:
where $E$ denotes the mathematical expectation, $x$ denotes an input sample, $y$ denotes the class inferred by the model, and $P(y|x)$ denotes the probability that sample $x$ is classified into class $y$;
the change in the subtask loss function of the pedestrian search model is approximately expressed as:
combined with the mechanism by which coupled layers share a pruning mask, the change $s_j$ in the subtask loss function of the pedestrian search model is extended, using the gradients of all convolutional layers in the coupled-layer group, to obtain:
where $CG$ denotes the set of all convolutional layers in the coupled group, and the per-layer mask term denotes the copy of the pruning mask of the $j$-th channel on convolutional layer $l$.
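A hedged sketch of the derivation that the symbol definitions in claim 6 describe, assuming it follows the standard second-order Taylor expansion with a Fisher-information approximation of the Hessian diagonal. The sign convention $s_j = L(m - e_j) - L(m)$, the convergence assumption $g_j \approx 0$, the spatial sum in the mask gradient, the constant factors, and the symbol $m_j^{l}$ for the copy of the mask on layer $l$ are assumptions, not taken from the text:

```latex
\begin{align}
% chain rule through the masked input A_n \odot m, evaluated at m = 1
\frac{\partial L_n}{\partial m_j}
  &= \sum_{h=1}^{H}\sum_{w=1}^{W} A_n[j,h,w]\,
     \frac{\partial L_n}{\partial A_n[j,h,w]} \\
% second-order Taylor expansion of the loss change when channel j is removed
s_j = L(m - e_j) - L(m)
  &\approx -e_j^{T}\nabla_m L + \tfrac{1}{2}\, e_j^{T} H e_j
   = -g_j + \tfrac{1}{2} H_{jj} \\
% Fisher-information approximation of the Hessian diagonal
H_{jj}
  &\approx E_{x}\, E_{y \sim P(y|x)}\!\left[
     \left(\frac{\partial \log P(y|x)}{\partial m_j}\right)^{2}\right]
   \approx \frac{1}{N}\sum_{n=1}^{N}
     \left(\frac{\partial L_n}{\partial m_j}\right)^{2} \\
% at convergence g_j is close to 0, leaving the second-order term
s_j
  &\approx \frac{1}{2N}\sum_{n=1}^{N}
     \left(\frac{\partial L_n}{\partial m_j}\right)^{2} \\
% coupled layers share the mask, so their mask gradients are summed first
s_j
  &\propto \sum_{n=1}^{N}\left(\sum_{l \in CG}
     \frac{\partial L_n}{\partial m_j^{l}}\right)^{2}
\end{align}
```

Under these assumptions the importance of a channel is driven by the squared per-sample mask gradients, which is why the claim records both $A_n$ and $\partial L_n / \partial A_n$ for every layer.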
7. The structured pruning method for a deep pedestrian search model of claim 6,
in step S3, the channel number balancing constraint module dynamically sets a threshold on the minimum number of remaining channels according to the position of the convolutional layer in the convolutional neural network and the pruning scale, so as to mitigate excessive pruning of individual layers; in the initialization stage, the network layers are divided into backbone network layers and other network layers; if a coupled-layer group contains any layer of the backbone network, all layers in that group are assigned to the backbone part; the ratio of the number of remaining channels of each layer to its number of channels in the original pedestrian search model is used as the channel number constraint, and the thresholds for the backbone part and the other parts are denoted TB and TO, respectively, and serve as the hyperparameters of the channel number balancing constraint module; when the number of remaining channels falls below the threshold, pruning of the corresponding convolutional layer stops; PR ∈ [0,1] denotes the current compression scale of the pedestrian search model; for the other network layers, TO decreases dynamically with PR; for the backbone network layers, to protect the representation ability of the network early in pruning, TB is initialized to a large value and held constant, and is gradually reduced only after all layers in the network have reached their thresholds and can no longer be pruned; the above process is specifically expressed as follows:
in step S3, the variable OIM module replaces the original OIM loss function in the pedestrian search model and uses the discriminative identity features of the pre-training stage to optimize the feature distribution of the compressed pedestrian search model, so as to alleviate the feature degradation of the compressed model; specifically, after pre-training is finished, the proxy feature $v_{pt}$ of each identity in the feature library of the OIM loss function is saved together with the parameter file of the pedestrian search model; the saved proxy features are loaded in the pruning stage or the fine-tuning stage of the pedestrian search model;
in the pruning stage or the fine-tuning stage, the proxy features $v_{pt}$ are introduced in varying proportions into the proxy features of the OIM loss function of the pedestrian search model, which are then held fixed; the fixed proxy features are denoted $v_{fix}$; $LB$ is the number of labeled identities and $Q$ is the number of unlabeled identities; $v_{lb}$ denotes the proxy feature of the $lb$-th identity in the labeled identity set, and $u_q$ denotes the $q$-th identity feature in the unlabeled queue; on this basis, for a feature $f$ extracted by the network, when the proxy features are not fixed, the probability $p_a$ that $f$ is assigned identity $a$ is expressed as:
and when the proxy features are fixed, the probability $p_a$ that the feature $f$ extracted by the network is assigned identity $a$ is expressed as:
where $T$ denotes matrix transposition and $\tau$ is a hyperparameter specified by the original pedestrian search model;
on this basis, the OIM loss function of the pedestrian search model is expressed as $L = E_x[\log p_a]$ and is used to update the network parameters of the pedestrian search model and to compute channel importance in the importance estimation module.
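The identity probability of claim 7 can be sketched in code. This is a minimal sketch, assuming the probability takes the usual OIM softmax form $p_a = \exp(v_a^{T} f / \tau) / \big(\sum_{lb=1}^{LB} \exp(v_{lb}^{T} f / \tau) + \sum_{q=1}^{Q} \exp(u_q^{T} f / \tau)\big)$, with $v_{fix}$ substituted for the proxies when they are fixed. The function name `oim_probabilities` is hypothetical.

```python
import numpy as np

def oim_probabilities(f, proxies, queue, tau):
    """Probability that feature f belongs to each labeled identity.

    f: (D,) extracted feature.
    proxies: (LB, D) proxy features (dynamic, or the fixed v_fix).
    queue: (Q, D) features in the unlabeled queue.
    tau: temperature hyperparameter.
    Assumption: standard OIM softmax over proxies and queue entries.
    """
    logits = np.concatenate([proxies @ f, queue @ f]) / tau
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    probs = weights / weights.sum()
    return probs[: len(proxies)]
```

Switching between the non-fixed and fixed cases only changes which array is passed as `proxies`; the normalization over labeled and unlabeled entries is the same.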
8. The structured pruning method for a deep pedestrian search model of claim 7,
in step S4, the entire pruning phase comprises the following steps:
S41: the input data is forward-propagated through the convolutional neural network three times: after the first forward pass, the input feature maps of all convolutional layers are stored; after the second forward pass, the loss function of the pedestrian detection subtask is computed and its gradient back-propagated, and the change in the detection subtask loss function is computed for each channel; after the third forward pass, the loss function of the pedestrian re-identification subtask is computed and its gradient back-propagated, and the change in the re-identification subtask loss function is computed for each channel;
S42: the subtask-aware channel importance estimation module computes the importance measure $S_j$ of each channel; after pruning based on the importance measure, the network parameters of the model are updated with the gradients accumulated in the latter two forward passes, so as to maintain the convergence assumption of the pedestrian search model;
specifically, the importance measure $S_j$ of each channel is obtained from the subtask importance values as follows:
first, back-propagation is performed separately with the loss functions of the pedestrian detection subtask and the pedestrian re-identification subtask, giving the change $s_{det}$ in the pedestrian detection subtask loss function and the change $s_{reid}$ in the pedestrian re-identification subtask loss function for all channels of the pedestrian search model; for a channel $ch$, the convolution computation directly related to it is $F_{ch} = h_k \times w_k \times h_o \times w_o \times c_o$; for a channel that belongs to one subtask and does not act on the other subtask, the gradient of the other subtask's loss function is 0, and the corresponding loss change is corrected using the maximum subtask loss change per unit of computation; taking as an example the change $s_{det}(ch)$ in the pedestrian detection subtask loss function for a channel in the pedestrian re-identification subtask, the specific formula is as follows:
where $CH$ denotes the set of all channels in the network model, and $F_{CH}$ is the convolution computation associated with the channels in $CH$;
then the changes in the loss functions of the two subtasks are normalized to the same scale; to eliminate the influence of the long-tailed distribution of the values, the normalization reference is the value at the top α% of each set of changes sorted in descending order; specifically, the loss change $s_{reid}$ of the pedestrian re-identification subtask is scaled, and the scaled change $s'_{reid}$ is given by:
on this basis, α and β are the hyperparameters of the subtask-aware channel importance estimation module; the hyperparameter β reflects the relative weight of the subtasks; the subtask loss changes are re-weighted and fused, and the fused change is used as the importance index $S_{fuse}$ of the channel to the whole pedestrian search model, calculated as follows:
finally, the importance index $S_{fuse}$ is normalized by the memory reduction brought about by pruning the channel, giving the importance measure $S_j$ of each channel; the specific formula is as follows:
S43: a pruning operation is performed once every d groups of image data, where d is a preset value; for the first d-1 groups only the importance values are accumulated; after the d-th accumulation, the channel number balancing constraint module selects the convolutional layers that can still be pruned, and the channel with the lowest importance measure is pruned; after each pruning, the accumulated importance measures are cleared, the FLOPs of the remaining pedestrian search model are computed to decide whether to stop pruning, and the hyperparameters TB and TO of the channel number balancing constraint module are updated.
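The accumulate-then-prune cycle of step S43 can be sketched as follows. This is a minimal sketch under stated assumptions: the per-channel importance values are supplied by the caller, each layer has a fixed minimum remaining-channel ratio (the dynamic updates of TB and TO are not modeled), and a layer counts as prunable only if removing one more channel keeps its ratio at or above the threshold. The class name `PruneScheduler` and its interface are hypothetical.

```python
import numpy as np

class PruneScheduler:
    """Accumulate channel importance and prune one channel every d steps."""

    def __init__(self, channels_per_layer, thresholds, d=10):
        self.masks = [np.ones(c) for c in channels_per_layer]
        self.acc = [np.zeros(c) for c in channels_per_layer]
        self.thresholds = thresholds  # assumed fixed minimum ratio per layer
        self.d = d
        self.steps = 0

    def step(self, importances):
        """Add one group of importance values; return (layer, channel) or None."""
        for acc, imp in zip(self.acc, importances):
            acc += imp
        self.steps += 1
        if self.steps % self.d != 0:
            return None  # first d-1 groups: accumulate only
        best = None
        for layer, (mask, acc, thr) in enumerate(
                zip(self.masks, self.acc, self.thresholds)):
            # Assumption: skip layers that would drop below their threshold.
            if (mask.sum() - 1) / mask.size < thr:
                continue
            scores = np.where(mask > 0, acc, np.inf)  # ignore pruned channels
            j = int(scores.argmin())
            if best is None or scores[j] < best[0]:
                best = (scores[j], layer, j)
        for acc in self.acc:
            acc[:] = 0.0  # clear accumulated importance after each pruning step
        if best is None:
            return None
        _, layer, j = best
        self.masks[layer][j] = 0.0
        return layer, j
```

A training loop would call `step` once per group of image data and recompute the model FLOPs whenever a channel is returned, stopping once the target compression scale is reached.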
9. The structured pruning method for a deep pedestrian search model of claim 8,
in the pruning stage of step S4, the pruning operation is carried out merely by setting the corresponding channel positions of the convolutional layer masks to zero; in the fine-tuning stage of step S5, the network structure corresponding to the pruned channels is deleted according to the masks.
10. The structured pruning method for a deep pedestrian search model of claim 9,
in the pruning stage of step S4, the proxy features are updated dynamically early in the pruning stage, and the pre-trained features are introduced late in the pruning stage; specifically, early in pruning, while $PR \in [r, 1]$ with $r$ a preset value, the same dynamic proxy feature update strategy as in the conventional OIM loss function is used, consistent with the original OIM learning strategy of the pedestrian search model, so as to satisfy the convergence assumption of the pedestrian search model as far as possible; suppose that when $PR = r$ the proxy features have been updated from the initial features $v_{pt}$ to $v_{cur}$; in the stage where the compression scale of the pedestrian search model satisfies $PR \in [0, r]$, fixed proxy features are used and the proportion of the pre-trained features is increased gradually until the proxy features are completely updated to $v_{pt}$; the specific update formula is as follows:
$$v_{fix} = \gamma \cdot v_{pt} + (1 - \gamma) \cdot v_{cur}$$
where $\gamma$ is the hyperparameter of the variable OIM module, which is varied dynamically to control the proportion of the pre-trained features in the proxy features;
in the fine-tuning stage of step S5, the rule for the proxy features is the reverse of that in the pruning stage: the proxy features are first held fixed early in the fine-tuning stage, and dynamic updating is then resumed; specifically, early in the fine-tuning stage, $v_{pt}$, which was held fixed at the end of pruning, is kept as the proxy features to preserve their discriminability; late in the fine-tuning stage, the update strategy of the conventional OIM loss function is used to update the proxy features dynamically, so that the pruned pedestrian search model can be fully optimized.
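The update formula of claim 10 is a convex combination of the pre-trained and current proxy features. A minimal sketch follows; the function name `fixed_proxy` and the range check on γ are illustrative assumptions.

```python
import numpy as np

def fixed_proxy(v_pt, v_cur, gamma):
    """Blend pre-trained and current proxy features: gamma*v_pt + (1-gamma)*v_cur."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is expected to lie in [0, 1]")
    return gamma * v_pt + (1.0 - gamma) * v_cur
```

With γ = 0 the result is the current proxy feature $v_{cur}$, and with γ = 1 it is the pre-trained feature $v_{pt}$, matching the statement that the proxy is eventually updated completely to $v_{pt}$.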
CN202311235935.3A 2023-09-22 2023-09-22 Structured pruning method for deep pedestrian search model Pending CN117217282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311235935.3A CN117217282A (en) 2023-09-22 2023-09-22 Structured pruning method for deep pedestrian search model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311235935.3A CN117217282A (en) 2023-09-22 2023-09-22 Structured pruning method for deep pedestrian search model

Publications (1)

Publication Number Publication Date
CN117217282A true CN117217282A (en) 2023-12-12

Family

ID=89042214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311235935.3A Pending CN117217282A (en) 2023-09-22 2023-09-22 Structured pruning method for deep pedestrian search model

Country Status (1)

Country Link
CN (1) CN117217282A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117497194B (en) * 2023-12-28 2024-03-01 苏州元脑智能科技有限公司 Biological information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination