CN111782857A - Footprint image retrieval method based on mixed attention dense network - Google Patents
Footprint image retrieval method based on mixed attention dense network
- Publication number: CN111782857A (application CN202010710865.2A)
- Authority: CN (China)
- Prior art keywords: footprint, layer, output, sample, information
- Prior art date: 2020-07-22
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/583: Information retrieval of still image data; retrieval characterised by using metadata automatically derived from the content
- G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Neural networks; combinations of networks
- G06N3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
- G06V10/30: Image or video recognition or understanding; image preprocessing; noise filtering
- G06V10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- Y02D10/00: Climate change mitigation technologies in ICT; energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a footprint image retrieval method based on a mixed attention dense network, which comprises the following steps: 1. preparing a footprint image data set; 2. establishing a footprint image preprocessing module; 3. establishing an initial feature extraction module; 4. establishing a mixed attention dense network module; 5. establishing a final feature output module; 6. initializing the weights; 7. training, testing and optimizing the network. The method can acquire richer feature information from the footprint image and extract feature information that discriminates between different individuals as far as possible, thereby improving the precision and speed of footprint image retrieval.
Description
Technical Field
The invention relates to the field of image processing, and in particular to a footprint image retrieval method based on a mixed attention dense network.
Background
Due to differences in human skeletons and walking postures, every person's footprints are unique and are more difficult to disguise than characteristics such as fingerprints. Research on footprints therefore has great scientific value and can be applied to criminal investigation, security protection and the like.
With the rapid development of deep learning, neural networks are widely applied in the field of computer vision, and footprint image retrieval has also advanced with the introduction of deep learning methods. In the past, footprint image retrieval was mostly performed by footprint experts, a practice that is easily influenced by personal subjectivity and is also slow.
Disclosure of Invention
The invention provides a footprint image retrieval method based on a mixed attention dense network to overcome the shortcomings of the prior art, so that richer feature information of the footprint image can be acquired and feature information that discriminates between different individuals can be extracted as far as possible, thereby improving the precision and speed of footprint image retrieval.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention relates to a footprint image retrieval method based on a mixed attention dense network, which is characterized by comprising the following steps:
step 1: constructing a training set and a test set;
step 1.1: acquiring a footprint image containing a plurality of footprints of a test object in a walking state;
step 1.2: denoising the footprint image to obtain a processed footprint image sample;
step 1.3: cutting the footprint image sample, and extracting the outline of the footprint image with the Canny operator to obtain a footprint sample set containing a plurality of single footprint outlines (a preprocessing sketch is given after step 1.6);
step 1.4: defining a tag for each individual footprint outline in the footprint sample set that can distinguish different ID information;
step 1.5: repeating the steps 1.1-1.4, so as to collect a plurality of footprint images of a plurality of test objects, and carrying out corresponding processing, thereby forming a footprint data set D;
step 1.6: dividing the footprint data set D into a training set X and a test set Y, and subdividing the test set into a test query set Y1 and a test gallery set Y2; the training set X contains A kinds of ID information, and the test query set and the test gallery set both contain the same B kinds of ID information;
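A minimal preprocessing sketch covering steps 1.1 to 1.4 follows; it assumes OpenCV 4, and the file path, denoising strength, Canny thresholds and minimum contour area are illustrative values rather than values fixed by the patent.

```python
import cv2


def preprocess_footprint_image(path, person_id, min_area=500):
    """Steps 1.1-1.4 sketch: denoise a multi-footprint image, extract
    contours with the Canny operator, and return single-footprint crops
    labelled with the person's ID. Thresholds are illustrative only."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Step 1.2: denoise the acquired footprint image
    denoised = cv2.fastNlMeansDenoising(img, None, 10)
    # Step 1.3: Canny contour extraction (OpenCV 4 returns contours, hierarchy)
    edges = cv2.Canny(denoised, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    samples = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        if w * h < min_area:          # skip tiny noise contours
            continue
        crop = denoised[y:y + h, x:x + w]
        # Step 1.4: attach the label that distinguishes different ID information
        samples.append({"image": crop, "id": person_id})
    return samples
```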
step 2: establishing a footprint image retrieval model based on a mixed attention dense network, wherein the footprint image retrieval model consists of a preprocessing layer, an initial feature extraction module, a mixed attention dense network module and a final feature output module;
step 2.1: the preprocessing layer normalizes the training set X to obtain a footprint sample set X' and inputs it into the initial feature extraction module, where X' = {x'_s | s = 1, 2, ..., S}, x'_s is the s-th footprint sample in X', and S is the total number of footprint samples;
step 2.2: constructing an initial feature extraction module consisting of M layers of convolutional neural networks, where the m-th layer sequentially comprises the m-th convolution layer, the m-th activation layer and the m-th pooling layer;
step 2.2.1: initializing the weights of all convolution layers in the initial feature extraction module with a Gaussian random initialization method;
step 2.2.2: obtaining the output Z_m of the m-th convolution layer by formula (1):
Z_m = W_m · X_m + B_m    (1)
in formula (1), X_m is the input image of the region to be convolved by the m-th convolution layer, B_m is the bias of the m-th convolution layer under stride S_m, and W_m is the shared weight of the m-th convolution layer;
step 2.2.3: the output Z_m of the m-th convolution layer passes through the m-th activation layer and the m-th pooling layer to obtain the output Z'_m of the m-th layer of the convolutional neural network, where Z'_m = {z'_ms | s = 1, 2, ..., S} and z'_ms is the output of the s-th footprint sample in Z'_m;
step 2.2.4: the output Z'_m of the m-th layer is used as the input of the (m+1)-th layer; after processing by all M layers of the convolutional neural network, the initial footprint feature F is output and fed into the mixed attention dense network module, where F = {f_s | s = 1, 2, ..., S} and f_s is the initial feature of the s-th footprint sample in F;
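A minimal PyTorch sketch of the initial feature extraction module of step 2.2 is given below; the channel widths and the choice M = 2 are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn


class InitialFeatureExtraction(nn.Module):
    """Step 2.2: M stacked layers of (convolution -> activation -> pooling).
    Channel widths and M (here M = 2) are illustrative assumptions."""

    def __init__(self, in_channels=1, channels=(32, 64)):
        super().__init__()
        layers, prev = [], in_channels
        for out in channels:                               # one block per layer m
            layers += [
                nn.Conv2d(prev, out, kernel_size=3, padding=1),  # formula (1)
                nn.ReLU(inplace=True),                     # m-th activation layer
                nn.MaxPool2d(kernel_size=2),               # m-th pooling layer
            ]
            prev = out
        self.net = nn.Sequential(*layers)
        # Step 2.2.1: Gaussian random initialization of all convolution weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, mean=0.0, std=0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)            # initial footprint feature F
```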
step 2.3: establishing a mixed attention dense network module consisting of K dense blocks, where adjacent dense blocks are connected by a convolution layer;
any dense block is composed of N layers of a mixed attention dense network, where the n-th layer in any dense block sequentially comprises the n-th convolution layer, the n-th mixed attention layer and the n-th concatenation layer;
step 2.3.1: initializing the weights of all convolution layers in the mixed attention dense network module with a Gaussian random initialization method;
step 2.3.2: the n-th convolution layer in any dense block obtains its output ZZ_n by formula (1), where ZZ_n = {zz_ns | s = 1, 2, ..., S} and zz_ns is the output of the s-th footprint sample in ZZ_n;
according to the spatial position i and the channel position c of the convolution output, ZZ_n can also be written as ZZ_n = {ZZ_n^(i,c) | i = 1, 2, ..., I; c = 1, 2, ..., C}, where ZZ_n^(i,c) is the value at the i-th spatial position and the c-th channel position of the n-th convolution layer output, I is the number of spatial positions and C is the number of channels;
step 2.3.3: the output ZZ'_n of the n-th mixed attention layer of the n-th layer of the mixed attention dense network in any dense block is obtained by formula (2), where ZZ'_n = {zz'_ns | s = 1, 2, ..., S} and zz'_ns is the output of the s-th footprint sample in ZZ'_n:
ZZ'_n^(i,c) = (1 + M_(i,c)) · ZZ_n^(i,c)    (2)
in formula (2), M_(i,c) is the mixed-attention weight of the i-th spatial position and the c-th channel position of the output ZZ_n, and is given by formula (3);
step 2.3.4: the output ZZ''_n of the n-th layer of the mixed attention dense network is obtained by formula (4), where ZZ''_n = {zz''_ns | s = 1, 2, ..., S} and zz''_ns is the output of the s-th footprint sample in ZZ''_n:
ZZ''_n = concat(ZZ'_1, ZZ'_2, ..., ZZ'_n)    (4)
in formula (4), concat(·) represents the concatenation operation;
step 2.3.5: the output of the N-th layer of the mixed attention dense network in each dense block is processed by one convolution layer and then fed into the next dense block, so that after processing by the K dense blocks and their corresponding convolution layers the intermediate feature F' is output, where F' = {f'_s | s = 1, 2, ..., S} and f'_s is the intermediate feature of the s-th footprint sample in F';
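The PyTorch sketch below shows one way steps 2.3.2 to 2.3.5 could be realized. Formula (3) for the weight M_(i,c) is not reproduced in the text, so a CBAM-style channel-plus-spatial attention is assumed here purely for illustration; the growth rate and the number of layers per block are likewise assumptions.

```python
import torch
import torch.nn as nn


class MixedAttention(nn.Module):
    """Produces the weight map M of formula (2). Formula (3) is not reproduced
    in the text, so a CBAM-style channel + spatial attention is assumed here."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, zz):
        m = self.channel(zz) * self.spatial(zz)     # assumed form of M_(i,c)
        return (1.0 + m) * zz                       # formula (2): (1 + M) * ZZ


class MixedAttentionDenseBlock(nn.Module):
    """Step 2.3: N layers of (convolution -> mixed attention -> concatenation);
    each layer also sees the block input, a DenseNet convention assumed here."""

    def __init__(self, in_channels, growth=32, num_layers=4):
        super().__init__()
        self.convs, self.attns = nn.ModuleList(), nn.ModuleList()
        for n in range(num_layers):
            self.convs.append(nn.Conv2d(in_channels + n * growth, growth,
                                        kernel_size=3, padding=1))
            self.attns.append(MixedAttention(growth))

    def forward(self, x):
        features = [x]
        for conv, attn in zip(self.convs, self.attns):
            zz = conv(torch.cat(features, dim=1))   # n-th convolution layer
            features.append(attn(zz))               # n-th mixed attention layer
        # formula (4): concatenation of the attended outputs ZZ'_1 ... ZZ'_N
        return torch.cat(features[1:], dim=1)
```

Stacking K such blocks with a transition convolution between them corresponds to step 2.3.5.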
step 2.4: establishing a final feature output module consisting of a final convolution layer, a final pooling layer, two fully connected layers FC_1 and FC_2, and two parallel sub-networks;
step 2.4.1: initializing the weights of the convolution layer and the fully connected layers in the final feature output module with a Gaussian random initialization method;
step 2.4.2: the final convolution layer and the final pooling layer perform convolution and pooling on the intermediate feature F' in turn, and the result then passes through the fully connected layers FC_1 and FC_2 to obtain the final feature F'', where F'' = {f''_s | s = 1, 2, ..., S} and f''_s is the final feature of the s-th footprint sample in F'';
step 2.4.3: feeding the final feature F'' into the two parallel sub-networks;
step 2.4.3.1: the first sub-network feeds the final feature F'' into a fully connected output layer FC_3 whose number of classes equals the number A of kinds of ID information in the sample set, and the result is processed by a SoftMax function to output a probability set P, where P = {p_s | s = 1, 2, ..., S} and p_s is the output probability set of the s-th footprint sample in P, with p_s = {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)}, where p_sa is the probability that the s-th footprint sample belongs to the a-th kind of ID information; the subscript corresponding to the maximum value in {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)} is selected as the ID information recognized for the s-th footprint sample;
step 2.4.3.2: the second sub-network averages the output final features F'' per kind of ID information to obtain the center feature set F''_c of each kind of ID information, where F''_c = {F''_c0, F''_c1, ..., F''_ca, ..., F''_c(A-1)} and F''_ca is the center feature of the a-th kind of ID information;
step 2.4.4: the probability set P output by the first sub-network and the center feature set F''_c output by the second sub-network are back-propagated to the footprint image retrieval model to adaptively update the corresponding network parameters, thereby obtaining the trained footprint image retrieval model;
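A PyTorch sketch of the final feature output module and its two sub-networks is shown below. The feature dimensions, the learnable center parameters and the 0.01 weighting of the center loss are assumptions for illustration; the patent itself only specifies FC_1, FC_2, FC_3, the SoftMax head and the per-ID center features.

```python
import torch
import torch.nn as nn


class FinalFeatureOutput(nn.Module):
    """Step 2.4: final convolution + pooling + FC_1/FC_2 produce the final
    feature F''; the first sub-network classifies over the A identities
    (FC_3 + SoftMax via cross-entropy), the second keeps one center feature
    per identity. Feature dimensions and the 0.01 loss weight are assumed."""

    def __init__(self, in_channels, num_ids, feat_dim=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(in_channels, 512)
        self.fc2 = nn.Linear(512, feat_dim)
        self.fc3 = nn.Linear(feat_dim, num_ids)           # first sub-network
        # second sub-network: one learnable center per kind of ID information
        self.centers = nn.Parameter(torch.zeros(num_ids, feat_dim))

    def forward(self, x, labels=None):
        f = self.pool(self.conv(x)).flatten(1)
        f = self.fc2(torch.relu(self.fc1(f)))             # final feature F''
        logits = self.fc3(f)                              # SoftMax probabilities p_s
        if labels is None:
            return f, logits
        # center loss: pull each sample towards the center of its identity
        center_loss = ((f - self.centers[labels]) ** 2).sum(dim=1).mean()
        ce_loss = nn.functional.cross_entropy(logits, labels)
        return f, logits, ce_loss + 0.01 * center_loss
```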
step 3: identifying the ID information of a new sample;
step 3.1: the test query set Y1 and the test gallery set Y2, which contain the same ID information, are input into the footprint image retrieval model, which outputs one final feature set for the test query set and one for the test gallery set, each containing several kinds of ID information;
step 3.2: computing the Euclidean distances between the features of each kind of ID information extracted from the test query set and those extracted from the test gallery set, sorting them in ascending order of Euclidean distance, and setting a retrieval threshold according to the sorting result;
step 3.3: any footprint sample to be identified is input into the footprint image retrieval model, which outputs its final feature to be identified; the Euclidean distances between this feature and the feature set output for the test gallery set Y2 are computed, and the ID information whose distance is smaller than the retrieval threshold is taken as the ID information of the footprint sample to be identified.
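Step 3.3 reduces retrieval to a nearest-neighbour search in Euclidean distance; the NumPy sketch below illustrates this, with the argument layout and function name being assumptions.

```python
import numpy as np


def retrieve_id(query_feat, gallery_feats, gallery_ids, threshold):
    """Step 3.3 sketch: rank gallery features by ascending Euclidean distance
    to the query feature and return the ID whose distance falls below the
    retrieval threshold of step 3.2, or None. Argument layout is assumed."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    order = np.argsort(dists)             # ascending Euclidean distance
    best = order[0]
    return gallery_ids[best] if dists[best] < threshold else None
```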
Compared with the prior art, the invention has the beneficial effects that:
1. The invention combines image processing, deep learning and footprint image retrieval into a complete footprint image retrieval framework. In terms of image processing, a full preprocessing pipeline is applied to the footprint images, so that the footprint image samples are optimized; in terms of network structure, the footprint image retrieval model consists of a preprocessing layer, an initial feature extraction module, a mixed attention dense network module and a final feature output module.
2. The image processing part of the invention cleans the footprint image samples by removing background noise while preserving the original footprint image information as completely as possible.
3. The preprocessing layer of the invention resizes the footprint images in the footprint sample set to the same size so that they can be fed uniformly into the neural network for training; normalizing the images shortens the training time and makes the resulting model better suited to practical situations.
4. The mixed attention dense network module extracts feature information more effectively. This module combines a densely connected convolutional network with a mixed attention mechanism. Each layer in the densely connected convolutional network is directly connected to the preceding layers, so feature information can be reused; at the same time, each layer of the network is kept narrow, which reduces redundant information. The mixed attention mechanism combines the advantages of the spatial attention mechanism and the channel attention mechanism, and can extract more representative feature information.
Drawings
FIG. 1 is an overall flow chart of footprint image retrieval in the present invention;
FIG. 2 is a diagram of the mixed attention dense network architecture in the present invention;
FIG. 3 is a diagram of a dense block of the mixed attention dense network in the present invention.
Detailed Description
In this embodiment, a footprint image retrieval method based on a mixed attention dense network mainly extracts the feature information of footprint images with the mixed attention dense network. Through training, the neural network can extract detailed feature information from the footprint images and then retrieve them, which increases the retrieval speed and greatly improves the retrieval accuracy.
The data set adopted by the invention comprises 3,500 footprint images, which yield 35,000 single-footprint images after preprocessing; it covers 100 persons in total, each person having at least 35 footprint images, and each image carries a person ID information label. As shown in fig. 1, the whole process can be divided into the following steps:
step 1, taking a continuous footprint image of any test object in a walking state, and carrying out preprocessing operations of denoising and normalization to obtain a processed footprint image sample.
And 2, cutting the footprint sample containing the plurality of footprint images in the step 1, and extracting the outline of the footprint image by using a canny operator to obtain a footprint sample set containing a plurality of single footprint outlines. The invention designs an algorithm, the pixel information of each column of a footprint image sample is counted, the temporary average pixel less than ten is taken as a gap in a footprint image, and when the length of continuous gaps exceeds a certain set threshold value, the column in which the centers of the continuous gaps are located is taken as the cut column. This algorithm can divide one footprint image into separate footprint image sample sets.
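A minimal NumPy sketch of this column-statistics splitting is given below; the gap value of ten comes from the text, while the minimum gap length and the function name are assumptions.

```python
import numpy as np


def split_footprints(image, gap_value=10, min_gap_len=30):
    """Column-statistics splitting of step 2: columns whose average pixel
    value is below gap_value are gaps; a run of gaps longer than min_gap_len
    is cut at its center column. Only the value ten is fixed by the text;
    min_gap_len and the function name are assumptions."""
    col_means = image.mean(axis=0)            # per-column pixel statistics
    is_gap = col_means < gap_value
    cuts, start = [], None
    for col, gap in enumerate(is_gap):
        if gap and start is None:
            start = col
        elif not gap and start is not None:
            if col - start > min_gap_len:
                cuts.append((start + col) // 2)   # center of the gap run
            start = None
    pieces, prev = [], 0
    for c in cuts + [image.shape[1]]:
        pieces.append(image[:, prev:c])
        prev = c
    return pieces                              # separate single-footprint images
```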
Step 3: each footprint sample in the footprint sample set from step 2 is given a label that distinguishes different ID information.
Step 4: steps 1 to 3 are repeated to collect a plurality of continuous footprint images of a plurality of test objects in a walking state and process them accordingly, forming the footprint data set D.
Step 5: the data set D is divided into three parts in a ratio of 9:4:2, where the first part is the training set X, the second part is the test gallery set Y2 and the third part is the test query set Y1 (a split sketch follows this step). The persons in the first part do not overlap with those in the second and third parts, while the second and third parts contain different data of the same persons.
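The 9:4:2 split of step 5 could be implemented roughly as below; applying the ratio to identity counts (rather than to exact sample counts) and the data layout are simplifying assumptions.

```python
import random


def split_by_identity(dataset, ratios=(9, 4, 2), seed=0):
    """Step 5 sketch: split D so that training identities are disjoint from
    test identities, while gallery and query hold different samples of the
    same identities. Applying the 9:4:2 ratio to identity counts is a
    simplifying assumption."""
    ids = sorted(dataset)                      # dataset: person_id -> samples
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * ratios[0] // sum(ratios)
    train_ids, test_ids = ids[:n_train], ids[n_train:]
    train = [(i, s) for i in train_ids for s in dataset[i]]
    gallery, query = [], []
    for i in test_ids:
        samples = dataset[i]
        half = len(samples) * ratios[1] // (ratios[1] + ratios[2])
        gallery += [(i, s) for s in samples[:half]]
        query += [(i, s) for s in samples[half:]]
    return train, gallery, query
```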
Step 6: the footprint data set is fed into the footprint image retrieval model based on the mixed attention dense network for training, and a pre-trained footprint image retrieval model based on the mixed attention dense network is obtained through the preprocessing layer, the initial feature extraction module, the mixed attention dense network module and the final feature output module. The initial feature extraction module extracts the initial feature information of the footprint images, the mixed attention dense network module processes this initial feature information, and the final feature output module then integrates the features and outputs rich yet compact final feature information, which greatly improves the precision and speed of footprint image retrieval.
As shown in FIG. 2, the footprint image retrieval model is composed of a preprocessing layer, an initial feature extraction module, a mixed attention dense network module and a final feature output module:
step 6.1: the preprocessing layer normalizes the footprint image training set X to obtain a footprint sample set X' and inputs it into the initial feature extraction module, where X' = {x'_s | s = 1, 2, ..., S}, x'_s is the s-th footprint sample in X', and S is the total number of footprint samples;
step 6.2: constructing an initial feature extraction module consisting of M layers of convolutional neural networks, where the m-th layer sequentially comprises the m-th convolution layer, the m-th activation layer and the m-th pooling layer;
step 6.2.1: initializing the weights of all convolution layers in the initial feature extraction module with a Gaussian random initialization method;
step 6.2.2: obtaining the output Z_m of the m-th convolution layer by formula (1):
Z_m = W_m · X_m + B_m    (1)
in formula (1), X_m is the input image of the region to be convolved by the m-th convolution layer, B_m is the bias of the m-th convolution layer under stride S_m, and W_m is the shared weight of the m-th convolution layer;
step 6.2.3: the output Z_m of the m-th convolution layer passes through the m-th activation layer and the m-th pooling layer to obtain the output Z'_m of the m-th layer of the convolutional neural network, where Z'_m = {z'_ms | s = 1, 2, ..., S} and z'_ms is the output of the s-th footprint sample in Z'_m;
step 6.2.4: the output Z'_m of the m-th layer is used as the input of the (m+1)-th layer; after processing by all M layers of the convolutional neural network, the initial footprint feature F is output and fed into the mixed attention dense network module, where F = {f_s | s = 1, 2, ..., S} and f_s is the initial feature of the s-th footprint sample in F;
step 6.3: establishing a mixed attention dense network module consisting of K dense blocks, where adjacent dense blocks are connected by a convolution layer;
any dense block is composed of N layers of a mixed attention dense network, where the n-th layer in any dense block sequentially comprises the n-th convolution layer, the n-th mixed attention layer and the n-th concatenation layer;
step 6.3.1: initializing the weights of all convolution layers in the mixed attention dense network module with a Gaussian random initialization method;
step 6.3.2: the n-th convolution layer in any dense block obtains its output ZZ_n by formula (1), where ZZ_n = {zz_ns | s = 1, 2, ..., S} and zz_ns is the output of the s-th footprint sample in ZZ_n;
according to the spatial position i and the channel position c of the convolution output, ZZ_n can also be written as ZZ_n = {ZZ_n^(i,c) | i = 1, 2, ..., I; c = 1, 2, ..., C}, where ZZ_n^(i,c) is the value at the i-th spatial position and the c-th channel position of the n-th convolution layer output, I is the number of spatial positions and C is the number of channels;
step 6.3.3: the output ZZ'_n of the n-th mixed attention layer of the n-th layer of the mixed attention dense network in any dense block is obtained by formula (2), where ZZ'_n = {zz'_ns | s = 1, 2, ..., S} and zz'_ns is the output of the s-th footprint sample in ZZ'_n:
ZZ'_n^(i,c) = (1 + M_(i,c)) · ZZ_n^(i,c)    (2)
in formula (2), M_(i,c) is the mixed-attention weight of the i-th spatial position and the c-th channel position of the output ZZ_n, and is given by formula (3);
step 6.3.4: the output ZZ''_n of the n-th layer of the mixed attention dense network is obtained by formula (4), where ZZ''_n = {zz''_ns | s = 1, 2, ..., S} and zz''_ns is the output of the s-th footprint sample in ZZ''_n:
ZZ''_n = concat(ZZ'_1, ZZ'_2, ..., ZZ'_n)    (4)
in formula (4), concat(·) represents the concatenation operation;
step 6.3.5: the output of the N-th layer of the mixed attention dense network in each dense block is processed by one convolution layer and then fed into the next dense block, so that after processing by the K dense blocks and their corresponding convolution layers the intermediate feature F' is output, where F' = {f'_s | s = 1, 2, ..., S} and f'_s is the intermediate feature of the s-th footprint sample in F';
step 6.4: establishing a final feature output module consisting of a final convolution layer, a final pooling layer, two fully connected layers FC_1 and FC_2, and two parallel sub-networks;
step 6.4.1: initializing the weights of the convolution layer and the fully connected layers in the final feature output module with a Gaussian random initialization method;
step 6.4.2: the final convolution layer and the final pooling layer perform convolution and pooling on the intermediate feature F' in turn, and the result then passes through the fully connected layers FC_1 and FC_2 to obtain the final feature F'', where F'' = {f''_s | s = 1, 2, ..., S} and f''_s is the final feature of the s-th footprint sample in F'';
step 6.4.3: feeding the final feature F'' into the two parallel sub-networks;
step 6.4.3.1: the first sub-network feeds the final feature F'' into a fully connected output layer FC_3 whose number of classes equals the number A of kinds of ID information in the sample set, and the result is processed by a SoftMax function to output a probability set P, where P = {p_s | s = 1, 2, ..., S} and p_s is the output probability set of the s-th footprint sample in P, with p_s = {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)}, where p_sa is the probability that the s-th footprint sample belongs to the a-th kind of ID information; the subscript corresponding to the maximum value in {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)} is selected as the ID information recognized for the s-th footprint sample;
step 6.4.3.2: the second sub-network averages the output final features F'' per kind of ID information to obtain the center feature set F''_c of each kind of ID information, where F''_c = {F''_c0, F''_c1, ..., F''_ca, ..., F''_c(A-1)} and F''_ca is the center feature of the a-th kind of ID information;
step 6.4.4: the probability set P output by the first sub-network is matched with a cross-entropy loss function and the center feature set F''_c output by the second sub-network with a center loss function, and both losses are back-propagated to the footprint image retrieval model to adaptively update the corresponding network parameters, so that each footprint sample is assigned the correct ID information as far as possible and the Euclidean distances between the final features of footprint samples assigned the same ID information are made as small as possible, thereby obtaining the trained footprint image retrieval model (a training-loop sketch follows);
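One epoch of the joint training in step 6.4.4 might look as follows, assuming a model whose forward pass returns the combined cross-entropy and center loss as in the earlier module sketch; the loader and optimizer are standard PyTorch objects.

```python
import torch


def train_epoch(model, loader, optimizer, device="cpu"):
    """Step 6.4.4 sketch: back-propagate the combined cross-entropy and
    center loss so samples are assigned the correct ID and same-ID features
    stay close; assumes a model interface like the module sketched earlier."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        _, _, loss = model(images, labels)   # combined CE + center loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```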
Step 7: identifying the ID information of a new sample;
step 7.1: the test query set Y1 and the test gallery set Y2, which contain the same ID information, are input into the footprint image retrieval model, which outputs one final feature set for the test query set and one for the test gallery set, each containing several kinds of ID information;
step 7.2: computing the Euclidean distances between the features of each kind of ID information extracted from the test query set and those extracted from the test gallery set, sorting them in ascending order of Euclidean distance, and setting a suitable retrieval threshold according to the sorting result;
step 7.3: any footprint sample to be identified is input into the footprint image retrieval model, which outputs its final feature to be identified; the Euclidean distances between this feature and the feature set output for the test gallery set Y2 are computed, and the ID information whose distance is smaller than the retrieval threshold is taken as the ID information of the footprint sample to be identified.
Claims (1)
1. A footprint image retrieval method based on a mixed attention dense network, characterized by comprising the following steps:
step 1: constructing a training set and a test set;
step 1.1: acquiring a footprint image containing a plurality of footprints of a test object in a walking state;
step 1.2: denoising the footprint image to obtain a processed footprint image sample;
step 1.3: cutting the footprint image sample, and extracting the outline of the footprint image with the Canny operator to obtain a footprint sample set containing a plurality of single footprint outlines;
step 1.4: defining a tag for each individual footprint outline in the footprint sample set that can distinguish different ID information;
step 1.5: repeating the steps 1.1-1.4, so as to collect a plurality of footprint images of a plurality of test objects, and carrying out corresponding processing, thereby forming a footprint data set D;
step 1.6: dividing the footprint data set D into a training set X and a test set Y, and subdividing the test set into a test query set Y1 and a test gallery set Y2; the training set X contains A kinds of ID information, and the test query set and the test gallery set both contain the same B kinds of ID information;
step 2: establishing a footprint image retrieval model based on a mixed attention dense network, wherein the footprint image retrieval model consists of a preprocessing layer, an initial feature extraction module, a mixed attention dense network module and a final feature output module;
step 2.1: the preprocessing layer normalizes the training set X to obtain a footprint sample set X' and inputs it into the initial feature extraction module, where X' = {x'_s | s = 1, 2, ..., S}, x'_s is the s-th footprint sample in X', and S is the total number of footprint samples;
step 2.2: constructing an initial feature extraction module consisting of M layers of convolutional neural networks, where the m-th layer sequentially comprises the m-th convolution layer, the m-th activation layer and the m-th pooling layer;
step 2.2.1: initializing the weights of all convolution layers in the initial feature extraction module with a Gaussian random initialization method;
step 2.2.2: obtaining the output Z_m of the m-th convolution layer by formula (1):
Z_m = W_m · X_m + B_m    (1)
in formula (1), X_m is the input image of the region to be convolved by the m-th convolution layer, B_m is the bias of the m-th convolution layer under stride S_m, and W_m is the shared weight of the m-th convolution layer;
step 2.2.3: the output Z_m of the m-th convolution layer passes through the m-th activation layer and the m-th pooling layer to obtain the output Z'_m of the m-th layer of the convolutional neural network, where Z'_m = {z'_ms | s = 1, 2, ..., S} and z'_ms is the output of the s-th footprint sample in Z'_m;
step 2.2.4: the output Z'_m of the m-th layer is used as the input of the (m+1)-th layer; after processing by all M layers of the convolutional neural network, the initial footprint feature F is output and fed into the mixed attention dense network module, where F = {f_s | s = 1, 2, ..., S} and f_s is the initial feature of the s-th footprint sample in F;
step 2.3: establishing a mixed attention dense network module consisting of K dense blocks, where adjacent dense blocks are connected by a convolution layer;
any dense block is composed of N layers of a mixed attention dense network, where the n-th layer in any dense block sequentially comprises the n-th convolution layer, the n-th mixed attention layer and the n-th concatenation layer;
step 2.3.1: initializing the weights of all convolution layers in the mixed attention dense network module with a Gaussian random initialization method;
step 2.3.2: the n-th convolution layer in any dense block obtains its output ZZ_n by formula (1), where ZZ_n = {zz_ns | s = 1, 2, ..., S} and zz_ns is the output of the s-th footprint sample in ZZ_n;
according to the spatial position i and the channel position c of the convolution output, ZZ_n can also be written as ZZ_n = {ZZ_n^(i,c) | i = 1, 2, ..., I; c = 1, 2, ..., C}, where ZZ_n^(i,c) is the value at the i-th spatial position and the c-th channel position of the n-th convolution layer output, I is the number of spatial positions and C is the number of channels;
step 2.3.3: the output ZZ'_n of the n-th mixed attention layer of the n-th layer of the mixed attention dense network in any dense block is obtained by formula (2), where ZZ'_n = {zz'_ns | s = 1, 2, ..., S} and zz'_ns is the output of the s-th footprint sample in ZZ'_n:
ZZ'_n^(i,c) = (1 + M_(i,c)) · ZZ_n^(i,c)    (2)
in formula (2), M_(i,c) is the mixed-attention weight of the i-th spatial position and the c-th channel position of the output ZZ_n, and is given by formula (3);
step 2.3.4: the output ZZ''_n of the n-th layer of the mixed attention dense network is obtained by formula (4), where ZZ''_n = {zz''_ns | s = 1, 2, ..., S} and zz''_ns is the output of the s-th footprint sample in ZZ''_n:
ZZ''_n = concat(ZZ'_1, ZZ'_2, ..., ZZ'_n)    (4)
in formula (4), concat(·) represents the concatenation operation;
step 2.3.5: the output of the N-th layer of the mixed attention dense network in each dense block is processed by one convolution layer and then fed into the next dense block, so that after processing by the K dense blocks and their corresponding convolution layers the intermediate feature F' is output, where F' = {f'_s | s = 1, 2, ..., S} and f'_s is the intermediate feature of the s-th footprint sample in F';
step 2.4: establishing a final feature output module consisting of a final convolution layer, a final pooling layer, two fully connected layers FC_1 and FC_2, and two parallel sub-networks;
step 2.4.1: initializing the weights of the convolution layer and the fully connected layers in the final feature output module with a Gaussian random initialization method;
step 2.4.2: the final convolution layer and the final pooling layer perform convolution and pooling on the intermediate feature F' in turn, and the result then passes through the fully connected layers FC_1 and FC_2 to obtain the final feature F'', where F'' = {f''_s | s = 1, 2, ..., S} and f''_s is the final feature of the s-th footprint sample in F'';
step 2.4.3: feeding the final feature F'' into the two parallel sub-networks;
step 2.4.3.1: the first sub-network feeds the final feature F'' into a fully connected output layer FC_3 whose number of classes equals the number A of kinds of ID information in the sample set, and the result is processed by a SoftMax function to output a probability set P, where P = {p_s | s = 1, 2, ..., S} and p_s is the output probability set of the s-th footprint sample in P, with p_s = {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)}, where p_sa is the probability that the s-th footprint sample belongs to the a-th kind of ID information; the subscript corresponding to the maximum value in {p_s0, p_s1, ..., p_sa, ..., p_s(A-1)} is selected as the ID information recognized for the s-th footprint sample;
step 2.4.3.2: the second sub-network averages the output final features F'' per kind of ID information to obtain the center feature set F''_c of each kind of ID information, where F''_c = {F''_c0, F''_c1, ..., F''_ca, ..., F''_c(A-1)} and F''_ca is the center feature of the a-th kind of ID information;
step 2.4.4: the probability set P output by the first sub-network and the center feature set F''_c output by the second sub-network are back-propagated to the footprint image retrieval model to adaptively update the corresponding network parameters, thereby obtaining the trained footprint image retrieval model;
step 3: identifying the ID information of a new sample;
step 3.1: the test query set Y1 and the test gallery set Y2, which contain the same ID information, are input into the footprint image retrieval model, which outputs one final feature set for the test query set and one for the test gallery set, each containing several kinds of ID information;
step 3.2: computing the Euclidean distances between the features of each kind of ID information extracted from the test query set and those extracted from the test gallery set, sorting them in ascending order of Euclidean distance, and setting a retrieval threshold according to the sorting result;
step 3.3: any footprint sample to be identified is input into the footprint image retrieval model, which outputs its final feature to be identified; the Euclidean distances between this feature and the feature set output for the test gallery set Y2 are computed, and the ID information whose distance is smaller than the retrieval threshold is taken as the ID information of the footprint sample to be identified.
Priority Applications (1)
- CN202010710865.2A (granted as CN111782857B): Footprint image retrieval method based on mixed attention-dense network; priority date and filing date 2020-07-22
Publications (2)
- CN111782857A (application), published 2020-10-16
- CN111782857B (grant), published 2023-11-03
Family
- ID=72763921
Family Applications (1)
- CN202010710865.2A, filed 2020-07-22, status: Active, granted as CN111782857B
Country Status (1)
- CN: CN111782857B, granted
Cited By (2)
- CN112257662A (priority 2020-11-12, published 2021-01-22, 安徽大学 / Anhui University): Pressure footprint image retrieval system based on deep learning
- CN113656623A (priority 2021-08-17, published 2021-11-16, 安徽大学 / Anhui University): Time sequence shift and multi-branch space-time enhancement network-based stepping footprint image retrieval method
Patent Citations (3)
- US20150023471A1 (priority 2012-03-06, published 2015-01-22, Koninklijke Philips N.V.): Stereo x-ray tube based suppression of outside body high contrast objects
- WO2019237567A1 (priority 2018-06-14, published 2019-12-19, 江南大学 / Jiangnan University): Convolutional neural network based tumble detection method
- CN111177446A (priority 2019-12-12, published 2020-05-19, 苏州科技大学 / Suzhou University of Science and Technology): Method for searching footprint image
Non-Patent Citations (1)
- 陈扬, 曾诚, 程成, 邹恩岑, 顾建伟, 陆悠, 奚雪峰: "A CNN-based footprint image retrieval and matching method" (一种基于CNN的足迹图像检索与匹配方法), Journal of Nanjing Normal University (Engineering and Technology Edition), no. 03
Also Published As
- CN111782857A, published 2020-10-16
- CN111782857B, published 2023-11-03
Similar Documents
- CN108615010B: Facial expression recognition method based on parallel convolutional neural network feature map fusion
- CN106326886B: Finger vein image quality assessment method based on convolutional neural networks
- CN108830157B: Human behavior identification method based on attention mechanism and 3D convolutional neural network
- CN106203395B: Face attribute recognition method based on multitask deep learning
- CN111814661B: Human body behavior recognition method based on residual-recurrent neural network
- CN108764072B: Blood cell subtype image classification method based on multi-scale fusion
- CN112801040B: Lightweight unconstrained facial expression recognition method and system embedded with high-order information
- CN110082821B: Label-frame-free microseism signal detection method and device
- CN109902615B: Multi-age-group image generation method based on adversarial networks
- CN112818764A: Low-resolution image facial expression recognition method based on feature reconstruction model
- CN113221694B: Action recognition method
- CN112347908B: Surgical instrument image identification method based on spatial grouping attention model
- CN111782857A: Footprint image retrieval method based on mixed attention dense network
- CN106503616A: Motor imagery EEG signal classification method based on hierarchical extreme learning machine
- CN111368734B: Micro-expression recognition method based on normal expression assistance
- CN115966010A: Expression recognition method based on attention and multi-scale feature fusion
- CN116012653A: Method and system for classifying hyperspectral images with an attention residual unit neural network
- CN114170657A: Facial emotion recognition method integrating attention mechanism and high-order feature representation
- CN110826534B: Face key point detection method and system based on local principal component analysis
- CN115965864A: Lightweight attention mechanism network for crop disease identification
- CN114863572A: Myoelectric gesture recognition method for multi-channel heterogeneous sensors
- CN113705713B: Text recognition method based on global and local attention mechanisms
- CN114495163A: Pedestrian re-identification generative learning method based on class activation mapping
- CN113111797A: Cross-view gait recognition method combining autoencoder and view transformation model
- CN113255543A: Facial expression recognition method based on graph convolution network
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant