CN112966716A - Sketch-guided shoe print image retrieval method


Info

Publication number: CN112966716A
Application number: CN202110152568.5A
Authority: CN (China)
Prior art keywords: convolution; shoe print; sketch; shoe; print image
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN112966716B (en)
Inventors: 王新年 (Wang Xinnian), 姜浩 (Jiang Hao), 王琳 (Wang Lin)
Current Assignee: Dalian Maritime University
Original Assignee: Dalian Maritime University
Priority and filing date: 2021-02-03, application filed by Dalian Maritime University
Publication of CN112966716A: 2021-06-15
Grant and publication of CN112966716B: 2023-10-27

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G06T 7/00 Image analysis
    • G06T 7/40 Analysis of texture

Abstract

The invention provides a sketch-guided shoe print image retrieval method comprising the following steps: constructing a shoe print pattern content and semantic information collaborative model; and retrieving the shoe print image online based on the cooperation of pattern content features and semantic information. By introducing sketch guidance in the missing region, the method can generate a complete shoe print image that matches human visual perception whether the defect is random, covers a large area, or even an entire half sole; the completed image is then used for retrieval. This overcomes the problems of insufficient information and the inability to incorporate human recognition of patterns, and thereby the low retrieval precision and inaccurate matching of shoe print images with large-area defects. In addition, the method can retrieve shoe print images from a fully hand-drawn sketch, solving retrieval in cases where the background is complex and the crime-scene shoe print image cannot be extracted.

Description

Sketch-guided shoe print image retrieval method
Technical Field
The invention relates to the technical field of image recognition, in particular to a sketch-guided shoe print image retrieval method.
Background
Current shoe print image retrieval algorithms fall into two main categories: those based on traditional image processing and those based on deep learning. (1) Traditional algorithms extract hand-crafted features, such as Fourier spectral features, histograms, or SIFT features, and then compute and rank similarity measures to obtain the retrieval result. For example, a neighborhood-similarity-estimation shoe print retrieval algorithm extracts three kinds of features from the query shoe print image (regional features, global features, and Gabor features) and then exploits information contained in neighboring images to improve retrieval performance. (2) Deep-learning-based algorithms extract deep features with a convolutional neural network and rank the matches with various similarity matching algorithms. For example, an automatic shoe print retrieval algorithm based on the VGG16 network first applies rotation compensation to the shoe print image to eliminate the influence of rotation, and then scores each match by a weighted sum of the cosine similarities between the neural codes of two regions of the test and reference images.
Both traditional and deep-learning-based shoe print retrieval algorithms work reasonably well for shoe print images with small missing areas, but neither performs well when a large area is missing. The remaining region of such an image is too small, its content and texture information are severely insufficient, and retrieval relies solely on the designed features without incorporating human recognition of the pattern. The result is low retrieval precision and inaccurate matching for shoe print images with large-area defects.
Disclosure of Invention
To address the technical problem that shoe print images with large-area defects cannot be retrieved well, the invention provides a sketch-guided shoe print image retrieval method comprising the following steps:
step S1: constructing a shoe print pattern content and semantic information collaborative model;
step S2: retrieving the shoe print image online based on the cooperation of pattern content features and semantic information.
Further, the step S1 further includes the following steps:
step S11: constructing the hole full-convolution span-interpolation fusion shoe print image generation network;
step S12: introducing dilated convolutions to extract features and constructing the hole-fusion full-convolution shoe print image discrimination network;
step S13: constructing the loss functions. The loss function of the hole full-convolution span-interpolation fusion shoe print image generation network comprises an adversarial loss, a perceptual loss and a content loss, combined as

L_G = α·L_adv + β·L_mse + γ·L_L1 + δ·L_p-mse + λ·L_p-L1

where α, β, γ, δ and λ are weighting coefficients. The adversarial loss takes the WGAN form

L_adv = -E_i[ D(S_i, G(S_i)) ], i = 1, 2, …, n.

The content loss measures the difference between R_i, i = 1, 2, …, n and F_i, i = 1, 2, …, n as the mean square error and the L1 distance:

L_mse = (1 / (c·W·H)) · E_i[ ||R_i - G(S_i)||_2^2 ]
L_L1 = (1 / (c·W·H)) · E_i[ ||R_i - G(S_i)||_1 ]

where c, W and H are the number of channels, the width and the height of the image, and G denotes the generation network.

The perceptual loss measures the difference between R_i, i = 1, 2, …, n and F_i, i = 1, 2, …, n as the mean square error and the L1 distance between the deep feature maps extracted by VGG19:

L_p-mse = (1 / (c_φ·W_φ·H_φ)) · E_i[ ||φ(R_i) - φ(G(S_i))||_2^2 ]
L_p-L1 = (1 / (c_φ·W_φ·H_φ)) · E_i[ ||φ(R_i) - φ(G(S_i))||_1 ]

where φ denotes the deep feature map extracted by VGG19 and c_φ, W_φ, H_φ its dimensions.

The loss function of the hole-fusion full-convolution shoe print image discrimination network adopts the WGAN-GP loss, comprising an adversarial term and a gradient penalty:

L_D = E_i[ D(S_i, F_i) ] - E_i[ D(S_i, R_i) ] + λ_gp · E[ (||∇D(x1, x2)||_2 - 1)^2 ]

where x1 = ε·S_i + (1 - ε)·S_i, x2 = ε·F_i + (1 - ε)·R_i, i = 1, 2, …, n, ε ~ uniform[0, 1], G denotes the generation network, D the discrimination network, and λ_gp the gradient penalty parameter.
Further, the hole full-convolution span-interpolation fusion shoe print image generation network of step S11 comprises: 7 hole-span convolution modules, 7 deconvolution fusion modules and 5 bilinear interpolation upsampling modules;
each hole-span convolution module comprises a dilated convolution with dilation rate 2 and stride 1 and a span convolution with dilation rate 1 and stride 2; each deconvolution fusion module comprises a deconvolution with stride 2 and a concatenation of three feature maps of the same size; each bilinear interpolation upsampling module applies bilinear interpolation to the target feature map, doubling its size, and, to match channel counts, the first three bilinear interpolation upsampling modules additionally apply a 1 × 1 convolution.
The hole-fusion full-convolution shoe print image discrimination network of step S12 comprises: 3 hole-convolution feature fusion modules and 3 span-convolution downsampling modules;
each hole-convolution feature fusion module comprises an ordinary convolution with stride 1 and dilation rate 1, a dilated convolution with stride 1 and dilation rate 2, and a dilated convolution with stride 1 and dilation rate 3, the feature maps extracted by the three convolutions being concatenated with the un-convolved feature map; each span-convolution downsampling module uses a stride-2 convolution to downsample the feature map.
Further, the online shoe print image retrieval based on the cooperation of pattern content features and semantic information comprises the following steps:
step S21: drawing pattern semantic information. For an actually photographed incomplete shoe print image, supplement the missing part with patterns drawn as a sketch, forming a mixed image of real-scene pattern and sketch pattern.
If no actually photographed shoe print image is available and only the pattern form is known, then either:
a. manually draw a sketch of the shoe print on blank paper and scan it to form the shoe print sketch; or
b. directly draw a sketch of the shoe print on a blank image with an input device such as a stylus or mouse.
Step S22: retrieving the sole pattern image under sketch guidance.
Input the mixed image of real-scene pattern and sketch pattern, or the sketch, into the constructed shoe print pattern content and semantic information collaborative model to generate a virtual sole pattern image.
Using an existing sole pattern image retrieval algorithm, compute the similarity score between the virtual sole pattern image and each shoe print image in the sole pattern image data set, sort the data set by similarity score according to a preset rule, and output the results in that order.
Compared with the prior art, the invention has the following advantages:
the invention introduces sketch guide in the defect area, can generate complete shoe print images which accord with the subjective feeling of human eyes no matter the random defect or the defect of a large area or even a half sole, and then uses the complete shoe print images for retrieval, thus solving the problems of insufficient information, unavailable cognition of patterns and the like, and further solving the problems of low retrieval precision and inaccurate matching of the shoe print images with large-area defect. In addition, the invention can also search the shoe print image in a complete hand-drawing sketch mode, thereby solving the search problem under the conditions that the background is complex and the on-site shoe print image can not be extracted.
(1) The method adds sketch guidance to the missing parts of shoe print images (small-range, random, or large-area missing) according to the complete sole pattern texture, uses the sketch-guided image to generate a complete virtual shoe print image that matches human visual perception, and then performs retrieval with the completed image. This overcomes the problems of insufficient information and the inability to incorporate human recognition of patterns, and solves the low retrieval precision and inaccurate matching of residual and large-area-missing shoe print images.
(2) The method solves the retrieval problem when a complex background prevents the crime-scene shoe print image from being extracted, as well as the problem of introducing pattern semantic information into the retrieval process, helping criminal investigators exploit as much crime-scene information as possible and improving the efficiency of case solving.
(3) The method improves generative adversarial networks at the algorithmic level, proposing a hole full-convolution span-interpolation fusion shoe print image generation network and a hole-fusion full-convolution shoe print image discrimination network. These improve the quality of the generated images while saving a large amount of computational overhead, and the improved loss function lets the network minimize the difference between the generated shoe print image and the real complete shoe print image at both the pixel level and the feature level.
For the above reasons, the present invention is well suited to adoption in the criminal investigation field.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is an overall flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of dilated convolution in the present invention; (a) before dilation; (b) after dilation.
FIG. 3 is a schematic diagram of the hole full-convolution span-interpolation fusion shoe print image generation network of the present invention.
FIG. 4 is a schematic diagram of the hole-fusion full-convolution shoe print image discrimination network of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in the figures, the invention provides a sketch-guided shoe print image retrieval method comprising the following steps:
step S1: constructing a shoe print pattern content and semantic information collaborative model.
As a preferred embodiment, the process of constructing the collaborative model of shoe print pattern content and semantic information comprises the following steps:
step S11: constructing the hole full-convolution span-interpolation fusion shoe print image generation network. The network comprises 7 hole-span convolution modules, 7 deconvolution fusion modules and 5 bilinear interpolation upsampling modules.
Each hole-span convolution module comprises a dilated convolution with dilation rate 2 and stride 1 followed by a span convolution with dilation rate 1 and stride 2. Each deconvolution fusion module comprises a deconvolution with stride 2 and a concatenation of three feature maps of the same size. Each bilinear interpolation upsampling module applies bilinear interpolation to the target feature map, doubling its size; to match channel counts, the first three such modules additionally apply a 1 × 1 convolution.
Bilinear interpolation is introduced into the generation network both to perform upsampling and to support feature fusion: at each fusion step the network concatenates the feature map before downsampling, the feature map upsampled by bilinear interpolation, and the feature map upsampled by deconvolution. Fusing the pre-downsampling map with the bilinearly interpolated map of the following stage and with the map deconvolved back up from the bottom layer injects multi-scale information and strengthens the fitting capacity of the network.
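The three generator building blocks can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions (module names, channel arguments and the ReLU activations are ours, not from the patent), not the patent's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HoleSpanConv(nn.Module):
    """Hole-span convolution module: a 3x3 dilated conv (dilation 2, stride 1)
    followed by a 3x3 span (strided) conv (dilation 1, stride 2)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dilated = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=2, dilation=2)
        self.strided = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1, dilation=1)

    def forward(self, x):
        return F.relu(self.strided(F.relu(self.dilated(x))))

class BilinearUp(nn.Module):
    """Bilinear interpolation upsampling module: doubles the spatial size;
    a 1x1 conv matches channel counts (used in the first three modules)."""
    def __init__(self, in_ch, out_ch, match_channels=True):
        super().__init__()
        self.match = nn.Conv2d(in_ch, out_ch, 1) if match_channels else nn.Identity()

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.match(x)

class DeconvFusion(nn.Module):
    """Deconvolution fusion module: a stride-2 deconvolution upsamples the
    bottom map, then three same-size feature maps are concatenated."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2,
                                         padding=1, output_padding=1)

    def forward(self, bottom, skip, interp):
        up = F.relu(self.deconv(bottom))             # deconv-upsampled bottom map
        return torch.cat([up, skip, interp], dim=1)  # fuse three same-size maps
```

Here `skip` stands for the pre-downsampling feature map and `interp` for the bilinearly interpolated map, matching the fusion described above.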
step S12: introducing dilated convolutions to extract features and constructing the hole-fusion full-convolution shoe print image discrimination network. The network comprises 3 hole-convolution feature fusion modules and 3 span-convolution downsampling modules.
Each hole-convolution feature fusion module comprises an ordinary convolution with stride 1 and dilation rate 1, a dilated convolution with stride 1 and dilation rate 2, and a dilated convolution with stride 1 and dilation rate 3; the three resulting feature maps are concatenated with the un-convolved feature map. Each span-convolution downsampling module uses a stride-2 convolution to downsample the feature map.
In the discrimination network, stride-2 convolutions replace max pooling for downsampling, but before each downsampling step features are extracted with convolutions of dilation rates 1, 2 and 3, and the three feature maps are fused with the un-convolved map. This extracts features at different scales while saving computational cost, improves the discrimination capability of the network, and thus better guides the generation network toward higher-quality images.
Further, the output feature map is judged real or fake directly. The discrimination network finally outputs a feature map of size 32 × 16 × 1 that is judged as is, without a fully connected layer mapping it to a vector; this is fast and computationally cheap, and since each point of the feature map corresponds to a small region of the picture, it amounts to region-wise discrimination.
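A corresponding sketch of the two discriminator modules, again illustrative rather than the patent's code (the branch channel split and the leaky-ReLU activations are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HoleConvFusion(nn.Module):
    """Hole-convolution feature fusion module: parallel 3x3 convs with
    dilation rates 1, 2 and 3 (all stride 1); their outputs are concatenated
    with the un-convolved input map."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.d1 = nn.Conv2d(in_ch, branch_ch, 3, padding=1, dilation=1)
        self.d2 = nn.Conv2d(in_ch, branch_ch, 3, padding=2, dilation=2)
        self.d3 = nn.Conv2d(in_ch, branch_ch, 3, padding=3, dilation=3)

    def forward(self, x):
        return torch.cat([F.leaky_relu(self.d1(x), 0.2),
                          F.leaky_relu(self.d2(x), 0.2),
                          F.leaky_relu(self.d3(x), 0.2),
                          x], dim=1)                  # fuse with the raw input map

class SpanConvDown(nn.Module):
    """Span-convolution downsampling module: a stride-2 conv replaces
    max pooling to halve the spatial size."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)

    def forward(self, x):
        return F.leaky_relu(self.conv(x), 0.2)
```

Stacking three fusion/downsampling pairs followed by a final single-channel convolution yields the 32 × 16 × 1 output map, so each output point scores one region of the input, as the patent describes.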
Step S13: constructing a loss function; the loss functions of the void full-convolution span interpolation fusion shoe printing image generation network comprise countermeasure loss, perception loss and content loss;
the calculation formula is as follows:
Figure BDA0002932153080000071
wherein the alpha, beta, gamma,
Figure BDA0002932153080000072
lambda is a weighting coefficient and is a coefficient,
Figure BDA0002932153080000073
the formula for the penalty is as follows:
Figure BDA0002932153080000074
Figure BDA0002932153080000075
is RiI is 1,2, …, n and FiI-1, 2, …, n, measured as RiI is 1,2, …, n and FiI is 1,2, …, the mean square error between n and the L1 distance, the formula is as follows:
Figure BDA0002932153080000076
Figure BDA0002932153080000077
wherein c, W and H are the number of channels, width values and height values of the image, and G represents the generated network.
Figure BDA0002932153080000078
Is RiI is 1,2, …, n and FiI-1, 2, …, n, measured as RiR is 1,2, …, n and FiAnd i is 1,2, …, n is the mean square error and the L1 distance between deep feature maps extracted by the VGG19, and the formula is as follows:
Figure BDA0002932153080000079
Figure BDA0002932153080000081
where φ represents the deep profile extracted through VGG 19.
Meanwhile, the loss function of the hole fusion full convolution shoe print image discrimination network adopts the loss function of WGAN-GP, including the countermeasure loss and the gradient punishment. The calculation formula is as follows:
Figure BDA0002932153080000082
wherein x1=εSi+(1-ε)Si,i=1,2,…,n,x2=εFi+(1-ε)Ri,i=1,2,…,n,ε~uniform[0,1]G represents the lambda gradient penalty parameter passing through the generation network, and D represents the lambda gradient penalty parameter passing through the discrimination network.
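The losses above can be sketched in PyTorch as follows. This is a hedged illustration: the discriminator is assumed to take the channel-wise concatenation of the (sketch, image) pair, `vgg_features` stands for a frozen VGG19 feature extractor, and the weight dictionary `w` is hypothetical:

```python
import torch
import torch.nn.functional as F

def generator_loss(D, vgg_features, S, R, Fk, w):
    """Generator loss: WGAN adversarial term plus content (MSE + L1) and
    perceptual (MSE + L1 on VGG19 deep features) terms, weighted by w."""
    adv = -D(torch.cat([S, Fk], dim=1)).mean()        # adversarial: fool D
    mse = F.mse_loss(Fk, R)                           # pixel-level MSE (1/cWH built in)
    l1 = F.l1_loss(Fk, R)                             # pixel-level L1
    phi_r, phi_f = vgg_features(R), vgg_features(Fk)  # VGG19 deep feature maps
    p_mse = F.mse_loss(phi_f, phi_r)                  # perceptual MSE
    p_l1 = F.l1_loss(phi_f, phi_r)                    # perceptual L1
    return (w["adv"] * adv + w["mse"] * mse + w["l1"] * l1
            + w["p_mse"] * p_mse + w["p_l1"] * p_l1)

def discriminator_loss(D, S, R, Fk, gp_lambda=10.0):
    """WGAN-GP discriminator loss on (sketch, image) pairs; the gradient
    penalty is taken at x2 = eps*F + (1-eps)*R (x1 = S interpolates to itself)."""
    d_fake = D(torch.cat([S, Fk], dim=1)).mean()
    d_real = D(torch.cat([S, R], dim=1)).mean()
    eps = torch.rand(R.size(0), 1, 1, 1, device=R.device)
    x2 = (eps * Fk + (1.0 - eps) * R).requires_grad_(True)
    out = D(torch.cat([S, x2], dim=1))
    grads = torch.autograd.grad(out.sum(), x2, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return d_fake - d_real + gp_lambda * gp
```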
The data augmentation and training comprises the following steps:
1. Image expansion and enhancement, as shown in FIG. 1:
(1) Take the collected complete shoe print images as set A, the target image set.
(2) Record the set of incomplete shoe print images corresponding to each complete shoe print in A as set B, the incomplete image set; if an image in A has no corresponding incomplete image, randomly erase pattern content from it to form one.
(3) Draw the missing part of each incomplete shoe print image in B with an input device such as a stylus or mouse, forming a mixed image of real-scene pattern and sketch pattern; the set of such mixed images formed from all images in B is denoted C, the mixed real-scene/sketch image set.
(4) Manually draw a paper sketch of each image in A on blank paper and scan it to form a shoe print sketch; the set of these sketches is denoted D, the digitized paper sketch set.
(5) Directly draw a sketch of each image in A with an input device such as a stylus or mouse; the set of these sketches is denoted E, the digital sketch set.
(6) Horizontally flip the left-foot and right-foot shoe print images in C, D and E, taking the image obtained by flipping a right-foot print as an augmented left-foot print, and vice versa, so that left and right prints complement and enhance each other. Denote C, D, E together with their augmented images as S = {S_i | i = 1, 2, …, n}, and take from A the shoe print images corresponding to S as the target shoe print image set R = {R_i | i = 1, 2, …, n}.
2. Input S_i, i = 1, 2, …, n into the generation network, which generates complete virtual shoe print images denoted F_i, i = 1, 2, …, n. Input R_i and F_i into VGG19 to obtain the deep feature maps. Input the image pair (S_i, F_i) into the discrimination network, and likewise the image pair (S_i, R_i).
3. When the pair (S_i, F_i) is input, the discrimination network tries to judge it fake while the generation network tries to have it judged real; when the pair (S_i, R_i) is input, the discrimination network tries to judge it real. The two networks compete and improve each other throughout training until the discrimination network can no longer tell whether its input is real or fake, i.e., a Nash equilibrium is reached and the generation network performs at its best.
4. Train and save the hole full-convolution span-interpolation fusion shoe print image generation network and the hole-fusion full-convolution shoe print image discrimination network.
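A compact training loop tying these steps together might look like the sketch below. It reuses the loss functions sketched above; the optimizer settings, the number of critic steps per generator step, and the checkpoint file names are assumptions, not values from the patent:

```python
import torch

def train(G, D, vgg_features, loader, epochs, w, d_steps=5, lr=1e-4):
    """Adversarial training: alternate WGAN-GP critic updates and generator
    updates until neither network can improve against the other."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.9))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.9))
    for _ in range(epochs):
        for S, R in loader:                  # sketch-guided inputs S_i, targets R_i
            for _ in range(d_steps):         # several critic steps per G step
                Fk = G(S).detach()           # fake prints, detached for the critic
                loss_d = discriminator_loss(D, S, R, Fk)
                opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            Fk = G(S)                        # regenerate with gradients for G
            loss_g = generator_loss(D, vgg_features, S, R, Fk, w)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    torch.save(G.state_dict(), "generator.pth")      # illustrative file names
    torch.save(D.state_dict(), "discriminator.pth")
```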
Step S2: and on-line retrieval of the shoe print image based on the cooperation of the pattern content characteristics and the semantic information. Further, the on-line shoe print image retrieval based on the cooperation of the pattern content features and the semantic information further comprises the following steps:
step S21: drawing pattern semantic information; supplementing the incomplete part with patterns in a sketch mode for the actually shot incomplete shoe printing image to form a mixed image of real-scene patterns and sketch patterns;
if the actually shot shoe print image is not available, and only the pattern form is known, then:
a. manually drawing a sketch of the shoe print on the blank paper, and then scanning to form the sketch of the shoe print;
b. and (4) directly drawing a sketch of the shoe print on the blank image through an input device such as a stylus and a mouse.
Step S22: retrieving the sole pattern image guided by sketch;
inputting a mixed picture of real-scene patterns and sketch patterns or a sketch map into the constructed shoe-print pattern content and semantic information collaborative model to generate a virtual sole pattern image;
and calculating the similarity score of the virtual sole pattern image and each shoe print image in the sole pattern image data set by adopting the conventional sole pattern image retrieval algorithm, sequencing the sole pattern images in the sole pattern image data set according to the similarity score by a preset rule, and outputting according to the sequencing mode.
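The online stage thus reduces to completing the query and ranking the database. The sketch below uses cosine similarity as a stand-in for whichever existing sole pattern retrieval algorithm is plugged in; `extract_features` is a hypothetical feature extractor returning a 1-D vector:

```python
import numpy as np

def retrieve(G, extract_features, query, database):
    """Sketch-guided retrieval: complete the query (mixed real/sketch image
    or pure sketch) with the trained generator, then rank the database by
    descending similarity score."""
    virtual = G(query)                    # virtual complete sole pattern image
    q = extract_features(virtual)         # query feature vector (assumed 1-D)
    ranked = []
    for name, image in database:          # (identifier, stored shoe print)
        v = extract_features(image)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
        ranked.append((name, score))
    ranked.sort(key=lambda t: t[1], reverse=True)  # preset rule: highest score first
    return ranked
```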
Example 1
The flow of dilated convolution is shown in FIG. 2, where the fully black large square represents a 3 × 3 convolution kernel consisting of 9 small black squares, each small square representing one weight. The size of the dilated kernel is f = d(f0 - 1) + 1, where f0 is the original kernel size and d the dilation rate. When d = 2, f = 5, as shown by the 5 × 5 kernel in the figure, but the number of weights is unchanged; the remaining small white squares represent 0. Dilated convolution therefore enlarges the receptive field of the kernel without introducing extra parameters.
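The kernel-size formula can be checked in a few lines of Python; the three cases below cover the dilation rates used in this patent:

```python
def dilated_kernel_size(f0: int, d: int) -> int:
    """Effective size of a dilated convolution kernel: f = d*(f0 - 1) + 1."""
    return d * (f0 - 1) + 1

assert dilated_kernel_size(3, 1) == 3   # ordinary 3x3 convolution
assert dilated_kernel_size(3, 2) == 5   # the 5x5 field shown in FIG. 2
assert dilated_kernel_size(3, 3) == 7   # dilation rate 3 in the discriminator
```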
FIG. 3 shows the hole full-convolution span-interpolation fusion shoe print image generation network. The solid-line boxes are hole-span convolution modules, the dashed-line boxes are deconvolution fusion modules, and the dot-dashed boxes are bilinear interpolation upsampling modules. A solid connecting line denotes convolution, S denotes stride, d denotes dilation rate, Conv3 denotes a 3 × 3 convolution kernel, Conv1 a 1 × 1 kernel, and the number following Conv3 denotes the number of kernels. For example, "Conv3, 512, S = 1, d = 2" denotes a convolution layer with 3 × 3 kernels, 512 kernels, stride 1 and dilation rate 2. Dotted lines denote concatenation (Concat), dashed lines denote deconvolution, ConvT3 denotes a deconvolution with 3 × 3 kernels, and thick connecting lines denote bilinear interpolation.
FIG. 4 shows the hole-fusion full-convolution shoe print image discrimination network. The solid-line boxes are hole-convolution feature fusion modules and the dashed-line boxes are span-convolution downsampling modules. A solid connecting line denotes convolution, S denotes stride, d denotes dilation rate, Conv3 denotes a 3 × 3 convolution kernel, and the number following Conv3 denotes the number of kernels. For example, "Conv3, 256, S = 1, d = 2" denotes a convolution layer with 3 × 3 kernels, 256 kernels, stride 1 and dilation rate 2. Dotted connecting lines denote concatenation (Concat). The final output of the network is a 32 × 16 × 1 feature map; the fully connected layer of a traditional discriminator is removed, and the feature map is judged real or fake directly.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A sketch-guided shoe print image retrieval method, comprising the following steps:
S1: constructing a shoe print pattern content and semantic information collaborative model;
S2: retrieving the shoe print image online based on the cooperation of pattern content features and semantic information.
2. The sketch-guided shoe print image retrieval method according to claim 1, wherein said step S1 further comprises the steps of:
s11: fusing a shoe print image to generate a network through the full convolution span interpolation of the cavity;
s12: introducing expansion convolution extraction features, and fusing a full convolution shoe print image discrimination network through a cavity;
s13: constructing a loss function; the loss functions of the void full-convolution span interpolation fusion shoe printing image generation network comprise countermeasure loss, perception loss and content loss;
the calculation formula is as follows:
Figure FDA0002932153070000011
wherein the alpha, beta, gamma,
Figure FDA0002932153070000012
lambda is a weighting coefficient and is a coefficient,
Figure FDA0002932153070000013
the formula for the penalty is as follows:
Figure FDA0002932153070000014
Figure FDA0002932153070000015
is RiI is 1,2, …, n and FiI-1, 2, …, n, measured as RiI is 1,2, …, n and FiI is 1,2, …, the mean square error between n and the L1 distance, the formula is as follows:
Figure FDA0002932153070000016
Figure FDA0002932153070000017
wherein c, W and H are the number of channels, width values and height values of the image, and G represents the generated network.
Figure FDA0002932153070000018
Is RiI is 1,2, …, n and FiI-1, 2, …, n, measured as RiR is 1,2, …, n and FiAnd i is 1,2, …, n is the mean square error and the L1 distance between deep feature maps extracted by the VGG19, and the formula is as follows:
Figure FDA0002932153070000019
Figure FDA0002932153070000021
wherein phi represents a deep level feature map extracted by VGG 19;
the loss function of the hole fusion full-convolution shoe print image discrimination network adopts the loss function of WGAN-GP, including countermeasure loss and gradient punishment; the calculation formula is as follows:
Figure FDA0002932153070000022
wherein x1=εSi+(1-ε)Si,i=1,2,…,n,x2=εFi+(1-ε)Ri,i=1,2,…,n,ε~uniform[0,1]G represents the lambda gradient penalty parameter passing through the generation network, and D represents the lambda gradient penalty parameter passing through the discrimination network.
3. The sketch-guided shoe print image retrieval method according to claim 1, wherein the hole full-convolution span-interpolation fusion shoe print image generation network of step S11 comprises: 7 hole-span convolution modules, 7 deconvolution fusion modules and 5 bilinear interpolation upsampling modules;
each hole-span convolution module comprises a dilated convolution with dilation rate 2 and stride 1 and a span convolution with dilation rate 1 and stride 2; each deconvolution fusion module comprises a deconvolution with stride 2 and a concatenation of three feature maps of the same size; each bilinear interpolation upsampling module applies bilinear interpolation to the target feature map, doubling its size, and, to match channel counts, the first three bilinear interpolation upsampling modules additionally apply a 1 × 1 convolution;
the hole-fusion full-convolution shoe print image discrimination network of step S12 comprises: 3 hole-convolution feature fusion modules and 3 span-convolution downsampling modules;
each hole-convolution feature fusion module comprises an ordinary convolution with stride 1 and dilation rate 1, a dilated convolution with stride 1 and dilation rate 2, and a dilated convolution with stride 1 and dilation rate 3, the feature maps extracted by the three convolutions being concatenated with the un-convolved feature map; each span-convolution downsampling module uses a stride-2 convolution to downsample the feature map.
4. The sketch-guided shoe print image retrieval method according to claim 1, wherein the online shoe print image retrieval based on the cooperation of pattern content features and semantic information further comprises the following steps:
S21: drawing pattern semantic information; for an actually photographed incomplete shoe print image, supplementing the missing part with patterns drawn as a sketch to form a mixed image of real-scene pattern and sketch pattern;
if no actually photographed shoe print image is available and only the pattern form is known, then either:
a. manually drawing a sketch of the shoe print on blank paper and scanning it to form the shoe print sketch; or
b. directly drawing a sketch of the shoe print on a blank image with an input device such as a stylus or mouse;
S22: retrieving the sole pattern image under sketch guidance;
inputting the mixed image of real-scene pattern and sketch pattern, or the sketch, into the constructed shoe print pattern content and semantic information collaborative model to generate a virtual sole pattern image;
computing, with an existing sole pattern image retrieval algorithm, the similarity score between the virtual sole pattern image and each shoe print image in the sole pattern image data set, sorting the data set by similarity score according to a preset rule, and outputting the results in that order.
Priority Applications (1)

Application Number: CN202110152568.5A | Priority Date: 2021-02-03 | Filing Date: 2021-02-03 | Title: Sketch-guided shoe print image retrieval method | Status: Active | Granted publication: CN112966716B

Publications (2)

CN112966716A, published 2021-06-15
CN112966716B, granted 2023-10-27

Family ID: 76275057
Family Applications (1): CN202110152568.5A, filed 2021-02-03, granted (Active)
Country Status (1): CN, CN112966716B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN106776950A * | 2016-12-02 | 2017-05-31 | Dalian Maritime University | Crime-scene shoe print pattern image retrieval method based on expert knowledge guidance
US20190244061A1 * | 2018-02-05 | 2019-08-08 | The Regents of the University of California | Local binary pattern networks methods and systems
KR20190119261A * | 2018-04-12 | 2019-10-22 | Gachon University Industry-Academic Cooperation Foundation | Apparatus and method for semantic image segmentation using a fully convolutional neural network based on multi-scale images and multi-scale dilated convolution
WO2020215236A1 * | 2019-04-24 | 2020-10-29 | Harbin Institute of Technology (Shenzhen) | Image semantic segmentation method and system
CN110689092A * | 2019-10-18 | 2020-01-14 | Dalian Maritime University | Data-guided deep clustering method for sole pattern images
CN112287940A * | 2020-10-30 | 2021-01-29 | Xi'an Polytechnic University | Semantic segmentation method with attention mechanism based on deep learning


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XINNIAN WANG ET AL.: "A manifold ranking based method using hybrid features for crime scene shoeprint retrieval", Springer Link, pp. 21629-21649.
Shi Wentao et al.: "Crime scene shoeprint retrieval algorithm based on fine-tuned VGG-16", Journal of People's Public Security University of China, vol. 26, no. 03, pp. 22-29.
Zhai Pengbo; Yang Hao; Song Tingting; Yu Kang; Ma Longxiang; Huang Xiangsheng: "Dual-path semantic segmentation with attention mechanism", Journal of Image and Graphics, no. 08, pp. 119-128.
Qing Chen; Yu Jing; Xiao Chuangbai; Duan Juan: "Research progress on image semantic segmentation based on deep convolutional neural networks", Journal of Image and Graphics, no. 06, pp. 5-26.

Also Published As

Publication number | Publication date
CN112966716B | 2023-10-27


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant