CN110570443A

CN110570443A - Image linear target extraction method based on structural constraint condition generation model

Info

Publication number: CN110570443A
Application number: CN201910753540.XA
Authority: CN
Inventors: 熊盛武; 林泽华; 李梦; 路雄博; 刁月月
Original assignee: Wuhan Water Elephant Electronic Technology Co Ltd; Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2019-08-15
Filing date: 2019-08-15
Publication date: 2019-12-13
Anticipated expiration: 2039-08-15
Also published as: CN110570443B

Abstract

The invention provides an image linear target extraction method based on a structure-constrained conditional generative model, comprising the following steps: S1, constructing the training and test data sets by acquiring the required original images and the linear target image corresponding to each original image; S2, designing the network structure of the conditional generative model, updating the model through the back-propagation algorithm, and selecting the optimal image linear target extraction model using the test set images; and S3, obtaining the linear target image corresponding to a given image using the trained image linear target extraction model. By incorporating an image structure information difference loss function, the method trains a conditional generative model with a stronger ability to capture image structure information, and it offers strong applicability, high quality of the extracted linear target images, and strong extensibility.

Description

Image linear target extraction method based on structural constraint condition generation model

Technical Field

The invention relates to the field of computer vision, and in particular to an image linear target extraction method based on a structure-constrained conditional generative model.

Background

Image linear target extraction means extracting the content with a linear structure that the user needs from an image with a complex background. It has many important applications, such as extracting road and river positions from aerial or remote sensing images, blood vessels and bones from medical images, vein structures from plant leaf images, and contours from human face images. In application scenarios such as target recognition, fingerprint recognition, retinal pathology recognition, and plant variety recognition, image linear target extraction usually serves as an image preprocessing step and strongly influences the performance of the related shape and texture feature extraction and of the subsequent recognition tasks.

In computer vision, extracting linear targets from images has long been an active research topic. Traditional linear target extraction relies mainly on image gray-level information and detects edges or linear objects using filtering algorithms (such as the Sobel operator, the Canny operator, and related improved operators), the Hough transform, IPM, and similar techniques. However, such methods operate on low-level pixel features and involve no semantic information, so in a complex background they also tend to extract linear objects the user does not want. They are therefore suitable only for images with simple backgrounds.

In recent years, image processing techniques based on deep learning have developed rapidly. Among them, conditional generative models, such as the conditional variational autoencoder and the conditional generative adversarial network, convert between two types of images with very different semantic content from the perspective of cross-modal image generation. Linear target extraction can, to a certain extent, also be regarded as a cross-modal image generation task; compared with traditional methods, methods based on conditional generative models are more robust and general and extract better under complex backgrounds. However, the information carried by a linear target in an image is strongly structural, and common conditional generative models have difficulty describing such linear structure information, so extracting linear targets with complete structure remains difficult.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an image linear target extraction method based on a structure-constrained conditional generative model that addresses the shortcomings of existing image linear target extraction techniques based on conditional generative models, thereby improving the extraction results and providing technical support for related applications.

The key to the technical scheme adopted by the invention is to train a conditional generative model with a stronger ability to capture image structure information by incorporating an image structure information difference loss function.

The structural information difference loss term is defined as follows: the high-level features of the result image and of the target image are extracted with a pre-trained VGG19 neural network model, and the mean square error between these features is computed.

The conditional generative model is based on optimal transport theory and comprises two deep convolutional neural network submodels: a generator, denoted g_ζ, and a discriminator. The former computes the optimal transport map that performs the image linear target extraction; the latter fits the Kantorovich potential used to compute the Wasserstein distance between the result image and the target image.

The technical scheme of the invention specifically comprises the following steps:

S1, constructing the training and test data sets: the required original images x_i and the corresponding linear target images are acquired to obtain a data set of M image pairs, where M is the number of images, and the data set is randomly divided into a training set and a test set in a given proportion;

S2, designing the network structure of the structure-constrained conditional generative model, inputting the training set images into the model for training, and selecting the optimal image linear target extraction model using the test set images;

The structure-constrained conditional generative model comprises two deep convolutional neural network submodels, a generator g_ζ and a discriminator. The former computes the optimal transport map that performs the image linear target extraction; the latter fits the Kantorovich potential used to compute the Wasserstein distance between the result image and the target image. The generator g_ζ is based on a U-Net network and comprises n convolution modules with the structure 'two-dimensional convolution Conv2D - normalization BatchNorm - activation function' and n deconvolution modules with the structure 'deconvolution Deconv2D - normalization BatchNorm - activation function';

The discriminator is based on the discriminator network of the DCGAN model and comprises 1 input convolutional layer, 5 convolution modules with the structure 'Conv2D - BatchNorm - LeakyReLU activation function', and 1 final output convolutional layer;

S3, obtaining the linear target image corresponding to a given image using the trained image linear target extraction model.

Further, step S1 is implemented as follows:

S11, according to the specific application scenario, manually collecting the original images x_i and then manually annotating the corresponding linear target images;

S12, scaling the images in the data set to a uniform, suitable resolution, reducing the data dimensionality as far as possible without damaging the linear target structure information in the original images and the linear target images, which facilitates subsequent model processing;

S13, binarizing the linear target images to obtain binary images with white lines on a black background; the binarization uses a threshold method: if a pixel value is greater than a threshold δ, it is set to 1, giving a white pixel, and otherwise it is set to 0, giving a black pixel;

S14, randomly dividing the processed data set into a training set and a test set at a ratio of about 5:1.
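Steps S13 and S14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the method: images are plain nested lists, and the function names `binarize` and `split_dataset`, the threshold value 127, the file-like names, and the fixed random seed are all hypothetical.

```python
import random

def binarize(image, delta):
    """Set pixels above the threshold delta to 1 (white), all others to 0 (black)."""
    return [[1 if pixel > delta else 0 for pixel in row] for row in image]

def split_dataset(pairs, train_parts=5, test_parts=1, seed=0):
    """Shuffle (original, target) pairs and split them at roughly train_parts:test_parts."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_parts / (train_parts + test_parts))
    return shuffled[:n_train], shuffled[n_train:]

# Toy 2x3 grayscale annotation with values in [0, 255] (assumed value range).
label = [[0, 200, 30], [255, 128, 127]]
binary = binarize(label, delta=127)

# Twelve placeholder image pairs, split at about 5:1.
pairs = [(f"leaf_{i}", f"vein_{i}") for i in range(12)]
train_set, test_set = split_dataset(pairs)
```

Because the comparison is strict, a pixel exactly equal to δ becomes black, which matches the "greater than" wording of S13.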

Further, in the generator g_ζ of step S2, the last deconvolution module uses a Sigmoid activation function and all other modules use Tanh; in addition, the output of the i-th convolution module also serves as input to the (n-i)-th deconvolution module. In the discriminator, the input convolutional layer uses a LeakyReLU activation function, and the output convolutional layer uses no activation function.
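The wiring of the generator described above can be made concrete by listing its modules. The sketch below only enumerates the layout and builds no network. It assumes 1-based module indices and reads the "(n-i)" rule literally, so the innermost convolution module (i = n) has no skip partner and the last deconvolution module receives no skip input. The function name `unet_plan` and the tuple layout are hypothetical.

```python
def unet_plan(n):
    """List the generator's modules as (kind, index, activation, skip_from) tuples.

    Indices are 1-based. skip_from is the convolution module whose output is
    also fed to that deconvolution module, or None when there is no partner.
    """
    plan = [("conv", i, "tanh", None) for i in range(1, n + 1)]
    for k in range(1, n + 1):
        # Output of conv i feeds deconv n - i, so deconv k receives conv n - k.
        skip = n - k if n - k >= 1 else None
        activation = "sigmoid" if k == n else "tanh"
        plan.append(("deconv", k, activation, skip))
    return plan

plan = unet_plan(7)
deconvs = [module for module in plan if module[0] == "deconv"]
```

With n = 7, the first deconvolution module is paired with convolution module 6 and the sixth with convolution module 1, which is the mirror-image pairing typical of U-Net style encoders and decoders.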

Further, in step S2 the input of the structure-constrained conditional generative model is the original image x_i, the expected output is the linear target image corresponding to the input original image, and the actual output of the generator in the model is taken as the predicted output. The training objective of the generator is to minimize the Wasserstein distance, the pixel-level difference, and the structural information difference between the predicted output and the expected output. In the overall expression, α is the weight coefficient of the Wasserstein distance term and β is the weight coefficient of the structural information difference term; they adjust the relative importance of the terms;
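The weighted objective described above can be written compactly as follows. This is a sketch under assumed notation: the symbols $\mathcal{L}$, $W$, $\mathcal{L}_{\mathrm{pix}}$, $\mathcal{L}_{\mathrm{str}}$, $y_i$ and $\hat{y}_i$ are introduced here for illustration, and leaving the pixel-level term unweighted is an assumption based on the text naming coefficients only for the other two terms.

```latex
\hat{y}_i = g_\zeta(x_i), \qquad
\mathcal{L}(\hat{y}_i, y_i)
  = \alpha \, W(\hat{y}_i, y_i)
  + \mathcal{L}_{\mathrm{pix}}(\hat{y}_i, y_i)
  + \beta \, \mathcal{L}_{\mathrm{str}}(\hat{y}_i, y_i)
```

Here $W$ stands for the Wasserstein distance term, $\mathcal{L}_{\mathrm{pix}}$ for the pixel-level difference, and $\mathcal{L}_{\mathrm{str}}$ for the structural information difference.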

The specific steps of model training are as follows:

S21, randomly sampling m pairs of original images and corresponding linear target images from the training set, where m is the number of images in a training batch;

S22, inputting the original images into the generator network to obtain the predicted linear target images, then computing the model loss function from the predicted output and the expected output;

S23, computing the derivative of the loss function with respect to the generator network parameters, then updating the generator parameters through a gradient descent algorithm;

S24, repeating S21 to S23 until the generator network parameters converge.
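The control flow of S21 to S24 can be illustrated with a deliberately tiny stand-in. In the sketch below the "generator" is a two-parameter logistic model acting on a single scalar, the loss is the binary cross-entropy alone (no Wasserstein or structural term), and the update is plain gradient descent. All names, the learning rate, the tolerance, and the toy data are assumptions chosen for illustration. Only the loop structure mirrors the steps above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, batch_size=4, lr=0.5, tol=1e-3, max_steps=5000, seed=0):
    """Toy stand-in for S21-S24: a 2-parameter 'generator' trained by gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(max_steps):
        batch = rng.sample(pairs, batch_size)          # S21: sample a mini-batch
        grad_w = grad_b = 0.0
        for x, y in batch:
            pred = sigmoid(w * x + b)                  # S22: forward pass
            # For cross-entropy on a sigmoid output, dLoss/dz = pred - y.
            grad_w += (pred - y) * x / batch_size      # S23: accumulate gradient
            grad_b += (pred - y) / batch_size
        w, b = w - lr * grad_w, b - lr * grad_b        # S23: descent update
        if math.hypot(grad_w, grad_b) < tol:           # S24: stop once converged
            break
    return w, b

# Toy data: inputs above 0.5 should map to 1, inputs below to 0.
pairs = [(k / 10, 1 if k > 5 else 0) for k in (0, 1, 2, 3, 7, 8, 9, 10)]
w, b = train(pairs)
```

In a real setting the forward pass would be the U-Net generator, the gradient would come from back-propagation through the full loss, and the convergence test would usually monitor the loss on held-out data rather than a single batch gradient.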

Further, the pixel-level difference is taken to be the binary cross-entropy function between the predicted image and the expected image.

Here x denotes the image predicted by the model, y denotes the linear target image the model is expected to output, and i is the pixel position index, ranging from 1 to w × h, where w and h are the numbers of pixels along the image length and width, respectively.
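The binary cross-entropy between a predicted image x and a target image y can be sketched as below. The function follows the standard definition, with x holding per-pixel probabilities and y holding 0/1 labels. Averaging over the w × h pixels (rather than summing) and clamping with a small `eps` are assumptions made here for illustration.

```python
import math

def binary_cross_entropy(x, y, eps=1e-7):
    """Mean binary cross-entropy between predicted image x and target image y.

    Both are h-by-w nested lists; x holds probabilities in [0, 1], y holds 0/1.
    """
    total, count = 0.0, 0
    for row_x, row_y in zip(x, y):
        for p, t in zip(row_x, row_y):
            p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

target = [[0, 1], [1, 0]]
good = binary_cross_entropy([[0.1, 0.9], [0.8, 0.2]], target)
bad = binary_cross_entropy([[0.9, 0.1], [0.2, 0.8]], target)
```

A prediction that agrees with the target (`good`) yields a lower value than one that contradicts it (`bad`), and an uninformative prediction of 0.5 everywhere yields log 2.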

Further, the structural information difference is defined as follows: the high-level features of the result image and of the target image are extracted with a pre-trained VGG19 neural network model, and the mean square error between these features is computed.

Here x denotes the image predicted by the model, y denotes the linear target image the model is expected to output, N is the number of high-level feature layers of the VGG19 model that take part in the calculation, M_n is the number of channels of the feature map output by the n-th layer, and the difference is measured by the Euclidean distance between the m-th channels of the n-th layer feature maps of x and y.
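The structural information difference can be sketched on precomputed feature maps. In the method the features come from a pre-trained VGG19; here they are plain nested lists so the example stays self-contained. The normalization (mean over the M_n channels of each layer, then mean over the N layers, with the squared Euclidean distance per channel) is one plausible reading of the description and is an assumption, as is the function name.

```python
def structural_difference(feats_x, feats_y):
    """Average squared Euclidean distance between matching feature channels.

    feats_x / feats_y: one entry per feature layer; each layer is a list of
    channels; each channel is a flat list of activations.
    """
    layer_terms = []
    for layer_x, layer_y in zip(feats_x, feats_y):        # n = 1..N
        channel_terms = []
        for ch_x, ch_y in zip(layer_x, layer_y):          # m = 1..M_n
            channel_terms.append(sum((a - b) ** 2 for a, b in zip(ch_x, ch_y)))
        layer_terms.append(sum(channel_terms) / len(channel_terms))
    return sum(layer_terms) / len(layer_terms)

# Two hypothetical feature layers: the first with 2 channels, the second with 1.
feats_a = [[[0.0, 1.0], [2.0, 3.0]], [[1.0]]]
feats_b = [[[0.0, 1.0], [2.0, 5.0]], [[0.0]]]
difference = structural_difference(feats_a, feats_b)
```

For the toy inputs the first layer contributes (0 + 4) / 2 = 2 and the second contributes 1, so the result is 1.5; identical feature sets give 0.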

Further, the Wasserstein distance is obtained by optimizing the discriminator, whose optimization process comprises the following steps:

S221, solving a linear programming problem with the simplex method; its optimal value is the Wasserstein distance, and the solution yields the optimal value H^* of each dual variable.

Here m is the number of images involved in the calculation, H_i and H_j are the dual variables used to compute the Wasserstein distance, the indices i and j refer to the generated images and the given images respectively, and the cost term is a defined distance function;

S222, fitting the Wasserstein distance with the discriminator; this is a regression problem whose number of solving iterations is denoted N_r. For this regression problem, the optimal solution H^* of the linear programming dual variables obtained in the previous step is taken as the expected output of the model, and the target loss function measures the deviation of the discriminator output from it.

Here m is the number of images involved in the calculation, the optimal dual variables are those computed in step S221, and the discriminator network is evaluated on the given vein images and on the generated vein images, respectively.
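For very small batches, the optimal value that the linear program of S221 is meant to reach can be cross-checked by brute force. With m generated and m given samples of equal weight, an optimal transport plan can always be chosen as a one-to-one matching, so the minimum average cost over all permutations equals the optimum of the transport linear program and, by duality, of its dual. The sketch below uses this brute-force search instead of the simplex method. The squared Euclidean cost, the 2-D points standing in for images, and all function names are assumptions.

```python
from itertools import permutations

def empirical_wasserstein(generated, given, cost):
    """Optimal-transport cost between two equal-size samples with uniform weights.

    Brute force over all one-to-one matchings; only practical for tiny m.
    """
    m = len(generated)
    best = None
    for perm in permutations(range(m)):
        total = sum(cost(generated[i], given[perm[i]]) for i in range(m)) / m
        if best is None or total < best:
            best = total
    return best

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

# Three toy 'images', each flattened to a 2-D point.
generated = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
given = [(2.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
w_value = empirical_wasserstein(generated, given, squared_distance)
```

The search visits m! matchings, so it serves only as a check on toy inputs; a linear programming solver is needed at realistic batch sizes, and it additionally returns the dual variables H^* that the discriminator is regressed onto.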

Further, in the conditional generative model, both the generator and the discriminator are updated with the Adam optimization algorithm, with the learning rate set to 0.001 and the Adam parameter β set to (0.5, 0.999).
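To show what these settings control, here is a minimal scalar version of the standard Adam update with the stated learning rate 0.001 and β = (0.5, 0.999). The function name, the `eps` value, and the quadratic toy objective are assumptions for illustration; a real implementation would apply the same update to every network parameter.

```python
import math

def adam_step(param, grad, state, lr=0.001, betas=(0.5, 0.999), eps=1e-8):
    """One Adam update for a single scalar parameter; state holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = betas[0] * m + (1 - betas[0]) * grad          # first-moment estimate
    v = betas[1] * v + (1 - betas[1]) * grad * grad   # second-moment estimate
    m_hat = m / (1 - betas[0] ** t)                   # bias correction
    v_hat = v / (1 - betas[1] ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, (m, v, t)

# Minimize f(p) = (p - 3)^2 starting from p = 0; the gradient is 2 * (p - 3).
p, state = 0.0, (0.0, 0.0, 0)
for _ in range(100):
    p, state = adam_step(p, 2 * (p - 3), state)
```

Setting the first β to 0.5 rather than the common default of 0.9 shortens the memory of the gradient average, so the update reacts faster to changes in gradient direction. The step size stays close to the learning rate while the gradient sign is stable, so 100 steps move p by roughly 0.1.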

Compared with the prior art, the technical scheme of the invention has the following technical effects:

(1) Strong applicability: the invention addresses image linear target extraction with a conditional generative model, performing the extraction at the semantic level from the perspective of cross-modal image generation; compared with techniques based on gray-level information, it extracts better from images with complex backgrounds.

(2) High quality of the generated images: existing techniques have difficulty describing and capturing linear structure information in images. The invention incorporates a structural information difference loss function to train a conditional generative model with a stronger ability to capture image structure information, so the structure of the extracted linear target image is more complete.

(3) Strong extensibility: the principle of the method is highly general; by selecting suitable training data and network structures according to actual requirements, it can be applied to different types of linear target extraction tasks.

Drawings

FIG. 1 is a flowchart of an image linear target extraction method based on a structural constraint condition generation model according to the present invention;

FIG. 2 is a visualization result of soybean leaf image data in an embodiment of the present invention;

FIG. 3 is a visualization result of vein images corresponding to soybean leaf data in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a structural constraint condition generation model for soybean vein extraction according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and an embodiment, in order to explain its objects, technical scheme, advantages, and feasibility in detail. It should be understood that the specific embodiment described here only illustrates the invention and does not limit it. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.

The technical terms involved in the present invention are explained and illustrated below:

DCGAN network: a neural network for image generation comprising a generator and a discriminator. The generator maps samples from a low-dimensional Gaussian distribution to high-dimensional target images, and the discriminator judges how realistic the generated images are. The generator comprises several deconvolution modules, each consisting of a deconvolution layer, a normalization layer, and an activation function layer, which progressively enlarge the feature size up to the output image. The discriminator comprises several convolution modules, each consisting of a convolutional layer, a normalization layer, and an activation function layer, which progressively extract features of the input image and compute the similarity between the generated image and the real image.

U-Net network: a neural network for image processing with an encoder-decoder structure linked by skip connections. The network has n convolution modules, each consisting of a convolutional layer, a normalization layer, and an activation function layer, and n deconvolution modules, each consisting of a deconvolution layer, a normalization layer, and an activation function layer; the output of the i-th convolution module also serves as input to the (n-i)-th deconvolution module.

Wasserstein distance: a measure of the distance between two data distributions. Compared with the Jensen-Shannon divergence, the KL divergence, and similar measures, the Wasserstein distance is smoother, so a neural network model that uses it to compare image data is less prone to problems such as vanishing gradients.
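The smoothness claim can be checked on a toy case: two point masses on a one-dimensional grid that do not overlap. The Jensen-Shannon divergence is the same constant (log 2) no matter how far apart they are, so it gives no signal about direction, whereas the 1-Wasserstein distance grows with the separation. The sketch below uses discrete distributions on an integer grid; the function names and the grid size are assumptions.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def wasserstein_1d(p, q):
    """1-Wasserstein distance on an integer grid: sum of absolute CDF differences."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

def point_mass(position, size=10):
    return [1.0 if k == position else 0.0 for k in range(size)]

base = point_mass(0)
results = {}
for shift in (1, 4, 8):
    moved = point_mass(shift)
    results[shift] = (js_divergence(base, moved), wasserstein_1d(base, moved))
```

For every shift the first entry stays at log 2 while the second equals the shift itself, which is why a loss built on the Wasserstein distance still provides a useful gradient when the generated and target distributions barely overlap.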

In this embodiment, linear vein extraction in a soybean leaf image is taken as a scene, and a detailed description is given to the image linear object extraction method based on the structural constraint condition generation model provided by the invention.

As shown in fig. 1, when the image linear object extraction method based on the structural constraint condition generation model is applied to a soybean vein extraction task, the method comprises the following detailed steps:

S1, constructing the training and test data sets: the required soybean leaf images x_i and the corresponding vein images are acquired to obtain a data set, which is then divided into a training set and a test set, as shown in fig. 2. The specific steps are as follows:

S11, manually collecting the soybean leaf images x_i, then manually annotating the corresponding vein images with tools such as a drawing board;

S12, scaling the images in the data set to a uniform, suitable resolution, reducing the data dimensionality as far as possible without damaging the leaf structure information in the soybean leaf images and the vein images, which facilitates subsequent model processing;

S13, binarizing the vein images to obtain binary images with white lines on a black background; the binarization uses a threshold method: if a pixel value is greater than a threshold δ, it is set to 1, giving a white pixel, and otherwise it is set to 0, giving a black pixel;

S2, designing the network structure of the conditional generative model, as shown in FIG. 3: the training set images are input into the structure-constrained conditional generative model, the total loss function incorporating the structural information difference is computed, and the model is updated through the back-propagation algorithm. The optimal soybean vein image extraction model is then selected using the test set images; that is, the difference between the images generated by the model and the test set images (Wasserstein distance plus structure loss) is computed, and the model with the smallest difference is selected.
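The selection rule in parentheses can be sketched as picking the candidate with the smallest test-set score. The checkpoint names and metric values below are hypothetical placeholders, and adding the two terms without weights is an assumption, since the text only states "Wasserstein distance + structure loss".

```python
def select_best(checkpoints, wasserstein, structure_loss):
    """Return the checkpoint whose test-set score (Wasserstein + structure loss) is lowest."""
    return min(checkpoints, key=lambda name: wasserstein[name] + structure_loss[name])

# Hypothetical per-checkpoint test-set metrics, for illustration only.
wasserstein = {"epoch_10": 0.42, "epoch_20": 0.31, "epoch_30": 0.35}
structure_loss = {"epoch_10": 0.20, "epoch_20": 0.18, "epoch_30": 0.11}
best = select_best(list(wasserstein), wasserstein, structure_loss)
```

In this toy table the third candidate wins even though the second has the lower Wasserstein term, because its structure loss is markedly smaller.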

The conditional generative model is based on optimal transport theory and comprises two deep convolutional neural network submodels, a generator g_ζ and a discriminator. The former computes the optimal transport map that performs the image linear target extraction; the latter fits the Kantorovich potential used to compute the Wasserstein distance between the result image and the target image.

The generator g_ζ is based on a U-Net network, an encoder-decoder structure with skip connections. The network has 7 convolution modules with the structure 'two-dimensional convolution Conv2D - normalization BatchNorm - activation function' and 7 deconvolution modules with the structure 'deconvolution Deconv2D - normalization BatchNorm - activation function'. The last deconvolution module uses a Sigmoid activation function and all other modules use Tanh; in addition, the output of the i-th convolution module also serves as input to the (7-i)-th deconvolution module.

The discriminator is based on the discriminator network of the DCGAN model and comprises 1 input convolutional layer, 5 convolution modules with the structure 'Conv2D - BatchNorm - LeakyReLU activation function', and 1 final output convolutional layer. The input convolutional layer uses a LeakyReLU activation function, and the output convolutional layer uses no activation function.

The input of the conditional generative model is the soybean leaf image x_i, the expected output is the vein image corresponding to the input soybean leaf image, and the actual output of the generator is taken as the predicted output. The training objective of the generator is to minimize the Wasserstein distance, the pixel-level difference, and the structural information difference between the predicted output and the expected output. In the overall expression, α is the weight coefficient of the Wasserstein distance term and β is the weight coefficient of the structural information difference term; they adjust the relative importance of the terms. The specific steps of model training are as follows:

S21, randomly sampling m pairs of soybean leaf images and corresponding vein images from the training set, where m is the number of images in a training batch and is set to 32;

S22, inputting the soybean leaf images into the generator network to obtain the predicted vein images, then computing the model loss function from the predicted output and the expected output.

The pixel-level difference is taken to be the binary cross-entropy function between the predicted image and the expected image.

Here x denotes the image predicted by the model, y denotes the vein image the model is expected to output, and i is the pixel position index, ranging from 1 to w × h, where w and h are the numbers of pixels along the image length and width, respectively.

The structural information difference is defined as follows: the high-level features of the result image and of the target image are extracted with a pre-trained VGG19 neural network model, and the mean square error between these features is computed.

Here x denotes the image predicted by the model, y denotes the vein image the model is expected to output, N is the number of high-level feature layers of the VGG19 model that take part in the calculation, M_n is the number of channels of the feature map output by the n-th layer, and the difference is measured by the Euclidean distance between the m-th channels of the n-th layer feature maps of x and y. The high-level features extracted by VGG19 carry more topological structure information, so this loss term is more sensitive to the linear vein structure in the image.

The Wasserstein distance is obtained by optimizing the discriminator, whose optimization process comprises the following steps:

Here m is the number of images involved in the calculation, H_i and H_j are the dual variables used to compute the Wasserstein distance, the indices i and j refer to the generated vein images and the given vein images respectively, and the cost term is a defined distance function.

S222, fitting the Wasserstein distance with the discriminator; this is a regression problem whose number of solving iterations is denoted N_r and is usually set to 5. For this regression problem, the optimal solution H^* of the linear programming dual variables obtained in the previous step is taken as the expected output of the model, and the target loss function measures the deviation of the discriminator output from it.

S24, repeating S21 to S23 until the generator network parameters converge.

During optimization of the conditional generative model, both the generator and the discriminator are updated with the Adam optimization algorithm, with the learning rate set to 0.001 and the Adam parameter β set to (0.5, 0.999);
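The discriminator optimization above refers to a linear program over dual variables H_i and H_j. For reference, the textbook Kantorovich dual for two samples of m images with uniform weights has the form below. This is a sketch and not necessarily the exact program of the method: the symbols $\hat{y}_i$ (generated image), $y_j$ (given image) and $c$ (the defined distance function) are assumed notation, $H_i$ and $H_j$ denote two distinct sets of variables attached to the generated and given images respectively, and the normalization and sign convention are assumptions.

```latex
W \;=\; \max_{H}\; \frac{1}{m}\sum_{i=1}^{m} H_i \;+\; \frac{1}{m}\sum_{j=1}^{m} H_j
\qquad \text{s.t.} \qquad
H_i + H_j \;\le\; c(\hat{y}_i, y_j), \quad i, j = 1, \dots, m
```

The maximizer gives the optimal dual values $H^*$, which then serve as regression targets for the discriminator.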

S3, obtaining the soybean vein image corresponding to a given soybean leaf image using the trained soybean vein image extraction model, specifically as follows:

A given soybean leaf image is input into the trained generator network; the output of the generator is the extracted vein image.

Those skilled in the art will appreciate that, in light of the above teachings, the invention and its principles may be modified and varied or applied to similar image linear target extraction tasks, and all such modifications and variations fall within the scope of the invention as defined by the following claims.

Claims

1. An image linear target extraction method based on a structure-constrained conditional generative model, characterized in that: a conditional generative model with a stronger ability to capture image structure information is trained by incorporating an image structure information difference loss function and is used to extract image linear targets; the method comprises the following steps,

2. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 1, wherein step S1 is implemented as follows:

S11, according to the specific application scenario, manually collecting the original images x_i and then manually annotating the corresponding linear target images;

S14, randomly dividing the processed data set into a training set and a test set at a ratio of about 5:1.

3. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 1, wherein: in the generator g_ζ of step S2, the last deconvolution module uses a Sigmoid activation function and all other modules use Tanh, and the output of the i-th convolution module also serves as input to the (n-i)-th deconvolution module; in the discriminator, the input convolutional layer uses a LeakyReLU activation function, and the output convolutional layer uses no activation function.

4. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 1, wherein: in step S2 the input of the structure-constrained conditional generative model is the original image x_i, the expected output is the linear target image corresponding to the input original image, and the actual output of the generator in the model is taken as the predicted output; the training objective of the generator is to minimize the Wasserstein distance, the pixel-level difference, and the structural information difference between the predicted output and the expected output, where α is the weight coefficient of the Wasserstein distance term and β is the weight coefficient of the structural information difference term, which adjust the relative importance of the terms;

The specific steps of model training are as follows:

S24, repeating S21 to S23 until the generator network parameters converge.

5. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 4, wherein: the pixel-level difference is taken to be the binary cross-entropy function between the predicted image and the expected image.

6. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 4, wherein: the structural information difference is defined as follows: the high-level features of the result image and of the target image are extracted with a pre-trained VGG19 neural network model, and the mean square error between these features is computed.

7. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 4, wherein: the Wasserstein distance is obtained by optimizing the discriminator, whose optimization process comprises the following steps:

Here m is the number of images involved in the calculation, H_i and H_j are the dual variables used to compute the Wasserstein distance, and the cost term is a defined distance function;

S222, fitting the Wasserstein distance with the discriminator; this is a regression problem whose number of solving iterations is denoted N_r. For this regression problem, the optimal solution H^* of the linear programming dual variables obtained in the previous step is taken as the expected output of the model, and the target loss function measures the deviation of the discriminator output from it.

8. The image linear target extraction method based on a structure-constrained conditional generative model according to claim 4, wherein: in the conditional generative model, both the generator and the discriminator are updated with the Adam optimization algorithm, with the learning rate set to 0.001 and the Adam parameter β set to (0.5, 0.999).