CN114529737A - Optical red footprint image contour extraction method based on GAN network - Google Patents

Optical red footprint image contour extraction method based on GAN network

Info

Publication number
CN114529737A
Authority
CN
China
Prior art keywords
network
generator
image
sample
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210158517.8A
Other languages
Chinese (zh)
Inventor
唐俊
蒋文龙
朱明
王年
张艳
鲍文霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202210158517.8A priority Critical patent/CN114529737A/en
Publication of CN114529737A publication Critical patent/CN114529737A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an optical red footprint image contour extraction method based on a GAN network, which comprises the following steps: collecting an original optical red footprint image with an optical footprint collector; making a training set and a test set; constructing a generator from a residual network; constructing a discriminator from a PatchGAN Markov discriminator; combining the generator and the discriminator into a CycleGAN cycle-consistent generative adversarial network that serves as the training network; feeding the training set into the training network for training; and using the trained generator as the contour extraction test network, inputting the source-domain data of the test set, and obtaining the optical red footprint image contour. The method applies one unified pipeline directly to the lightly preprocessed original optical red footprint images to extract contour images, which simplifies contour extraction; it can ignore the differences between original images and process a whole batch with a network of identical parameters, which reduces the computational cost of contour extraction.

Description

Optical red footprint image contour extraction method based on GAN network
Technical Field
The invention relates to the technical field of image processing and contour extraction, in particular to an optical red footprint image contour extraction method based on a GAN network.
Background
Research on footprints is not limited to criminal investigation; it touches many aspects of daily life. For example, studying how human footprints differ across sports improves shoe-making technology and yields technologically advanced athletic shoes suited to different sports, better protecting the body during exercise. In the medical field, by studying how diseases affect the footprints patients produce while walking, medical experts can explore the relation between footprints and certain diseases, predict a patient's disease type and stage, and speed up diagnosis.
The morphological characteristics of the optical red footprint contour are among the most important features in red footprint research. Current work usually extracts the contour from the red footprint image by median filtering followed by binarization, generally called the traditional method. Median filtering replaces the gray value of each pixel with the median of the gray values in a neighborhood window around it, eliminating isolated noise points and erasing texture information in the optical red footprint. Binarization sets each pixel's gray value to 0 or 255, giving the whole image a sharp black-and-white appearance that reduces the data volume and highlights the contour of the red footprint. However, the traditional method places high quality demands on the original optical red footprint image: when the original image contains large areas of block noise or the arch region of the footprint is broken, it cannot produce a reliable contour image, and footprint images acquired at crime scenes are often of just such low quality.
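A minimal sketch of the traditional pipeline described above (median filtering followed by binarization); the window size and threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np
from scipy.ndimage import median_filter

def traditional_contour_preprocess(img, window=3, threshold=128):
    """Median-filter a grayscale image, then binarize it to 0/255.
    `window` and `threshold` are hypothetical parameter choices."""
    # Each pixel becomes the median of its neighborhood window,
    # which removes isolated noise points and texture detail.
    smoothed = median_filter(img, size=window)
    # Hard black/white split: values at or above the threshold -> 255.
    binary = np.where(smoothed >= threshold, 255, 0)
    return binary.astype(np.uint8)
```

The output contains only the values 0 and 255, so the footprint contour stands out against the background with minimal data volume.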
Disclosure of Invention
The invention aims to provide a GAN-network-based optical red footprint image contour extraction method that can handle block noise in the original image, fill in missing footprint regions, and generate contour images of higher reliability more flexibly.
In order to achieve the purpose, the invention adopts the following technical scheme: an optical red footprint image contour extraction method based on a GAN network comprises the following steps in sequence:
(1) collecting an original optical red footprint image through an optical footprint collector;
(2) making a training set and a testing set;
(3) constructing a generator through a residual error network;
(4) constructing a discriminator through a PatchGAN Markov discriminator;
(5) the generator and the discriminator form a CycleGAN cycle-consistent generative adversarial network, which is used as the training network;
(6) sending the training set into a training network for training;
(7) using the trained generator as the contour extraction test network, inputting the source-domain data of the test set, and obtaining the optical red footprint image contour.
The step (2) specifically comprises the following steps:
(2a) unifying all the original optical red footprint images into a right foot image, and erasing a scale in the image;
(2b) randomly extracting images from the original optical red footprint images as the source-domain data of the training set and the test set, and then randomly extracting further images which, after processing, serve as the target-domain data of the training set.
The step (3) specifically comprises the following steps:
(3a) the generator is a residual network structure with 9 residual blocks. Let c7s1_k denote a convolution layer with k filters, kernel size 7 × 7 and stride 1; let dk denote a down-sampling layer with k filters, kernel size 3 × 3 and stride 2; let rk denote a residual block containing two convolution layers with k filters and kernel size 3 × 3; and let uk denote an up-sampling layer with k filters, kernel size 3 × 3 and stride 1/2. The structure of the generator is then: c7s1_64, d128, d256, r256, r256, r256, r256, r256, r256, r256, r256, r256, u128, u64, c7s1_3;
a self-attention module is added after the two convolution layers of the last residual block of the generator, and the obtained features are recalibrated according to the formulas:

\beta_{i,j} = \frac{\exp(s_{i,j})}{\sum_{i=1}^{N} \exp(s_{i,j})}, \qquad s_{i,j} = e(x_i)\, g(x_j)^{\top}   (1)

y_j = \sum_{i=1}^{N} \beta_{i,j}\, h(x_i)   (2)

where e(x_i) is the i-th row of the feature map e(x) in space e, g(x_j) is the j-th row of the feature map g(x) in space g, s_{i,j} is the element in row i and column j of the two-dimensional matrix s, β_{i,j} is the element in row i and column j of the two-dimensional matrix β, i is the row index, N is the maximum value of i, and h(x_i) is the i-th row of the feature map h(x) in space h; formula (1) is the attention-mask calculation and formula (2) the feature-recalibration calculation;
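The attention-mask and recalibration computations of formulas (1) and (2) can be sketched in numpy; the flattened (N positions × C channels) layout, the projection matrices standing in for the 1 × 1 convolutions, and the softmax axis are assumptions based on the symbol definitions above:

```python
import numpy as np

def self_attention(x, We, Wg, Wh):
    """Sketch of formulas (1) and (2).
    x: (N, C) flattened feature map, N = H*W positions.
    We, Wg, Wh: (C, C') projections standing in for the 1x1 convolutions
    that produce the feature spaces e, g and h (hypothetical shapes)."""
    e, g, h = x @ We, x @ Wg, x @ Wh
    s = e @ g.T                            # s[i, j] = e(x_i) . g(x_j)
    s = s - s.max(axis=0, keepdims=True)   # numerical stability only
    # Formula (1): softmax over i gives the attention mask beta.
    beta = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)
    # Formula (2): y_j = sum_i beta[i, j] * h(x_i).
    y = beta.T @ h
    return y, beta
```

Each column of the mask sums to one, so every output row is a convex combination of the rows of h, i.e. a recalibration of the features by their relevance to every position.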
(3b) a spectral normalization operation is performed after each convolution layer in the generator:

f_w(x) = Wx = U \Sigma V^{\top} x   (3)

where U is an m × m orthogonal matrix, Σ is an m × n diagonal matrix whose diagonal entries are the singular values of the weight matrix W, V is an n × n orthogonal matrix, and x is the input variable.
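Under the SVD view of formula (3), spectral normalization amounts to dividing the weight matrix by its largest singular value; a minimal numpy sketch, not the patent's implementation:

```python
import numpy as np

def spectral_normalize(W):
    """Divide W by its spectral coefficient (largest singular value),
    so the normalized linear map x -> (W / sigma_max) x has Lipschitz
    constant at most 1."""
    sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # singular values sorted descending
    return W / sigma_max
```

In practice frameworks approximate the largest singular value with power iteration rather than a full SVD, but the effect on the weights is the same.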
The step (5) specifically comprises the following steps:
Let a_i and b_i denote one sample from the source domain and the target domain respectively; the CycleGAN cycle-consistent generative adversarial network is then represented as:

a_i → G_A(a_i) → G_B(G_A(a_i))

b_i → G_B(b_i) → G_A(G_B(b_i))

A sample a_i in the source domain A is mapped into the target domain by generator G_A, written G_A(a_i); G_A(a_i) is then mapped back to the source domain by generator G_B, written G_B(G_A(a_i)).
A sample b_i in the target domain B is mapped into the source domain by generator G_B, written G_B(b_i); G_B(b_i) is then mapped back to the target domain by generator G_A, written G_A(G_B(b_i)).
G_A(a_i) and G_B(b_i) are generated samples; G_B(G_A(a_i)) and G_A(G_B(b_i)) are cyclically generated samples;
(5a) the Wasserstein distance replaces the traditional cross-entropy-based adversarial loss; it is computed as:

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y) \sim \gamma}\big[ \|x - y\| \big]   (4)

where Π(P_r, P_g) is the set of all joint distributions combining the true distribution P_r and the generated distribution P_g, γ is any one joint distribution in that set, the variable x is a true sample and the variable y a generated sample under γ, and E_{(x,y)∼γ}[‖x − y‖] is the expectation of the distance between x and y under γ; the infimum (greatest lower bound) of this expectation over the set of joint distributions is taken.
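For one-dimensional empirical distributions the infimum in formula (4) has a closed form, which scipy computes directly; this only illustrates the metric itself, not the network training:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Empirical samples from a "true" and a "generated" distribution; the
# generated samples are the true ones shifted by 1, so the optimal
# transport plan moves every point a distance of exactly 1.
real = np.array([0.0, 1.0, 2.0])
fake = np.array([1.0, 2.0, 3.0])
print(wasserstein_distance(real, fake))  # -> 1.0
```

Unlike cross entropy, this distance stays finite and informative even when the two distributions have disjoint support, which is why it stabilizes GAN training.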
To add the Wasserstein distance to the training process of the CycleGAN network, formula (4) is transformed into formula (5):

W(P_r, P_g) = \frac{1}{K} \sup_{\|f\|_L \le K} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big)   (5)

where ‖f‖_L ≤ K means the function f is K-Lipschitz continuous, i.e. the absolute value of the derivative of f is less than K with K ≥ 0, and sup denotes the supremum (least upper bound) over all functions f satisfying this condition;
in the training process of the CycleGAN network, the function f(x) is defined as an objective function f_w(x) containing a weight w to be solved for, converting formula (5) into formula (6):

K \cdot W(P_r, P_g) \approx \max_{w:\, \|f_w\|_L \le K} \Big( \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{x \sim P_g}[f_w(x)] \Big)   (6)

Optimizing the objective function f_w(x), i.e. finding the optimal weight value w of f_w(x), means finding the optimal solution for w; since f_w(x) = Wx, W is the weight matrix and w the weights. ‖f_w‖_L ≤ K means f_w is K-Lipschitz continuous, i.e. the absolute value of the derivative of f_w is less than K. E denotes expectation; x ∼ P_r means the sample x comes from the true distribution P_r, and x ∼ P_g that it comes from the generated distribution P_g; E_{x∼P_r}[f_w(x)] is the expectation of f_w(x) when x comes from the true distribution, and E_{x∼P_g}[f_w(x)] the expectation when x comes from the generated distribution;
according to formula (6), the objective function of the discriminator in the reconstructed CycleGAN network is formula (7); provided the function f_w(x) is Lipschitz continuous, optimizing the discriminator is then equivalent to optimizing the Wasserstein distance between the true and generated distributions:

D = \max_{w} \Big( \mathbb{E}_{x \sim P_r}[f_w(x)] - \mathbb{E}_{x \sim P_g}[f_w(x)] \Big)   (7)

D is the objective function of the discriminator and equals, up to the constant K, the Wasserstein distance between the true and generated distributions;
after the Wasserstein distance is introduced, the adversarial loss of the generator is given by formula (8) and that of the discriminator by formula (9):

L_G = -\frac{1}{N} \sum_{i=1}^{N} D\big(G(a_i)\big)   (8)

L_D = \frac{1}{N} \sum_{i=1}^{N} D\big(G(a_i)\big) - \frac{1}{N} \sum_{i=1}^{N} D(b_i)   (9)

where G denotes the generator, D the discriminator, N the maximum value of i, a_i one source-domain sample among the N sample pairs, and b_i a target-domain sample;
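A short numpy sketch of the adversarial losses of formulas (8) and (9); the averaging over N discriminator scores is an assumption about the normalization:

```python
import numpy as np

def wgan_losses(d_fake, d_real):
    """Wasserstein adversarial losses.
    d_fake: array of D(G(a_i)) scores over N generated samples.
    d_real: array of D(b_i) scores over N real target-domain samples."""
    loss_g = -np.mean(d_fake)                    # formula (8): push D(fake) up
    loss_d = np.mean(d_fake) - np.mean(d_real)   # formula (9): push D(real) above D(fake)
    return loss_g, loss_d
```

Minimizing loss_d widens the gap between the discriminator's scores on real and generated samples, which is exactly the dual estimate of the Wasserstein distance in formula (7).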
(5b) a cycle-consistency loss is introduced to enforce a one-to-one correspondence between the original image and the contour image:

L_{cyc} = \frac{1}{N} \sum_{i=1}^{N} \Big( \big\|G_B(G_A(a_i)) - a_i\big\|_1 + \big\|G_A(G_B(b_i)) - b_i\big\|_1 \Big)   (10)

where ‖G_B(G_A(a_i)) − a_i‖_1 is the Manhattan distance between G_B(G_A(a_i)) and a_i; a_i is a source-domain sample which, after passing through generator G_A and then generator G_B, yields the new source-domain image G_B(G_A(a_i)); b_i is a target-domain sample, and G_A(G_B(b_i)) is the new image obtained by passing b_i through the two generators.
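The cycle-consistency loss of formula (10) can be sketched as follows; averaging the L1 distance over pixels and batch is an assumption, since the patent's exact normalization is not visible:

```python
import numpy as np

def cycle_consistency_loss(a, a_rec, b, b_rec):
    """Formula (10): a_rec = G_B(G_A(a)) and b_rec = G_A(G_B(b)) are the
    cyclically generated samples; the Manhattan (L1) distance penalizes any
    information the a -> b -> a round trip fails to preserve."""
    return np.mean(np.abs(a_rec - a)) + np.mean(np.abs(b_rec - b))
```

Driving this term to zero forces each generator to be (approximately) the inverse of the other, which pins the contour image to its original footprint image.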
The step (6) specifically comprises the following steps:
(6a) setting the network hyper-parameters, taking one sample each from the source domain and the target domain of the training set, and producing the corresponding generated samples and cyclically generated samples with the CycleGAN network;
(6b) calculating the adversarial losses lossD_A and lossD_B of the discriminators;
(6c) calculating the adversarial losses lossG_A and lossG_B and the cycle-consistency loss lossC of the generators; the total generator loss is then:
lossG_total = lossG_A + lossG_B + lossC
(6d) selecting the RMSProp optimizer, setting its parameters, training, and saving the CycleGAN cycle-consistent generative adversarial network.
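Step (6d) selects the RMSProp optimizer; one parameter update can be sketched as follows, with placeholder hyper-parameter values (the patent's settings are not given):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-4, decay=0.9, eps=1e-8):
    """One RMSProp update on parameters w given gradient grad.
    cache holds the running mean of squared gradients; lr, decay and
    eps are hypothetical hyper-parameter values."""
    cache = decay * cache + (1 - decay) * grad ** 2   # running mean of squared grads
    w = w - lr * grad / (np.sqrt(cache) + eps)        # per-parameter scaled step
    return w, cache
```

RMSProp is a common choice for WGAN-style training because, unlike momentum-heavy optimizers, its per-parameter scaling behaves well with the clipped or normalized critic used here.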
According to the above technical scheme, the beneficial effects of the invention are as follows. First, the method applies one unified pipeline directly to the lightly preprocessed original optical red footprint images to extract contour images, simplifying contour extraction. Second, it can ignore the differences between original images and process a whole batch with a network of identical parameters, reducing the computational cost of contour extraction. Third, the invention uses a more flexible GAN algorithm that weakens the pixel-wise correspondence between same-position pixel values in the original image and the contour image, ignores flaws in the original image, and handles cases some traditional methods cannot. Fourth, the invention adds a self-attention mechanism to the CycleGAN structure so that the whole network learns complete structural information from the relations between structures in the image, guiding the learning direction and improving the contour extraction effect. Fifth, the spectral normalization operation is added to the CycleGAN structure and the Wasserstein distance replaces the traditional cross-entropy-based adversarial loss, which effectively avoids vanishing gradients during GAN training and makes the training more stable.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a schematic diagram of a fourth generation optical footprint harvester;
FIG. 3 is an original image;
FIG. 4 is a pre-processed image;
FIG. 5 is a profile image;
FIG. 6 is a schematic diagram of a self-attention mechanism;
FIG. 7 is a schematic representation of the structure of cycleGAN;
FIG. 8 is a schematic of the cyclic uniform loss.
Detailed Description
As shown in fig. 1, a GAN network-based optical red footprint image contour extraction method includes the following sequential steps:
(1) collecting an original optical red footprint image through an optical footprint collector;
(2) making a training set and a testing set;
(3) constructing a generator through a residual error network;
(4) constructing a discriminator through a PatchGAN Markov discriminator;
(5) the generator and the discriminator form a CycleGAN cycle-consistent generative adversarial network, which is used as the training network;
(6) sending the training set into a training network for training;
(7) using the trained generator as the contour extraction test network, inputting the source-domain data of the test set, and obtaining the optical red footprint image contour.
Example one
Step 1: collecting an optical red footprint image through an optical footprint collector;
the fourth generation optical footprint acquisition instrument (FMC500IV) developed by hangzhou innovation and electronics technology development limited company is used for acquiring optical bare images, as shown in fig. 2, the effective acquisition area of the device is the black part of the front of the instrument, power interfaces and USB interfaces distributed on the left side and the right side of the instrument are connected according to the specification, and corresponding drivers and computer software are installed on a computer for acquisition. When the footprint is in contact with the acquisition surface, the triple prism below the acquisition surface of the equipment can be totally reflected, and the captured image is transmitted to a computer software interface and stored locally in a computer. The device can collect a left foot or right foot optical red footprint image at a time, and stores the collected image data according to a uniform naming specification, wherein the resolution of the stored image data is 325 x 670.
Step 2: preprocessing the acquired optical red footprint images and making the data set. In step 1, optical red footprint images of 300 different IDs were collected, each ID comprising 10 left-foot and 10 right-foot images, 6000 images in total; an original image is shown in fig. 3.
Because of the design of the acquisition instrument, the original optical red footprint image contains a ruler for marking length, located on the big-toe side of the footprint, as shown in fig. 3. All left-foot images are converted into right-foot images by detecting the direction of the scale in the image; the pixel values in the 35-pixel-wide strip on the left side of the image are then all set to 255 to erase the scale; finally the images are converted to black and white. The processed image data are shown in fig. 4.
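The preprocessing in this step (mirroring left-foot images into right-foot orientation and blanking the 35-pixel scale strip) can be sketched as follows; treating the image as a 2-D grayscale array is an assumption:

```python
import numpy as np

def preprocess(img, is_left_foot, scale_width=35):
    """Mirror a left-foot image into right-foot orientation and erase the
    ruler/scale strip on the left edge by setting it to white (255)."""
    if is_left_foot:
        img = img[:, ::-1]           # horizontal flip -> right-foot orientation
    img = img.copy()                 # avoid writing into the caller's array
    img[:, :scale_width] = 255       # blank the 35-pixel-wide scale region
    return img
```

In the patent the flip is driven by detecting the scale's direction; here the `is_left_foot` flag stands in for that detection step.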
450 different images were randomly selected from this set and divided into 3 groups of 150 images. The first group is defined as the source-domain data of the training set; the second group as the source-domain data of the test set; the third group was manually processed with Photoshop software into target-domain data (i.e., optical red footprint contour images, as shown in fig. 5) and is defined as the target-domain data of the training set.
Step 3: defining the generator. The basic structure of the generator used by the invention is a ResNet with 9 residual blocks; by adding residual units through a shortcut mechanism, the ResNet structure largely avoids the degradation problem of deep networks. The basic structure of the generator of the present invention can be expressed as: c7s1_64, d128, d256, r256, r256, r256, r256, r256, r256, r256, r256, r256, u128, u64, c7s1_3.
To make the network attend to the correlations between red footprint structures and improve the quality of the generated contours, a self-attention module is added after the two convolution layers of the last residual block of the generator, and the obtained features are recalibrated.
The self-attention module focuses on the relevance between points in the feature map, i.e., it computes the influence of all other points on the current point, thereby capturing the structural features of the image and improving the network's ability to describe image details. A detailed block diagram of the self-attention module is shown in fig. 6. To reduce computation, the feature map x (of dimension H × W × C) is first passed through three 1 × 1 convolutions to obtain three feature spaces e, g and h, whose feature maps are then flattened to HW × C: each row of a flattened map holds all channel information at one spatial position, and each column holds all position information of one channel. The feature map of space e is multiplied by the transposed feature map of space g and normalized by a softmax layer to obtain the attention mask, as in formula (1), where β_{i,j} represents the degree to which the j-th position influences the generation of the i-th position in the map. Finally, the feature map of space h is multiplied by the attention mask, as in formula (2), increasing the response of key areas and suppressing the response of other areas; the result is reshaped back to obtain the output feature map y (of dimension H × W × C). The self-attention module lets the whole network learn complete structural information from the relations between structures in the image, progressing from local to global learning, which avoids blind learning, guides the learning direction and improves network performance.
To address the instability and vanishing gradients that arise when training the CycleGAN network, the Wasserstein distance is subsequently introduced to replace the traditional cross-entropy-based adversarial loss. The invention therefore performs a spectral normalization operation after each convolution layer in the generator (before the self-attention module, where one is present) to provide the condition under which the Wasserstein distance holds, namely that the function satisfies 1-Lipschitz continuity, meaning the maximum gradient of the function at any point is 1.
The spectral normalization operation limits f_w(x) to a certain range, thereby guaranteeing that f_w(x) satisfies 1-Lipschitz continuity. As shown in equation (3), a singular value decomposition (SVD) is performed on the weight matrix W (of size m×n) of the function f_w(x). After decomposition, U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix whose diagonal entries are the singular values of W, and V is an n×n orthogonal matrix. Through SVD, a complex matrix is factored into the product of three simple matrices, representing three matrix transformations of W: U and V are rotations, and Σ is a stretching. Only the stretching changes the length of a vector, so the 1-Lipschitz continuity of f_w(x) depends only on Σ. Defining the largest singular value obtained from the SVD as the spectral coefficient of the matrix, the weight matrix W is divided by its spectral coefficient after each convolution operation during training; this keeps every entry of the stretching matrix Σ at most 1, i.e. it guarantees that f_w(x) satisfies 1-Lipschitz continuity. This process is the spectral normalization operation on the weight matrix W.
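The divide-by-spectral-coefficient step described above can be sketched in a few lines of NumPy (an illustration of the idea via a full SVD; in practice the spectral norm is usually approximated with power iteration rather than a full decomposition):

```python
import numpy as np

def spectral_normalize(W):
    """Divide a weight matrix by its largest singular value (its spectral coefficient),
    so the linear map x -> Wx has Lipschitz constant 1."""
    sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # singular values come back in descending order
    return W / sigma_max

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4)) * 10.0       # an arbitrary "weight matrix"
W_sn = spectral_normalize(W)
# The largest singular value of the normalized matrix is 1, so no vector is stretched.
print(np.linalg.svd(W_sn, compute_uv=False)[0])
```

Dividing by the spectral coefficient rescales all singular values by the same factor, so the largest one becomes exactly 1 while the ratios between them (the rotations U and V) are untouched.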
Step 4: the discriminator is defined. The basic structure of the discriminator used by the invention is PatchGAN, which judges whether each patch of a certain size in the image comes from the original image. Compared with a global discriminator, such a patch-level discriminator has fewer parameters and, being fully convolutional, can process images of any size. Let cxsi_k denote a Convolution-LeakyReLU layer with k filters, convolution kernel size x×x and stride i; let cksi denote a Convolution-InstanceNorm-LeakyReLU layer with k filters, convolution kernel size 4×4 and stride i, where the slope of the LeakyReLU is 0.2. The basic discriminator structure of the invention can then be expressed as: c4s2_64, c128s2, c256s2, c512s1, c4s1_1, where the last layer omits the LeakyReLU.
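Given the five 4×4-kernel layers listed above (strides 2, 2, 2, 1, 1), the size of the input patch that each output score depends on follows from the standard receptive-field recursion; a small sketch (the layer list is read off the notation above, the function name is illustrative):

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, first layer applied first.
    Returns the receptive field of one output unit on the input image."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) input-pixel jumps
        jump *= s              # spacing in input pixels between adjacent outputs
    return rf

# Five 4x4-kernel layers with strides 2, 2, 2, 1, 1 as in the discriminator structure
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # 70
```

This yields the familiar 70×70 patch of the PatchGAN discriminator: each output score judges one 70×70 region of the input, regardless of the overall image size.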
For the same reasons as in steps 3.1 and 3.2, a spectral normalization operation is performed after each convolution layer in the discriminator, and a self-attention module is added after the c512s1 convolution layer (following its spectral normalization operation).
Step 5: the training network is built. The training network model of the invention follows the CycleGAN structure, as shown in FIG. 7; this structure can perform image translation well even in the absence of paired data sets. Here A denotes the training-set source domain, namely the optical red footprint image data set, and B denotes the training-set target domain, namely the contour data set. G_A and G_B are the generator structures defined in step 3: G_A generates target-domain images from source-domain images, and G_B generates source-domain images from target-domain images. D_A and D_B are the discriminator structures defined in step 4: D_A judges whether an image is a true target-domain image, and D_B judges whether an image belongs to the source domain. G_A and D_A form the forward GAN network, and G_B and D_B form the reverse GAN network. The forward GAN network with its adversarial loss ensures that sufficiently realistic target-domain images are generated; adding the reverse GAN network with a cycle-consistency loss constrains the translation so that each source-domain image and the target-domain image obtained from it remain in strict one-to-one correspondence.
Because the traditional cross-entropy-based adversarial loss of the conventional GAN network has certain defects, namely that gradient vanishing and mode collapse can occur during training, the Wasserstein distance is used to replace it, which ensures stable training and simplifies the structure of the loss function.
Compared with the cross-entropy-based adversarial loss, the Wasserstein distance avoids the mode-collapse phenomenon even when the two distributions have no overlap. When the distributions do not intersect, the cross-entropy-based adversarial loss is abrupt and cannot provide gradient information, whereas the Wasserstein distance varies smoothly and still provides gradients. To add the Wasserstein distance to the GAN training process, equation (4) is first transformed into equation (5), where ‖f‖_L ≤ K means that the function f is K-Lipschitz continuous, i.e. the absolute value of the derivative of f does not exceed K (K ≥ 0), and sup denotes taking the supremum over all functions f satisfying this condition.
In the training process of the CycleGAN cycle-consistent generative adversarial network, the function f(x) is taken to be an objective function containing the weights w to be solved for, i.e. equation (5) is converted into equation (6). For equation (6) to hold, the function f_w(x) must be Lipschitz continuous, so the spectral normalization operation must be used during training to limit w, and hence the derivative of f_w(x) with respect to x, to a certain range.
According to equation (6), the objective function of the discriminator in the CycleGAN cycle-consistent generative adversarial network can be reconstructed as equation (7). Under the premise that f_w(x) is Lipschitz continuous, optimizing the discriminator is then equivalent to estimating the Wasserstein distance between the true distribution and the generated distribution, and the gradient-vanishing problem does not occur.
After the Wasserstein distance is introduced, the adversarial loss of the generator is given by formula (8) and the adversarial loss of the discriminator by formula (9), where G denotes a generator, D denotes a discriminator, N denotes the number of samples in one round of training, a_i denotes one of the N source-domain samples, and b_i denotes a target-domain sample.
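Under the Wasserstein formulation, formulas (8) and (9) reduce to simple means over the discriminator's raw (un-sigmoided) scores; a NumPy sketch (illustrative only, with dummy score vectors standing in for the outputs of D):

```python
import numpy as np

def wgan_generator_loss(d_fake):
    """Formula (8): the generator tries to raise D's score on generated samples,
    so its loss is the negated mean score."""
    return -np.mean(d_fake)

def wgan_discriminator_loss(d_real, d_fake):
    """Formula (9): the discriminator (critic) pushes generated scores down
    and real scores up; the gap estimates the Wasserstein distance."""
    return np.mean(d_fake) - np.mean(d_real)

d_real = np.array([0.9, 1.1, 1.0])   # critic scores on real target-domain samples
d_fake = np.array([0.1, -0.1, 0.0])  # critic scores on generated samples
print(wgan_generator_loss(d_fake))
print(wgan_discriminator_loss(d_real, d_fake))
```

Note there is no log or sigmoid anywhere: the scores are used directly, which is why the loss stays smooth even when the real and generated distributions do not overlap.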
The CycleGAN structure requires the learned mappings to be cycle-consistent: as shown in fig. 8, every image a in data domain A should return to its starting point after a round-trip translation, which guarantees a one-to-one relationship between the image translation result and the original image, and vice versa:
a_i → G_A(a_i) → G_B(G_A(a_i)) ≈ a_i
b_i → G_B(b_i) → G_A(G_B(b_i)) ≈ b_i
To this end, a cycle-consistency loss is introduced, as shown in equation (10).
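The cycle-consistency loss of equation (10) is simply the mean L1 (Manhattan) distance between each sample and its round-trip reconstruction; a sketch with small NumPy arrays standing in for images (the function name is illustrative, not from the patent):

```python
import numpy as np

def cycle_consistency_loss(reals_a, recs_a, reals_b, recs_b):
    """Mean L1 distance between originals and their cyclically reconstructed versions,
    summed over both translation directions and averaged over N sample pairs."""
    loss = 0.0
    for real, rec in zip(list(reals_a) + list(reals_b), list(recs_a) + list(recs_b)):
        loss += np.abs(real - rec).sum()    # L1 (Manhattan) distance per sample
    return loss / len(reals_a)              # divide by N, the samples per domain

a = [np.ones((2, 2))]             # one "source-domain image"
rec_a = [np.ones((2, 2)) * 0.75]  # its round trip a -> G_A -> G_B
b = [np.zeros((2, 2))]
rec_b = [np.zeros((2, 2))]        # perfect reconstruction on the target side
print(cycle_consistency_loss(a, rec_a, b, rec_b))  # 1.0: four pixels each off by 0.25
```

A loss of zero means both round trips return every image exactly to its starting point, which is the one-to-one correspondence the training network enforces.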
Step 6: the data set made in step 2.2 is fed into the constructed training network for training.
The number of training epochs is set to 200 and the batch size to 1. Denoting the source domain and a source-domain sample by A and a, and the target domain and a target-domain sample by B and b, the two translations of one sample proceed as:
real_a→fake_b→rec_a
real_b→fake_a→rec_b
A sample real_a and a sample real_b are taken from the source domain and the target domain respectively; real_a passes through generator G_A to produce the generated sample fake_b in the target domain, and fake_b passes through generator G_B to produce the cyclically generated sample rec_a in the source domain. The generation process real_b → fake_a → rec_b proceeds analogously.
According to formula (9), the loss of discriminator D_B is computed from real_a and fake_a as lossD_B, and the loss of discriminator D_A is computed from real_b and fake_b as lossD_A.
According to formula (8), the loss of generator G_A is computed from fake_b as lossG_A, and the loss of generator G_B from fake_a as lossG_B. According to formula (10), the cycle-consistency loss lossC is computed from real_a, rec_a, real_b and rec_b. The total loss of the generators is then:
lossG_total = lossG_A + lossG_B + lossC
The initial learning rate is set to 0.002; a fixed learning rate is used for the first 100 epochs, and the learning rate is then linearly decayed to 0 over the last 100 epochs. Using the RMSProp optimizer, the parameters of discriminator D_A are optimized according to lossD_A, those of discriminator D_B according to lossD_B, and those of generators G_A and G_B according to lossG_total; the trained network is then saved.
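The learning-rate schedule just described (fixed for the first 100 epochs, then linear decay to 0 over the last 100) can be expressed as a simple per-epoch multiplier; a sketch (the function name and 0-indexed epoch numbering are assumptions, not the patent's code):

```python
def lr_at(epoch, base_lr=0.002, fixed_epochs=100, decay_epochs=100):
    """Learning rate for a given epoch (0-indexed): constant for the first
    fixed_epochs, then linearly decayed to 0 over the next decay_epochs."""
    factor = 1.0 - max(0, epoch - fixed_epochs) / decay_epochs
    return base_lr * factor

print(lr_at(0))     # 0.002  (fixed phase)
print(lr_at(99))    # 0.002
print(lr_at(150))   # 0.001  (halfway through the decay)
print(lr_at(200))   # 0.0
```

In a framework this multiplier would typically be handed to a lambda-style scheduler that rescales the optimizer's learning rate once per epoch.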
Step 7: a contour extraction test network is built from the trained generator G_A; the source-domain data of the test data set are input, the resulting target-domain data are output, and the network is tested.
In conclusion, the method can apply a single unified pipeline directly to simply preprocessed original optical red footprint images to extract contour images, thereby simplifying the contour-extraction workflow. It can ignore the differences between individual original images and process a whole batch of data with a network sharing the same parameters, reducing the computational cost of contour extraction. By using the more flexible GAN network algorithm, it can weaken the pixel-wise correspondence between values at the same position in the original image and the contour image, overlook flaws in the original image, and handle cases that some traditional methods cannot.

Claims (5)

1. An optical red footprint image contour extraction method based on a GAN network is characterized in that: the method comprises the following steps in sequence:
(1) collecting an original optical red footprint image through an optical footprint collector;
(2) making a training set and a testing set;
(3) constructing a generator through a residual error network;
(4) constructing a discriminator through a PatchGAN Markov discriminator;
(5) the generator and the discriminator form a CycleGAN cycle generation countermeasure network, and the CycleGAN cycle generation countermeasure network is used as a training network;
(6) sending the training set into a training network for training;
(7) and (3) using the trained generator as a contour extraction test network, inputting source domain data of the test set, and obtaining the optical red footprint image contour.
2. The GAN network-based optical red footprint image contour extraction method of claim 1, wherein: the step (2) specifically comprises the following steps:
(2a) unifying all the original optical red footprint images into a right foot image, and erasing a scale in the image;
(2b) randomly extracting images from the original optical red footprint image as source domain data of a training set and a testing set, and then randomly extracting the images to be used as target domain data of the training set after processing.
3. The GAN network-based optical red footprint image contour extraction method of claim 1, wherein: the step (3) specifically comprises the following steps:
(3a) the generator is a residual network structure with 9 residual blocks; c7s1_k denotes a down-sampling layer with k filters, convolution kernel size 7×7 and stride 1; dk denotes a down-sampling layer with k filters, convolution kernel size 3×3 and stride 2; rk denotes a residual block comprising two convolutional layers with k filters and convolution kernel size 3×3; uk denotes an up-sampling layer with k filters, convolution kernel size 3×3 and stride 1/2; the structure of the generator is represented as: c7s1_64, d128, d256, r256, r256, r256, r256, r256, r256, r256, r256, r256, u128, u64, c7s1_3;
adding a self-attention module after the two convolution layers of the last residual block of the generator, and recalibrating the obtained features, with the formulas as follows:
$$\beta_{i,j}=\frac{\exp(s_{i,j})}{\sum_{j=1}^{N}\exp(s_{i,j})},\qquad s_{i,j}=e(x_i)\,g(x_j)^{\mathsf T} \tag{1}$$

$$y_i=\sum_{j=1}^{N}\beta_{i,j}\,h(x_j) \tag{2}$$
wherein: e(x_i) is the ith row of the feature map e(x) in space e, g(x_j) is the jth row of the feature map g(x) in space g, s_{i,j} is the element in row i and column j of the two-dimensional matrix s, β_{i,j} is the element in row i and column j of the two-dimensional matrix β, i denotes the row number, N denotes the maximum value of the index, and h(x_i) is the ith row of the feature map h(x) in space h; formula (1) is the attention-mask calculation formula, and formula (2) is the feature-recalibration calculation formula;
(3b) a spectral normalization operation is performed after each convolutional layer in the generator:
$$f_w(x)=Wx=U\Sigma V^{\mathsf T}x \tag{3}$$
wherein U is an m×m orthogonal matrix, Σ is an m×n diagonal matrix whose diagonal entries are the singular values of the weight matrix W, V is an n×n orthogonal matrix, and x is the variable.
4. The GAN network-based optical red footprint image contour extraction method of claim 1, wherein: the step (5) specifically comprises the following steps:
denoting one sample in the source domain by a_i and one sample in the target domain by b_i, the CycleGAN cycle-consistent generative adversarial network is represented as follows:
a_i → G_A(a_i) → G_B(G_A(a_i))
b_i → G_B(b_i) → G_A(G_B(b_i))
one sample a in the source domain aiTransition into the target domain via generator G _ A, labeled G _ A (a)i) Then G _ A (a)i) And transits to the source domain through the generator G _ B, labeled G _ B (G _ A (a))i));
one sample b_i in target domain B is translated into the source domain by generator G_B, denoted G_B(b_i); G_B(b_i) is then translated back into the target domain by generator G_A, denoted G_A(G_B(b_i));
G_A(a_i) and G_B(b_i) are both generated samples, and G_B(G_A(a_i)) and G_A(G_B(b_i)) are both cyclically generated samples;
(5a) the Wasserstein distance is used to replace the traditional countermeasure loss based on the cross entropy, and the calculation formula of the Wasserstein distance is as follows:
$$W(P_r,P_g)=\inf_{\gamma\in\Pi(P_r,P_g)}\mathbb{E}_{(x,y)\sim\gamma}\big[\|x-y\|\big] \tag{4}$$
in the formula, Π(P_r, P_g) denotes the set of all joint distributions whose marginals are the true distribution P_r and the generated distribution P_g, γ denotes any one joint distribution in the set, the variable x denotes a true sample under γ and the variable y a generated sample under γ, and E_{(x,y)~γ}[‖x−y‖] denotes the expectation under γ of the distance between x and y; the infimum of this expectation over the set of joint distributions is taken;
to add the Wasserstein distance to the training process of the CycleGAN cycle-consistent generative adversarial network, equation (4) is transformed into equation (5):
$$W(P_r,P_g)=\frac{1}{K}\sup_{\|f\|_L\le K}\Big(\mathbb{E}_{x\sim P_r}\big[f(x)\big]-\mathbb{E}_{x\sim P_g}\big[f(x)\big]\Big) \tag{5}$$
wherein ‖f‖_L ≤ K denotes that the function f is K-Lipschitz continuous, i.e. the absolute value of the derivative of f is at most K, with K ≥ 0, and sup denotes taking the supremum over all functions f satisfying this condition;
in the training process of the CycleGAN cycle-consistent generative adversarial network, the function f(x) is defined as an objective function containing the weights w to be solved for, i.e. equation (5) is converted into equation (6):
$$K\cdot W(P_r,P_g)=\sup_{\|f_w\|_L\le K}\Big(\mathbb{E}_{x\sim P_r}\big[f_w(x)\big]-\mathbb{E}_{x\sim P_g}\big[f_w(x)\big]\Big) \tag{6}$$
optimizing the objective function f_w(x), i.e. finding the optimal weights w of f_w(x), means finding the optimal solution for w; the function f(x) is written as an objective function f_w(x) containing the weights w because f_w(x) = Wx, where W is the weight matrix and w the weights; ‖f_w‖_L ≤ K denotes that f_w is K-Lipschitz continuous, i.e. the absolute value of its derivative is at most K; E denotes expectation, P_r the true distribution, x ~ P_r that sample x is drawn from the true distribution, P_g the generated distribution, and x ~ P_g that sample x is drawn from the generated distribution; $\mathbb{E}_{x\sim P_r}[f_w(x)]$ denotes the expectation of the value f_w(x) when x is drawn from the true distribution P_r, and $\mathbb{E}_{x\sim P_g}[f_w(x)]$ the expectation of f_w(x) when x is drawn from the generated distribution P_g;
according to equation (6), the objective function of the discriminator in the CycleGAN cycle-consistent generative adversarial network is reconstructed as equation (7); under the premise that f_w(x) is Lipschitz continuous, optimizing the discriminator is equivalent to estimating the Wasserstein distance between the true distribution and the generated distribution:
$$L(D)=\mathbb{E}_{x\sim P_r}\big[f_w(x)\big]-\mathbb{E}_{x\sim P_g}\big[f_w(x)\big] \tag{7}$$
d (x) is the objective function of the arbiter, which is equal to the true distribution and the Wasserstein distance between the resulting distributions;
after the Wasserstein distance is introduced, the adversarial loss of the generator is given by formula (8), and the adversarial loss of the discriminator by formula (9):
$$L_G=-\frac{1}{N}\sum_{i=1}^{N}D\big(G(a_i)\big) \tag{8}$$
$$L_D=\frac{1}{N}\sum_{i=1}^{N}D\big(G(a_i)\big)-\frac{1}{N}\sum_{i=1}^{N}D\big(b_i\big) \tag{9}$$
wherein G denotes a generator, D a discriminator, N the number of samples, a_i one source-domain sample among the N samples, and b_i a target-domain sample;
(5b) a cycle-consistency loss is introduced to constrain the one-to-one correspondence between the original image and the contour image, the cycle-consistency loss being given by:
$$L_{cyc}=\frac{1}{N}\sum_{i=1}^{N}\Big(\big\|G_B(G_A(a_i))-a_i\big\|_1+\big\|G_A(G_B(b_i))-b_i\big\|_1\Big) \tag{10}$$
wherein ‖G_B(G_A(a_i)) − a_i‖_1 is the Manhattan (L1) distance between G_B(G_A(a_i)) and a_i; a_i denotes a source-domain sample which, after passing through generator G_A and then generator G_B, yields a new image G_B(G_A(a_i)) in the source domain; b_i denotes a target-domain sample, and G_A(G_B(b_i)) denotes the new image obtained from b_i after passing through the two generators.
5. The GAN network-based optical red footprint image contour extraction method of claim 1, wherein: the step (6) specifically comprises the following steps:
(6a) setting the network hyper-parameters, taking one sample each from the source domain and the target domain of the training set, and generating the corresponding generated samples and cyclically generated samples through the CycleGAN cycle-consistent generative adversarial network;
(6b) calculating the countermeasure loss lossD _ a, lossD _ B of the discriminator;
(6c) calculating the adversarial losses lossG_A and lossG_B and the cycle-consistency loss lossC of the generators; the total generator loss is then:
lossG_total = lossG_A + lossG_B + lossC
(6d) selecting the RMSProp optimizer, setting its parameters, training, and saving the CycleGAN cycle-consistent generative adversarial network.
CN202210158517.8A 2022-02-21 2022-02-21 Optical red footprint image contour extraction method based on GAN network Pending CN114529737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210158517.8A CN114529737A (en) 2022-02-21 2022-02-21 Optical red footprint image contour extraction method based on GAN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210158517.8A CN114529737A (en) 2022-02-21 2022-02-21 Optical red footprint image contour extraction method based on GAN network

Publications (1)

Publication Number Publication Date
CN114529737A true CN114529737A (en) 2022-05-24

Family

ID=81624298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210158517.8A Pending CN114529737A (en) 2022-02-21 2022-02-21 Optical red footprint image contour extraction method based on GAN network

Country Status (1)

Country Link
CN (1) CN114529737A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205738A (en) * 2022-07-05 2022-10-18 广州和达水务科技股份有限公司 Emergency drainage method and system applied to urban inland inundation
CN115205738B (en) * 2022-07-05 2023-08-01 广州和达水务科技股份有限公司 Emergency drainage method and system applied to urban inland inundation

Similar Documents

Publication Publication Date Title
CN111627019B (en) Liver tumor segmentation method and system based on convolutional neural network
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN107730451A (en) A kind of compressed sensing method for reconstructing and system based on depth residual error network
CN112733950A (en) Power equipment fault diagnosis method based on combination of image fusion and target detection
CN111369565A (en) Digital pathological image segmentation and classification method based on graph convolution network
CN111080531B (en) Super-resolution reconstruction method, system and device for underwater fish image
CN110705340B (en) Crowd counting method based on attention neural network field
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
CN116682120A (en) Multilingual mosaic image text recognition method based on deep learning
CN112669248A (en) Hyperspectral and panchromatic image fusion method based on CNN and Laplacian pyramid
Li et al. Single image dehazing with an independent detail-recovery network
CN111489305B (en) Image enhancement method based on reinforcement learning
CN116486074A (en) Medical image segmentation method based on local and global context information coding
CN113870327B (en) Medical image registration method based on prediction multi-level deformation field
CN114529737A (en) Optical red footprint image contour extraction method based on GAN network
CN114155171A (en) Image restoration method and system based on intensive multi-scale fusion
CN116229083A (en) Image denoising method based on lightweight U-shaped structure network
CN116071229A (en) Image super-resolution reconstruction method for wearable helmet
CN115346259A (en) Multi-granularity academic emotion recognition method combined with context information
CN114862696A (en) Facial image restoration method based on contour and semantic guidance
CN115035170A (en) Image restoration method based on global texture and structure
CN115018864A (en) Three-stage liver tumor image segmentation method based on adaptive preprocessing
CN114529519A (en) Image compressed sensing reconstruction method and system based on multi-scale depth cavity residual error network
CN114022362A (en) Image super-resolution method based on pyramid attention mechanism and symmetric network
CN113269702A (en) Low-exposure vein image enhancement method based on cross-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination