CN110580726A - Dynamic convolution network-based face sketch generation model and method in natural scene - Google Patents

Info

Publication number
CN110580726A
Authority
CN
China
Prior art keywords
network
convolution
face
features
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910772659.1A
Other languages
Chinese (zh)
Other versions
CN110580726B (en
Inventor
林倞
陈景文
刘凌波
李冠彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN201910772659.1A
Publication of CN110580726A
Application granted
Publication of CN110580726B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face sketch generation model and method for natural scenes based on a dynamic convolutional network. The method comprises the following steps: step S1, initializing the parameters of all convolutional networks and fully connected networks; step S2, acquiring a face image and extracting its hierarchical features with a fully convolutional neural network; step S3, upsampling the obtained features with a transposed convolutional network, and mining latent face regions and face deformation information with a deformable convolutional network; step S4, dividing the features into multi-scale regions, dynamically computing adaptive filter weights in each region, convolving the filter weights with the features to obtain new features, and combining all region features at multiple scales to generate a high-quality face sketch; step S5, updating the model parameters by comparing the generated face sketch with the real face sketch; and step S6, repeating steps S2 to S5 over multiple iterations for training.

Description

Dynamic convolution network-based face sketch generation model and method in natural scene
Technical Field
The invention relates to the technical field of computer vision based on deep learning, in particular to a face sketch generation model and method in a natural scene based on a dynamic convolution network.
Background
Face sketch generation refers to automatically generating the corresponding face sketch from a face photo. It is a classic task in computer vision. Face sketches have a wide range of real-world applications, for example in law enforcement and digital entertainment, which has attracted a large amount of research from both academia and industry.
In recent years, the successful application of convolutional neural networks has brought major breakthroughs in face sketch generation. For example, Liliang Zhang et al., "End-to-end photo-sketch generation via fully convolutional representation learning" (The Annual ACM International Conference on Multimedia Retrieval (ICMR), 2015), and Phillip Isola et al., "Image-to-image translation with conditional adversarial networks" (In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017), both model face sketch generation with convolutional neural networks. However, these methods, which rely on convolutional neural networks and generative adversarial networks, perform well only under constrained conditions: for example, the background of the face image must be reduced to a solid color, and the orientation of the face must be restricted.
Because most face images in real natural scenes are captured under unconstrained conditions, these methods do not generalize to such scenes. In the last two years, some face sketch generation methods for unconstrained conditions have also made progress, for example Zhang et al., "Content-adaptive sketch portrait generation by decompositional representation learning" (IEEE Transactions on Image Processing, 2017), and Jun Yu et al., "Composition-aided face photo-sketch synthesis" (2017). To maintain performance under unconstrained conditions, these methods preprocess the image before generating the sketch, which includes removing cluttered backgrounds and parsing the face into components (such as hair, eyes and mouth). However, this preprocessing is very time-consuming and may even fail in complex scenes, so existing face sketch generation methods suffer severe performance degradation and poor generalization in real natural scenes.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a model and a method for generating a face sketch in a natural scene based on a dynamic convolutional network, so as to achieve the purpose of effectively generating the face sketch in the natural scene without depending on any preprocessing method.
In order to achieve the above object, the present invention provides a face sketch generation model in a natural scene based on a dynamic convolution network, including:
An initialization unit, configured to initialize the network parameters of the model;
An encoder unit, configured to acquire a face image in a natural scene without preprocessing and to extract hierarchical features of the face image with a fully convolutional neural network;
A decoder unit, configured to progressively upsample the hierarchical features generated by the encoder unit with a transposed convolutional network, and to mine latent face regions and face deformation information with a deformable convolutional network;
A face sketch generation unit, configured to divide the features output by the deformable convolutional network into multi-scale regions, dynamically compute adaptive filter weights in each region, convolve the filter weights with the features of the corresponding region to obtain new features, and combine the features of all regions at multiple scales to generate a high-quality face sketch;
An update unit, configured to compare the face sketch generated by the face sketch generation unit with the real face sketch and to update the model parameters by optimizing an objective function;
And an iterative training unit, configured to repeat the training process of the encoder unit, the decoder unit, the face sketch generation unit and the update unit over multiple iterations until training converges or the maximum number of iterations is reached, yielding the final model.
Preferably, the unpreprocessed face image in a natural scene comes from a pre-established training set, which is built as follows:
Collecting face images and their face sketches from different data sources, and building a face sketch data set that contains face images in natural scenes and their corresponding face sketches, without any preprocessing such as background removal;
And randomly selecting a number of image pairs from the face sketch data set as the training set, and using the remaining image pairs as the test set.
Preferably, the fully convolutional neural network comprises eight convolutional layers in sequence, each followed by a rectified linear unit (ReLU); the convolution kernel size of each layer is 2 × 2, and the numbers of output channels are 64, 64, 128, 128, 192, 192, 256 and 256, respectively.
Preferably, in the fully convolutional neural network, a pooling layer is inserted after each of the second, fourth and sixth convolutional layers for downsampling, with both the stride and the pooling size set to 2 × 2.
Preferably, the decoder unit comprises:
A transposed convolutional network, configured to receive the hierarchical features output by the encoder unit and upsample them;
A deformable convolutional neural network, configured to receive the features upsampled by the transposed convolutional network and to compute an offset O_p for each pixel p of the features with a convolutional layer; for each pixel p, the input feature F_i is convolved with the filter weight w of the deformable convolutional neural network under the guidance of the offset O_p to obtain the output feature F_o;
And a splicing module, configured to concatenate the features upsampled by the transposed convolutional network with the features of the same resolution from the fully convolutional neural network of the encoder unit through skip connections, and to add a standard convolutional layer after the concatenated features to reduce the number of channels.
Preferably, the deformable convolutional neural network works in two steps. First, position offsets are generated: the feature F_i is fed into a convolutional layer with a 3 × 3 kernel and 18 output channels, which computes the offsets O for all pixels of the feature; the offset of each pixel p is denoted O_p. Second, for each pixel p, the input feature F_i is convolved with the filter weight w of the deformable convolutional network under the guidance of the offset O_p to obtain the output feature F_o.
Preferably, the decoder unit is a stack of 3 layers, each comprising a transposed convolutional network and a deformable convolutional network; after each transposed convolutional network, the splicing module concatenates the features of the same resolution from the fully convolutional encoder through skip connections, and a standard convolutional layer may be added after the third concatenated feature to reduce the number of channels.
Preferably, the face sketch generating unit further comprises:
A dividing module, configured to divide the final output feature of the deformable convolutional network equally into n × n regions, each with a resolution of H/n × W/n; the i-th region is denoted R_i;
A mapping module, configured to map R_i at the different scales to a fixed-size low-dimensional feature with a spatial pooling layer;
A weight calculation module, configured to feed the pooled features into three consecutive fully connected networks with dimensions 256, 512 and 18432; the output of the last fully connected network is reshaped to 64 × 3 × 3 × 32 and denoted as the weight w_i of the adaptive convolutional network;
A convolution module, configured to convolve the weight w_i obtained from the three fully connected layers with the features of region R_i to obtain new specialized features;
And a feature combination module, configured to combine the features of all regions and generate a high-quality face sketch.
Preferably, the feature combination module reassembles the specialized features of all regions at each scale into a feature with resolution H × W, concatenates the features of all scales, and feeds the concatenated features into a standard convolutional network with a 1 × 1 kernel to generate the final face sketch.
In order to achieve the above object, the present invention further provides a method for generating a face sketch in a natural scene based on a dynamic convolutional network, comprising the following steps:
Step S1, initializing the network parameters of all convolutional networks and fully connected networks;
Step S2, acquiring a face image in a natural scene without preprocessing, and extracting hierarchical features of the face image with a fully convolutional neural network;
Step S3, upsampling the hierarchical features obtained in step S2 with a transposed convolutional network, and mining latent face regions and face deformation information with a deformable convolutional network;
Step S4, dividing the features output by the deformable convolutional network into multi-scale regions, dynamically computing adaptive filter weights in each region, convolving the filter weights with the features to obtain new features, and combining all region features at multiple scales to generate a high-quality face sketch;
Step S5, updating the model parameters by comparing the generated face sketch with the real face sketch;
And step S6, repeating steps S2 to S5 over multiple iterations until training converges or the maximum number of iterations is reached, yielding the final model.
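The stopping rule of step S6 (stop on convergence or at the maximum number of iterations) can be sketched in plain Python. This is a minimal illustration, not the patented implementation: `step_fn`, `make_toy_step`, the tolerance `tol` and the choice of "loss change below a tolerance" as the convergence test are all assumptions introduced here.

```python
def train(step_fn, max_iters=1000, tol=1e-4):
    """Repeat one training pass (steps S2-S5) until convergence or max_iters.

    step_fn is a hypothetical callable that runs one pass and returns
    the scalar loss of that iteration.
    """
    prev_loss = None
    for iteration in range(1, max_iters + 1):
        loss = step_fn()
        # Assumed convergence test: the loss changed by less than tol.
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            return iteration, loss
        prev_loss = loss
    return max_iters, prev_loss


def make_toy_step(start=1.0, decay=0.5):
    """Toy stand-in for steps S2-S5: the loss is multiplied by decay per call."""
    state = {"loss": start}

    def step():
        state["loss"] *= decay
        return state["loss"]

    return step
```

For example, `train(make_toy_step(), max_iters=100, tol=1e-3)` stops as soon as two consecutive toy losses differ by less than 0.001, while a very small `tol` makes the loop run to `max_iters`.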
Compared with the prior art, the face sketch generation model and method of the invention work as follows. A face sketch data set in natural scenes is selected without any preprocessing such as background removal, and the weights of the target model are initialized. A face image in a natural scene is fed into a fully convolutional network composed of consecutive convolutional and pooling layers to extract hierarchical features. The hierarchical features are fed into transposed convolutional networks and deformable convolutional networks, which upsample them and compute features that carry information on latent face regions. The features are divided into multi-scale regions, adaptive filter weights are computed dynamically in each region and convolved with the features to obtain new features, and the region features at all three scales are combined to generate a high-quality face sketch. The model parameters are updated by comparing the generated face sketch with the real face sketch. Because the method dynamically and adaptively computes the features of face components in regions of different scales during optimization, it can generate high-quality sketches for faces in unconstrained natural environments without any preprocessing, and its results under both constrained and unconstrained conditions exceed those of all existing methods.
Drawings
FIG. 1 is a system architecture diagram of a human face sketch generation model in a natural scene based on a dynamic convolution network according to the present invention;
FIG. 2 is a schematic diagram of a deformable convolution network in accordance with an embodiment of the present invention;
FIG. 3 is a diagram of a dynamic convolution network framework under non-limiting conditions in an embodiment of the present invention;
FIG. 4 is a diagram of an adaptive convolutional network in an embodiment of the present invention;
FIG. 5 is a flowchart of steps of a method for generating a face sketch in a natural scene based on a dynamic convolutional network according to the present invention.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the following description of specific embodiments in conjunction with the accompanying drawings. The invention is capable of other and different embodiments, and its details may be modified in various respects, all without departing from the spirit and scope of the present invention.
Fig. 1 is a system architecture diagram of a human face sketch generation model in a natural scene based on a dynamic convolution network. As shown in fig. 1, the invention relates to a face sketch generation model in a natural scene based on a dynamic convolution network, which comprises:
An initialization unit 101, configured to initialize the network parameters of the model. Specifically, the initialization unit 101 randomly initializes the parameters of all convolutional layers and fully connected layers of the model from a normal distribution with a standard deviation of 0.02.
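As a minimal sketch of this initialization, the following plain-Python helper fills a weight tensor of a given shape with normally distributed values. The text specifies only the standard deviation of 0.02; the zero mean, the nested-list representation and the names `init_weights` and `flatten` are assumptions made for illustration.

```python
import random


def init_weights(shape, std=0.02, seed=None):
    """Return a nested list of the given shape with N(0, std^2) samples.

    The zero mean is an assumption; the source states only std = 0.02.
    """
    rng = random.Random(seed)

    def build(dims):
        if len(dims) == 1:
            return [rng.gauss(0.0, std) for _ in range(dims[0])]
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(list(shape))


def flatten(nested):
    """Yield every scalar of a nested list, depth first."""
    if isinstance(nested, list):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested
```

For instance, `init_weights((64, 3, 2, 2), seed=0)` would produce the weights of a hypothetical first convolutional layer with 64 output channels, 3 input channels and a 2 × 2 kernel.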
An encoder unit 102, configured to acquire a face image in a natural scene without preprocessing and to extract hierarchical features of the face image with a fully convolutional neural network, which consists of consecutive convolutional layers and pooling layers.
In an embodiment of the present invention, the unpreprocessed face image in a natural scene may come from a pre-established training set. The training set may be established by the following process:
A face sketch data set containing face images in natural scenes and their corresponding face sketches is built without any preprocessing such as background removal. In the embodiment of the invention, face images and their sketches are collected from two data sources. The first is the FaceScrub data set, which contains face images of 530 individuals; for each individual, one face image and its corresponding face sketch are randomly selected and added to the face sketch data set. The second is 270 face images collected from the Internet, for each of which a professional was asked to draw a face sketch. The resulting face sketch data set contains 800 face images and their corresponding face sketches.
From this data set, 400 image pairs are randomly selected as the training set, and the remaining 400 pairs are used as the test set (which can be used to evaluate the present invention and other existing methods). The face images in the data set vary widely in background, illumination, age, expression and so on, and therefore better reflect real conditions.
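The random 400/400 split described above can be sketched as follows. The file names, the fixed seed and the function name `split_pairs` are hypothetical; the source states only that 400 pairs are drawn at random for training and the rest are used for testing.

```python
import random


def split_pairs(pairs, n_train=400, seed=0):
    """Randomly split (photo, sketch) pairs into a training and a test set."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    train = [pairs[i] for i in indices[:n_train]]
    test = [pairs[i] for i in indices[n_train:]]
    return train, test


# Hypothetical file names standing in for the 800 collected pairs.
all_pairs = [(f"photo_{i}.jpg", f"sketch_{i}.jpg") for i in range(800)]
train_set, test_set = split_pairs(all_pairs)
```

Shuffling indices rather than the pairs themselves leaves the original list untouched, which keeps the photo and its sketch together in either subset.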
Therefore, the encoder unit 102 acquires an unpreprocessed face image in a natural scene from the training set and extracts its hierarchical features with the fully convolutional neural network. Specifically, the face image is fed into the fully convolutional neural network, which comprises eight convolutional layers in sequence, each followed by a rectified linear unit (ReLU); the convolution kernel size of each layer is 2 × 2, and the numbers of output channels are 64, 64, 128, 128, 192, 192, 256 and 256, respectively. Preferably, a pooling layer is inserted after each of the second, fourth and sixth convolutional layers for downsampling, with both the stride and the pooling size set to 2 × 2.
The fully convolutional neural network thus comprises 8 convolutional layers, 8 rectified linear units and 3 pooling layers. If the resolution of the input image is H × W, the feature output by the fully convolutional neural network has a size of H/8 × W/8.
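The relation between input resolution and encoder output size can be checked with a few lines of Python. This sketch assumes that the convolutional layers preserve spatial resolution (for example through padding) and that pooling rounds down; neither detail is stated in the text, and the function name is illustrative.

```python
def encoder_output_size(height, width, num_pools=3, pool_stride=2):
    """Spatial size after the pooling stages of the encoder.

    Assumes convolutions keep the resolution and pooling uses floor division.
    """
    for _ in range(num_pools):
        height //= pool_stride
        width //= pool_stride
    return height, width
```

Under these assumptions, three stride-2 pooling layers shrink each side by a factor of 8, and the three ratio-2 upsampling stages of the decoder bring the features back to the input resolution.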
The decoder unit 103 progressively upsamples the hierarchical features generated by the encoder unit 102 with a transposed convolutional network, and mines latent face regions and face deformation information with a deformable convolutional network. Specifically, for the input feature F_i, the decoder unit 103 dynamically computes the offset positions with a convolutional layer that has 18 output channels and a 3 × 3 kernel; for a pixel p, its offsets can be organized as a 3 × 3 tensor (one two-dimensional offset per sampling location, 3 × 3 × 2 = 18 values), denoted O_p. In the embodiment of the present invention, the upsampling ratio of the transposed convolutional network is set to 2, and the upsampled features are concatenated through skip connections with the features of the same resolution in the fully convolutional neural network to produce new face features. Through the deformable convolutional network, the decoder unit 103 can suppress background clutter and focus the network's computation on the face region.
Specifically, the decoder unit 103 further includes:
A transposed convolutional network, configured to receive the hierarchical features output by the encoder unit and upsample them; that is, the hierarchical features output by the encoder unit are fed into the transposed convolutional network, which upsamples them. In a specific embodiment of the present invention, the transposed convolutional network upsamples the features by a factor of 2.
A deformable convolutional neural network, configured to receive the features upsampled by the transposed convolutional network and to compute an offset O_p for each pixel p of the features with a convolutional layer; for each pixel p, the input feature F_i is convolved with the filter weight w of the deformable convolutional neural network under the guidance of the offset O_p to obtain the output feature F_o. Specifically, the features upsampled by the transposed convolutional network are fed into the deformable convolutional neural network, whose convolutional layer computes the offset O_p for each pixel of the features. The deformable convolutional network is shown in FIG. 2, where F_i is the input feature and F_o is the output feature. The deformable convolution works in two steps. First, position offsets are generated: the feature F_i is fed into a convolutional layer with a 3 × 3 kernel and 18 output channels, which computes the offsets O for all pixels of the feature; the offset of each pixel p is organized as a 3 × 3 grid and denoted O_p. Second, for each pixel p, the input feature F_i is convolved with the filter weight w of the deformable convolutional network under the guidance of the offset O_p to obtain the output feature F_o. The 3 × 3 grid G is G = {(-1, -1), (-1, 0), ..., (0, 1), (1, 1)}.
The output feature F_o at pixel p is expressed as follows:
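The formula itself is not reproduced in this text. Using the symbols defined above (input feature F_i, output feature F_o, filter weight w, sampling grid G and per-pixel offset O_p), the standard formulation of deformable convolution would read as below; this is a reconstruction under that assumption, not a transcription of the original formula.

```latex
\[
F_o(p) = \sum_{g \in G} w(g)\, F_i\bigl(p + g + O_p(g)\bigr)
\]
```

Here O_p(g) denotes the two-dimensional offset assigned to grid position g at pixel p. Because p + g + O_p(g) is generally fractional, F_i is usually evaluated there by bilinear interpolation.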
And a splicing module, configured to concatenate the features upsampled by the transposed convolutional network with the features of the same resolution in the fully convolutional neural network of the encoder unit through skip connections.
As shown in FIG. 3, the FCE on the left is the encoder unit 102, short for fully convolutional encoder, which consists of 4 standard convolutions (SC); the 4 convolution types are labeled in FIG. 3. The DCD in the middle is the decoder unit 103, which consists of 3 deformable convolutional networks (DC) and 3 transposed convolutional networks (TC) with an upsampling ratio of 2 (that is, a stack of three layers, each comprising one transposed convolutional network followed by one deformable convolutional network). After each transposed convolutional network, the features of the same resolution in the fully convolutional encoder are concatenated through skip connections, and a standard convolutional network (SC) may be added after the third concatenated feature (that is, after the last transposed convolutional network TC) to reduce the number of channels from 128 to 64, thereby reducing the computational cost of the model.
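To make the two-step deformable convolution used in the decoder concrete, here is a single-channel sketch in plain Python. It follows the standard deformable convolution recipe (sample the input at grid position plus offset, with bilinear interpolation), which is assumed here rather than taken from the patent. The offsets are passed in directly instead of being predicted by the 18-channel convolutional layer, and zero padding outside the feature map is also an assumption.

```python
import math

# 3 x 3 sampling grid G, row-major: (-1, -1), (-1, 0), ..., (1, 1)
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(feature, y, x):
    """Sample a 2-D list at fractional (y, x); outside positions count as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def deformable_conv(feature, weights, offsets):
    """Single-channel deformable convolution.

    feature: H x W nested list (F_i); weights: 9 floats ordered like GRID (w);
    offsets[y][x]: 9 (dy, dx) pairs, i.e. the 18 offset values O_p of a pixel.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for k, (gy, gx) in enumerate(GRID):
                oy, ox = offsets[y][x][k]
                total += weights[k] * bilinear(feature, y + gy + oy, x + gx + ox)
            out[y][x] = total
    return out
```

With all offsets set to zero the function reduces to an ordinary zero-padded 3 × 3 convolution, which is a convenient sanity check; non-zero offsets move each of the nine sampling points independently for every pixel.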
The face sketch generation unit 104 divides the features generated by the deformable convolutional network into multi-scale regions, dynamically computes adaptive filter weights in each region, convolves the filter weights with the features of the corresponding region to obtain new features, and combines the features of all regions at the three scales to generate a high-quality face sketch, denoted Y_F.
Specifically, the face sketch generation unit 104 further includes:
A dividing module, configured to divide the final output feature of the deformable convolutional network in the decoder unit 103 equally into n × n regions, each with a resolution of H/n × W/n; the i-th region is denoted R_i. Because the components of a face do not all share the same scale, dividing the regions at a single scale cannot cover every face component well. Therefore, n is set to 3, 4 and 5 to obtain three region divisions at different scales, which improves the model's ability to extract features of face components at different scales.
A mapping module, configured to map R_i at the different scales to a fixed-size low-dimensional feature with a spatial pooling layer of size 32 × 32, which reduces the number of model parameters and the computational complexity.
A weight calculation module, configured to feed the pooled features into three consecutive fully connected networks with dimensions 256, 512 and 18432, respectively; the output of the last fully connected network is reshaped to 64 × 3 × 3 × 32 and denoted as the weight w_i of the adaptive convolutional network (AC). Note that the output of the fully connected networks is computed independently for each region at each scale, so the weight w_i of the adaptive convolutional network varies with the feature content of region R_i; this is why the weight computation is called adaptive. In other words, the invention uses the outputs of the fully connected networks as weights instead of fixed weights, so that different inputs receive different weights.
A convolution module, configured to convolve the weight w_i obtained from the three fully connected layers with the features of region R_i, with 32 output channels, to obtain new specialized features. The adaptive convolutional network described above is shown in FIG. 4.
A feature combination module, configured to combine the features of all regions and generate a high-quality face sketch. Specifically, the specialized features of all regions at each scale are reassembled into a feature with resolution H × W, the features of the three scales are concatenated, and the concatenated features are fed into a standard convolutional network with a 1 × 1 kernel to generate the final face sketch.
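The region bookkeeping of the modules above can be illustrated with a short sketch. It computes the n × n region boxes for the three scales and checks that the 18432-dimensional output of the last fully connected network matches the 64 × 3 × 3 × 32 kernel shape. The rounding rule for sizes that n does not divide evenly, the axis interpretation of the kernel shape and the function names are assumptions, not details given in the text.

```python
from math import prod

# Kernel shape from the text; reading it as (input channels, kernel height,
# kernel width, output channels) is an assumption.
KERNEL_SHAPE = (64, 3, 3, 32)
FC_OUTPUT_DIM = 18432


def region_bounds(height, width, n):
    """Split an H x W map into n x n regions.

    Returns (top, bottom, left, right) boxes in row-major order. Boundaries
    are rounded so the boxes tile the map even when n does not divide H or W.
    """
    rows = [round(i * height / n) for i in range(n + 1)]
    cols = [round(j * width / n) for j in range(n + 1)]
    return [(rows[i], rows[i + 1], cols[j], cols[j + 1])
            for i in range(n) for j in range(n)]


def regions_per_scale(scales=(3, 4, 5)):
    """Number of regions, and hence of predicted kernels w_i, at each scale."""
    return {n: n * n for n in scales}
```

With n = 3, 4 and 5 this gives 9 + 16 + 25 = 50 regions, so under this reading 50 separate kernels w_i are predicted for each input image.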
The update unit 105 is configured to compare the generated face sketch with the real face sketch and to update the model parameters by optimizing an objective function. Specifically, the objective function consists of an adversarial loss function and a Euclidean loss function, and the objective function used to optimize the model is as follows:
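The formula itself is not reproduced in this text. One common form that matches the description (an adversarial term on the left, a Euclidean term on the right) is sketched below as an assumption; the weighting factor λ is hypothetical, and the exact form used in the patent, for example whether the discriminator is also conditioned on the input photo, may differ.

```latex
\[
\min_{F}\,\max_{D}\;
\mathbb{E}\bigl[\log D(Y)\bigr]
+ \mathbb{E}\bigl[\log\bigl(1 - D(Y_F)\bigr)\bigr]
+ \lambda\,\lVert Y - Y_F \rVert_2^2
\]
```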
where Y is the real face sketch from the training set, Y_F is the generated face sketch, and F is the generation network. D is a discriminator that distinguishes each generated face sketch from the real face sketch. In the above objective function, the left term is the adversarial loss function and the right term is the Euclidean loss function. During training, all model parameters are optimized with the Adam optimization algorithm. The learning rate is set to 2e-4 and the batch size is set to 1.
The iterative training unit 106 is configured to repeat the training process of the encoder unit 102, the decoder unit 103, the face sketch generation unit 104 and the update unit 105 over multiple iterations until training converges or the maximum number of iterations is reached, yielding the final model.
FIG. 5 is a flowchart of steps of a method for generating a face sketch in a natural scene based on a dynamic convolutional network according to the present invention. As shown in fig. 5, the method for generating a face sketch in a natural scene based on a dynamic convolution network of the present invention includes the following steps:
Step S1, initializing the network parameters of all convolutional networks and fully connected networks used. Specifically, the parameters of all convolutional layers and fully connected layers are randomly initialized from a normal distribution with a standard deviation of 0.02.
Step S2, acquiring a face image in a natural scene without preprocessing, and extracting hierarchical features of the face image with a fully convolutional neural network. The fully convolutional neural network consists of consecutive convolutional layers and pooling layers.
In an embodiment of the present invention, the face image in the natural scene without being preprocessed may be from a pre-established training set. The training set may be established by the following process:
A face sketch data set containing face images in natural scenes and their corresponding face sketches is established without any preprocessing such as background removal. In the embodiment of the invention, face images and their face sketches are collected from two data sources. The first data source is the FaceScrub data set, which contains face images of 530 individuals; for each individual, one face image and its corresponding face sketch are randomly selected and added to the face sketch data set. The second data source is 270 face images collected from the internet, for each of which a professional was asked to draw a face sketch. The resulting face sketch data set contains 800 face images and their corresponding face sketches.
From the established face sketch data set, 400 image pairs are randomly selected as the training set, and the remaining 400 pairs are used as the test set. The face images in the data set vary considerably in background, illumination, age, expression and so on, and therefore better reflect real conditions.
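A minimal sketch of the random 400/400 split described above, assuming the 800 photo-sketch pairs are available as a list of file-name tuples. The file names, the fixed seed and the function name `split_pairs` are placeholders for illustration, not part of the described embodiment.

```python
import random

def split_pairs(pairs, n_train=400, seed=0):
    """Shuffle the (photo, sketch) pairs and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical file names standing in for the 800 collected pairs.
pairs = [(f"photo_{i:03d}.png", f"sketch_{i:03d}.png") for i in range(800)]
train_set, test_set = split_pairs(pairs)
```

Splitting at the level of pairs keeps each photo together with its sketch, so no pair can end up half in the training set and half in the test set.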
Therefore, in step S2, a face image in a natural scene without preprocessing is obtained from the training set, and its hierarchical features are extracted using a full convolutional neural network. Specifically, the face image is input into the full convolutional neural network, which consists of eight convolutional layers in sequence, each followed by a rectified linear unit (ReLU). The convolution kernel size of every layer is 2 × 2, and the numbers of output channels are 64, 64, 128, 128, 192, 192, 256 and 256 respectively. Preferably, a pooling layer is inserted after the second, fourth and sixth convolutional layers for down-sampling, with the stride and pooling size both 2 × 2.
Specifically, step S2 further includes:
Step S200: the face image is input directly into two convolutional layers with 2 × 2 convolution kernels and 64 output channels each.
Step S201: a rectified linear unit (ReLU) is inserted after each convolutional layer of step S200, and after the second convolutional layer the output features are down-sampled by a pooling layer whose stride and size are both 2 × 2.
Step S202: steps S200-S201 are repeated on the output of step S201, with the numbers of output channels of the two convolutional layers changed to 128 and 128, 192 and 192, and 256 and 256 respectively, finally outputting the hierarchical features of the face image. Specifically, the output of step S201 is input into the third and fourth convolutional layers, which have 2 × 2 convolution kernels and 128 output channels each; a ReLU is inserted after each convolutional layer, and after the fourth convolutional layer the output features are down-sampled by a pooling layer whose stride and size are both 2 × 2. The output of the pooling layer after the fourth convolutional layer is input into the fifth and sixth convolutional layers, which have 2 × 2 convolution kernels and 192 output channels each; a ReLU is inserted after each convolutional layer, and after the sixth convolutional layer the output features are down-sampled by a pooling layer whose stride and size are both 2 × 2. The output of the pooling layer after the sixth convolutional layer is input into the seventh and eighth convolutional layers, which have 2 × 2 convolution kernels and 256 output channels each, and a ReLU is inserted after each convolutional layer.
After the convolutional layers with different channel numbers are stacked in step S202, the network contains 8 convolutional layers, 8 rectified linear units, and 3 pooling layers. Assuming that the resolution of the input image in step S200 is H × W, the output feature size after steps S200-S202 is (H/8) × (W/8), since each of the three pooling layers halves the resolution.
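The encoder layout of steps S200-S202 can be traced with a short shape calculation. The sketch assumes that every convolution is padded so that it preserves spatial resolution (the text does not state the padding), so that only the three stride-2 pooling layers change the size. The names `ENCODER_CHANNELS`, `POOL_AFTER` and `encoder_shapes` are illustrative.

```python
# Output channels of the eight convolutional layers, in order.
ENCODER_CHANNELS = [64, 64, 128, 128, 192, 192, 256, 256]
# A 2 x 2, stride-2 pooling layer follows the 2nd, 4th and 6th convolution.
POOL_AFTER = {2, 4, 6}

def encoder_shapes(h, w):
    """List (layer name, channels, height, width) for every encoder stage.

    Assumes resolution-preserving convolutions; only pooling halves h and w.
    """
    shapes = []
    for idx, ch in enumerate(ENCODER_CHANNELS, start=1):
        shapes.append((f"conv{idx}+relu", ch, h, w))
        if idx in POOL_AFTER:
            h, w = h // 2, w // 2
            shapes.append((f"pool{idx // 2}", ch, h, w))
    return shapes

shapes = encoder_shapes(256, 256)
```

For a 256 × 256 input the last entry is a 256-channel feature at 32 × 32, i.e. one eighth of the input resolution in each dimension.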
Step S3: the hierarchical features obtained in step S2 are up-sampled using a transposed convolution network, and the potential face region and the face deformation information are mined using a deformable convolution network. In step S3, the deformable convolution network suppresses the cluttered background and biases the network's computation toward the face region.
Specifically, step S3 further includes:
Step S300: the features output in step S2 are input into a transposed convolution network, which up-samples them. In a specific embodiment of the present invention, the transposed convolution network up-samples the features by a factor of 2.
Step S301: the features up-sampled in step S300 are input into a deformable convolutional neural network, whose convolutional layer computes an offset O_p for each pixel in the features. The deformable convolution proceeds in two steps. The first step (i.e., step S301) generates the position offsets: the feature F_i is input into a convolutional layer with a 3 × 3 convolution kernel and 18 output channels, which computes the offset O over all pixels of the feature; the offset at each pixel p, which covers the 3 × 3 sampling positions, is denoted O_p.
Step S302: for each pixel p, guided by the offset O_p, the input feature F_i is convolved with the filter weights w of the deformable convolution network to obtain the output feature F_o. The 3 × 3 grid G is G = {(-1, -1), (-1, 0), ..., (0, 1), (1, 1)}.
The output feature map F_o at pixel p is expressed as follows:
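The formula itself is not reproduced in this text. In the standard deformable convolution formulation, which matches the symbols defined above, the output would read F_o(p) = Σ_{g ∈ G} w(g) · F_i(p + g + O_p(g)), where O_p(g) is the two-dimensional offset predicted for grid position g and fractional positions are read by bilinear interpolation. The single-channel Python sketch below illustrates that form only; the interleaved (dy, dx) layout of the 18 offset values and the zero padding outside the map are assumptions.

```python
import math

# The 3 x 3 sampling grid G, in row-major order.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def bilinear(feat, y, x):
    """Sample a 2-D feature map at a fractional position; outside the map is zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feat[yy][xx]
    return value

def deformable_conv_at(feat, weights, offsets, p):
    """F_o(p) = sum over g in G of w(g) * F_i(p + g + O_p(g)), single channel."""
    py, px = p
    out = 0.0
    for k, (gy, gx) in enumerate(GRID):
        # 18 offset values = 9 grid positions x (dy, dx); this layout is assumed.
        oy, ox = offsets[2 * k], offsets[2 * k + 1]
        out += weights[k] * bilinear(feat, py + gy + oy, px + gx + ox)
    return out

feat = [[float(4 * r + c) for c in range(4)] for r in range(4)]
w = [1.0 / 9] * 9
zero = deformable_conv_at(feat, w, [0.0] * 18, (1, 1))         # ordinary 3 x 3 mean
shifted = deformable_conv_at(feat, w, [0.0, 0.5] * 9, (1, 1))  # every tap moved 0.5 px right
```

With all offsets at zero the operation reduces to an ordinary 3 × 3 convolution; non-zero offsets move the sampling positions, which is how the network can concentrate its sampling on the face region.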
Step S303: after each transposed convolution layer, the features of the same resolution from the full convolutional neural network of step S2 are concatenated via skip connections, and a standard convolutional layer is added after the concatenated features to reduce the number of channels from 128 to 64, thereby reducing the computational cost of the model.
Step S4: the features output by the deformable convolution network are divided into multi-scale regions, adaptive filter weights are dynamically computed in each region and convolved with the features of that region to obtain new features, and the features of all regions at the three scales are combined to generate a high-quality face sketch.
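The resolution recovered by the decoder can be checked with the usual transposed-convolution size formula, out = (in - 1) × stride - 2 × padding + kernel. The text only states that each transposed convolution up-samples by a factor of 2; the kernel size 4, stride 2 and padding 1 used below are one common configuration with that property and are assumptions, as is the use of three stages (taken from claim 7).

```python
def transposed_conv_size(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one dimension.

    kernel=4, stride=2, padding=1 is an assumed setting that exactly doubles the size.
    """
    return (size - 1) * stride - 2 * padding + kernel

def decoder_resolutions(h, w, stages=3):
    """Resolution after each of the stacked up-sampling stages."""
    sizes = [(h, w)]
    for _ in range(stages):
        h, w = transposed_conv_size(h), transposed_conv_size(w)
        sizes.append((h, w))
    return sizes

# Encoder output for a 256 x 256 input is 32 x 32 (one eighth per dimension).
sizes = decoder_resolutions(32, 32)
```

Each intermediate resolution matches one encoder stage, which is what allows the skip connections to concatenate features of equal size before the channel-reducing convolution.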
Specifically, step S4 further includes:
Step S400: the final output feature of the deformable convolution network in step S3 is divided equally into n × n regions, each with a resolution of (H/n) × (W/n); the i-th region is denoted R_i. Because the components of a face differ in scale, dividing regions at a single scale cannot cover every facial component well. Therefore n is set to 3, 4 and 5, giving three partitions at different scales and improving the model's ability to extract the features of facial components at different scales.
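The multi-scale partition of step S400 can be sketched as a computation of region boundaries. The box format (top, left, bottom, right) and the integer rounding used when H or W is not divisible by n are illustrative choices, not taken from the text.

```python
def partition(h, w, n):
    """Split an h x w feature map into n x n regions.

    Returns (top, left, bottom, right) boxes in row-major order; bottom and
    right are exclusive. Integer division handles sizes not divisible by n.
    """
    rows = [i * h // n for i in range(n + 1)]
    cols = [j * w // n for j in range(n + 1)]
    return [(rows[i], cols[j], rows[i + 1], cols[j + 1])
            for i in range(n) for j in range(n)]

# Three partitions of the same 240 x 240 feature map, for n = 3, 4 and 5.
regions = {n: partition(240, 240, n) for n in (3, 4, 5)}
```

For a 240 × 240 map this yields 9 regions of 80 × 80, 16 regions of 60 × 60 and 25 regions of 48 × 48, so a facial component that straddles a boundary at one scale may fall inside a single region at another.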
Step S401: a spatial pooling layer of size 32 × 32 maps each region R_i at the different scales to a fixed-size low-dimensional feature, which reduces the number of model parameters and the computational complexity.
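One way to map regions of different resolutions to a fixed size is adaptive pooling, sketched below for a single channel. The text does not say whether the pooling is average or max pooling, nor whether 32 × 32 denotes the output size, so average pooling to a configurable output size is an assumption made for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def adaptive_avg_pool(region, out_size):
    """Average-pool a 2-D region of any size to out_size x out_size.

    Bin i covers rows floor(i*h/out) .. ceil((i+1)*h/out) - 1, likewise for columns.
    """
    h, w = len(region), len(region[0])
    pooled = []
    for i in range(out_size):
        r0, r1 = i * h // out_size, ceil_div((i + 1) * h, out_size)
        row = []
        for j in range(out_size):
            c0, c1 = j * w // out_size, ceil_div((j + 1) * w, out_size)
            cells = [region[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

# Regions from different scales (for example 48 x 48 and 80 x 80) reach the same size.
small = adaptive_avg_pool([[1.0] * 48 for _ in range(48)], 8)
large = adaptive_avg_pool([[1.0] * 80 for _ in range(80)], 8)
tiny = adaptive_avg_pool([[float(4 * r + c) for c in range(4)] for r in range(4)], 2)
```

Because the pooled size no longer depends on n, the same fully-connected layers can process regions from all three scales.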
Step S402: the pooled features are input into three consecutive fully-connected layers with dimensions 256, 512 and 18432 respectively. The output of the last fully-connected layer is reshaped to 64 × 3 × 3 × 32 and taken as the weight w_i of the adaptive convolution network. Note that the output of the fully-connected layers is computed independently for each region at each scale, so the weight w_i of the adaptive convolution network varies with the feature content of region R_i; the weight computation is therefore said to be adaptive.
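The size of the last fully-connected layer follows from the filter shape: 64 input channels × 3 × 3 kernel positions × 32 output channels = 18432, which matches the 64 × 3 × 3 × 32 layout stated in claim 8. The reshape can be sketched as below; the row-major ordering of the flat vector is an assumption.

```python
def reshape_filters(flat, in_ch=64, k=3, out_ch=32):
    """View a flat fully-connected output as nested lists [in_ch][k][k][out_ch].

    Row-major ordering is assumed; the text only gives the target shape.
    """
    assert len(flat) == in_ch * k * k * out_ch
    it = iter(flat)
    return [[[[next(it) for _ in range(out_ch)]
              for _ in range(k)]
             for _ in range(k)]
            for _ in range(in_ch)]

# Stand-in for the 18432-dimensional output of the last fully-connected layer.
flat = [float(i) for i in range(64 * 3 * 3 * 32)]
w_i = reshape_filters(flat)
```

Since a separate vector is produced for every region, each region R_i is convolved with its own filter bank rather than with weights shared across the whole image.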
Step S403: the weight w_i computed by the three fully-connected layers is convolved with the features of region R_i, with 32 output channels, to obtain new specialized features. The adaptive convolution network of steps S402-S403 is shown in FIG. 4.
Step S404: the features of all regions are combined to generate a high-quality face sketch. Specifically, the specialized features of all regions at each scale are reassembled into a feature of resolution H × W; the features at the three scales are then concatenated and input into a standard convolutional layer with a 1 × 1 convolution kernel to generate the final face sketch.
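The final fusion in step S404 amounts to a channel-wise concatenation followed by a 1 × 1 convolution, which is a per-pixel weighted sum over channels. The sketch below uses 2 channels per scale instead of 32 to keep the example small, and assumes a single-channel output sketch; both choices and the function names are illustrative.

```python
def concat_channels(*feature_sets):
    """Concatenate several [C][H][W] features along the channel axis."""
    out = []
    for features in feature_sets:
        out.extend(features)
    return out

def conv1x1(features, weights, bias=0.0):
    """1 x 1 convolution to one output channel: a weighted sum over channels per pixel."""
    c, h, w = len(features), len(features[0]), len(features[0][0])
    return [[bias + sum(weights[k] * features[k][y][x] for k in range(c))
             for x in range(w)]
            for y in range(h)]

def constant_feature(value, channels=2, h=2, w=2):
    """Helper producing a constant [channels][h][w] feature for the demo."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]

# One reassembled H x W feature per scale (n = 3, 4, 5), here with constant values.
merged = concat_channels(constant_feature(1.0), constant_feature(2.0), constant_feature(3.0))
sketch = conv1x1(merged, weights=[0.5] * len(merged))
```

A 1 × 1 kernel mixes information across the three scales at every pixel without looking at neighbouring pixels, so the spatial detail produced by the region-wise convolutions is left intact.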
Step S5: the parameters of the model are updated according to the comparison between the generated face sketch and the real face sketch. Specifically, the model is optimized using the following objective function:
where Y is the real face sketch, Y_F is the generated face sketch, and F is the generation network. D is a discriminator that judges, for each input, whether it is a generated face sketch or a real one. In the above objective function, the left term is the adversarial loss and the right term is the Euclidean loss. During training, all model parameters are optimized using the Adam optimization algorithm, with the learning rate set to 2e-4 and the batch size set to 1.
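For reference, a single Adam update with the stated learning rate of 2e-4 can be sketched as follows. The text does not give the remaining hyperparameters, so beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 (the defaults of the original Adam algorithm) are assumptions, and the parameter and gradient values are placeholders.

```python
import math

def adam_step(params, grads, state, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one in-place Adam update to a flat list of parameters.

    lr comes from the description; beta1, beta2 and eps are assumed defaults.
    """
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g       # first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g   # second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                      # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

params = [1.0, -2.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
adam_step(params, grads=[0.5, -3.0], state=state)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient magnitude.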
Step S6: the training of steps S2 to S5 is repeated over multiple iterations until training converges or the maximum number of iterations is reached, yielding the final model.
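The stopping rule of step S6 can be sketched as a loop that ends either on convergence or at the iteration limit. The text does not define convergence, so the criterion used here (no loss improvement larger than `tol` for `patience` consecutive iterations) and all names and default values are assumptions; `step_fn` stands in for one pass through steps S2 to S5.

```python
def train(step_fn, max_iters=1000, tol=1e-4, patience=5):
    """Call step_fn(iteration) until the loss stops improving or max_iters is hit.

    Returns (iterations run, best loss, reason). The convergence test is an
    assumed criterion, not one given in the description.
    """
    best, stale = float("inf"), 0
    for it in range(1, max_iters + 1):
        loss = step_fn(it)
        if best - loss > tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return it, best, "converged"
    return max_iters, best, "max_iters"

# A loss that keeps decreasing runs to the iteration limit ...
result = train(lambda it: 1.0 / it, max_iters=50)
# ... while a loss that plateaus at 0.2 stops early.
plateau = train(lambda it: max(1.0 / it, 0.2), max_iters=50)
```

Here `result` ends with the reason "max_iters" and `plateau` with "converged", mirroring the two exit conditions named in step S6.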
To sum up, the invention provides a face sketch generation model and method in natural scenes based on a dynamic convolution network. A face sketch data set in natural scenes is selected without any preprocessing such as background removal, and the weights of the target model are initialized. A face image in a natural scene is input into a full convolutional network composed of consecutive convolutional layers and pooling layers to extract hierarchical features. The hierarchical features are input into a transposed convolution network and a deformable convolution network, which up-sample them and compute features containing information on the potential face region. These features are divided into multi-scale regions; adaptive filter weights are dynamically computed in each region and convolved with the features to obtain new features, and the features of all regions at the three scales are combined to generate a high-quality face sketch. The parameters of the model are updated according to the comparison between the generated face sketch and the real face sketch. Because the method dynamically and adaptively computes the features of facial components in regions of different scales during optimization, it can generate high-quality sketches of faces in unrestricted natural environments without any preprocessing, and its face sketch generation results under both restricted and unrestricted conditions exceed those of all existing methods.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims (10)

1. A face sketch generation model in natural scenes based on a dynamic convolution network, comprising:
an initialization unit for initializing the network parameters of the model;
an encoder unit for acquiring a face image in a natural scene without preprocessing and extracting the hierarchical features of the face image using a full convolutional neural network;
a decoder unit for progressively up-sampling the hierarchical features generated by the encoder unit using a transposed convolution network, and mining the potential face region and the face deformation information using a deformable convolution network;
a face sketch generation unit for dividing the features output by the deformable convolution network into multi-scale regions, dynamically computing adaptive filter weights in each region, convolving the filter weights with the features of the corresponding region to obtain new features, and combining the features of all regions at multiple scales to generate a high-quality face sketch;
an updating unit for comparing the face sketch generated by the face sketch generation unit with a real face sketch and updating the model parameters by optimizing an objective function;
and an iterative training unit for repeating the training processes of the encoder unit, the decoder unit, the face sketch generation unit and the updating unit over multiple iterations until training converges or the maximum number of iterations is reached, yielding the final model.
2. The dynamic convolutional network-based face sketch generation model under natural scenes as claimed in claim 1, characterized in that: the face image under the natural scene without being preprocessed comes from a pre-established training set, and the training set is established through the following processes:
Collecting face images of different data sources and face sketches thereof, establishing a face sketches data set containing the face images and the corresponding face sketches thereof in a natural scene, and not using any preprocessing process for removing a background;
And randomly selecting a plurality of pairs of images from the established face sketch data set as a training set, and using the rest pairs of images as a test set.
3. The dynamic convolutional network-based face sketch generation model under natural scenes as claimed in claim 1, characterized in that: the full convolutional neural network consists of eight convolutional layers in sequence, each followed by a rectified linear unit (ReLU); the convolution kernel size of each convolutional layer is 2 × 2, and the numbers of output channels of the convolutional layers are 64, 64, 128, 128, 192, 192, 256 and 256 respectively.
4. The dynamic convolutional network-based face sketch generation model under natural scenes as claimed in claim 1, characterized in that: in the full convolutional neural network, a pooling layer is inserted after each of the second, fourth and sixth convolutional layers for down-sampling, the stride and pooling size of the pooling layer both being 2 × 2.
5. The dynamic convolutional network-based face sketch generation model in natural scene as claimed in claim 1, wherein said decoder unit further comprises:
a transposed convolution network for acquiring the hierarchical features output by the encoder unit and up-sampling the features;
a deformable convolutional neural network for acquiring the features up-sampled by the transposed convolution network, computing an offset O_p for each pixel in the features using its convolutional layer, and, for each pixel p, guided by the offset O_p, convolving the input feature F_i with the filter weights w of the deformable convolutional neural network to obtain the output feature F_o;
and a concatenation module for concatenating, via skip connections, the features up-sampled by the transposed convolution network with the features of the same resolution from the full convolutional neural network of the encoder unit, and adding a standard convolutional layer after the concatenated features to reduce the number of channels.
6. The model of claim 5, wherein the deformable convolution proceeds in two steps: in the first step, position offsets are generated, the feature F_i being input into a convolutional layer with a 3 × 3 convolution kernel and 18 output channels, which computes the offset O over all pixels of the feature, the offset at each pixel p being denoted O_p; in the second step, for each pixel p, guided by the offset O_p, the input feature F_i is convolved with the filter weights w of the deformable convolution network to obtain the output feature F_o.
7. The dynamic convolutional network-based face sketch generation model under natural scenes as claimed in claim 5, characterized in that: the decoder unit is formed by stacking 3 layers, each comprising a transposed convolution network and a deformable convolution network; after each transposed convolution layer, the concatenation module concatenates the features of the same resolution from the full convolutional encoder via skip connections, and a standard convolutional layer may be added after the third concatenated feature to reduce the number of channels.
8. The model for generating a face sketch under natural scenes based on a dynamic convolution network as claimed in claim 1, wherein said face sketch generation unit further comprises:
a dividing module for dividing the final output feature of the deformable convolution network equally into n × n regions, each region having a resolution of (H/n) × (W/n), the i-th region being denoted R_i;
a mapping module for mapping each region R_i at the different scales to a fixed-size low-dimensional feature using a spatial pooling layer;
a weight calculation module for inputting the pooled features into three consecutive fully-connected layers with dimensions 256, 512 and 18432, the output of the last fully-connected layer being reshaped to 64 × 3 × 3 × 32 and taken as the weight w_i of the adaptive convolution network;
a convolution module for convolving the weight w_i computed by the three fully-connected layers with the features of region R_i to obtain new specialized features;
and a feature combination module for combining the features of all regions and generating a high-quality face sketch.
9. The dynamic convolutional network-based face sketch generation model under natural scenes as claimed in claim 8, characterized in that: the feature combination module reassembles the specialized features of all regions at each scale into a feature of resolution H × W, concatenates the features at all scales, and inputs the concatenated features into a standard convolutional layer with a 1 × 1 convolution kernel to generate the final face sketch.
10. A face sketch generation method under a natural scene based on a dynamic convolution network comprises the following steps:
Step S1, initializing the network parameters of all the convolution networks and the full-connection network;
Step S2, acquiring a face image under a natural scene without preprocessing, and extracting the hierarchical features of the face image by using a full convolution neural network;
Step S3, the hierarchical characteristics obtained in step S2 are up-sampled by using a transposed convolution network, and the potential area of the human face and the information of the human face change form are mined by using a deformable convolution network;
Step S4, dividing the characteristics output by the deformable convolution network into multi-scale areas, dynamically calculating self-adaptive filter weights in each area, carrying out convolution calculation on the filter weights and the characteristics to obtain new characteristics, and combining all the area characteristics under multiple scales to generate a high-quality face sketch;
Step S5, updating the parameters of the model according to the contrast between the generated face sketch and the real face sketch;
And step S6, performing step S2-S5 training in a multi-iteration mode until the training process converges or the maximum iteration number is reached to obtain the final model.
CN201910772659.1A 2019-08-21 2019-08-21 Dynamic convolution network-based face sketch generation model and method in natural scene Active CN110580726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910772659.1A CN110580726B (en) 2019-08-21 2019-08-21 Dynamic convolution network-based face sketch generation model and method in natural scene

Publications (2)

Publication Number Publication Date
CN110580726A true CN110580726A (en) 2019-12-17
CN110580726B CN110580726B (en) 2022-10-04

Family

ID=68811641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910772659.1A Active CN110580726B (en) 2019-08-21 2019-08-21 Dynamic convolution network-based face sketch generation model and method in natural scene

Country Status (1)

Country Link
CN (1) CN110580726B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108010031A (en) * 2017-12-15 2018-05-08 厦门美图之家科技有限公司 A kind of portrait dividing method and mobile terminal
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
CN108509978A (en) * 2018-02-28 2018-09-07 中南大学 The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN109359541A (en) * 2018-09-17 2019-02-19 南京邮电大学 A kind of sketch face identification method based on depth migration study
CN109426858A (en) * 2017-08-29 2019-03-05 京东方科技集团股份有限公司 Neural network, training method, image processing method and image processing apparatus
CN109920021A (en) * 2019-03-07 2019-06-21 华东理工大学 A kind of human face sketch synthetic method based on regularization width learning network
CN110023989A (en) * 2017-03-29 2019-07-16 华为技术有限公司 A kind of generation method and device of sketch image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
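Zhao Yandan et al.: "Portrait sketch generation based on facial features and line integral convolution", Journal of Computer-Aided Design & Computer Graphics *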

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488887A (en) * 2020-04-09 2020-08-04 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence
CN111488887B (en) * 2020-04-09 2023-04-18 腾讯科技(深圳)有限公司 Image processing method and device based on artificial intelligence
CN112907708A (en) * 2021-02-05 2021-06-04 深圳瀚维智能医疗科技有限公司 Human face cartoon method, equipment and computer storage medium
CN112907708B (en) * 2021-02-05 2023-09-19 深圳瀚维智能医疗科技有限公司 Face cartoon method, equipment and computer storage medium
CN117830083A (en) * 2024-03-05 2024-04-05 昆明理工大学 Method and device for generating face sketch-to-face photo
CN117830083B (en) * 2024-03-05 2024-05-03 昆明理工大学 Method and device for generating face sketch-to-face photo

Also Published As

Publication number Publication date
CN110580726B (en) 2022-10-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant