CN108510456A - Sketch simplification method for deep convolutional neural networks based on perceptual loss - Google Patents
Sketch simplification method for deep convolutional neural networks based on perceptual loss
- Publication number
- CN108510456A CN108510456A CN201810259452.XA CN201810259452A CN108510456A CN 108510456 A CN108510456 A CN 108510456A CN 201810259452 A CN201810259452 A CN 201810259452A CN 108510456 A CN108510456 A CN 108510456A
- Authority
- CN
- China
- Prior art keywords
- sketch
- model
- network
- layer
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a sketch simplification method for deep convolutional neural networks based on perceptual loss, comprising the steps of: 1) data acquisition: rough sketches and corresponding label data are used; 2) data processing: the rough sketches of the image dataset and the corresponding clean line drawings are converted by preprocessing into the format required for training a deep convolutional neural network; 3) model construction: according to the training objective, a deep convolutional neural network suited to the sketch simplification problem is constructed; 4) loss function definition; 5) model training: the loss value of the network is computed according to the loss function, the gradient of each network layer's parameters is then computed by backpropagation, and the parameters of each layer are updated by stochastic gradient descent; 6) model verification: the trained model is verified on a validation dataset to test its generalization ability. The proposed method enables the sketch simplification network to handle rough sketches with cluttered lines and indistinct main structure lines, and gives it stronger robustness to the influence of illumination.
Description
Technical field
The present invention relates to the technical field of computer image processing, and in particular to a sketch simplification method for deep convolutional neural networks based on perceptual loss.
Background technology
Sketching is the first step of artistic creation. At this stage, the creator usually only wants to express concept and composition, rather than to render details. After the sketch is finished, the creator needs to trace a clean line drawing on top of it, which is a tedious and labor-intensive task.
In recent years, with the rapid development of deep learning, deep convolutional neural networks can learn the characteristics of clean line drawings and thereby convert cluttered sketches into simplified line drawings that possess those characteristics. The problem, however, is that current deep learning methods measure the loss directly as the Euclidean distance between the generated image and the hand-labelled line drawing, whereas in reality a sketch and its line drawing do not match pixel by pixel but are only consistent in visual perception. As a result, existing deep learning methods can only handle relatively simple, less cluttered line drawings; for harder inputs whose main structure lines are indistinct and which require semantic understanding, lines are lost, blurred, or distorted, and the simplification ability is insufficient.
Invention content
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a sketch simplification method for deep convolutional neural networks based on perceptual loss. The method can simplify sketches of arbitrary size well, and overcomes the line loss, blurring, line distortion, and insufficient simplification ability of previous sketch simplification methods.
In order to achieve the above object, the present invention uses following technical scheme:
The sketch simplification method of the present invention, for deep convolutional neural networks based on perceptual loss, comprises the following steps:
1) data acquisition;
Rough sketches and corresponding label data are used; training a deep convolutional neural network requires training data, and the label data are the corresponding clean line drawings drawn by hand on the basis of the rough sketches. The data are then divided into a training dataset and a validation dataset;
2) data processing;
The rough sketches of the image dataset and the corresponding line drawings are converted by preprocessing into the format required for training the deep convolutional neural network;
3) model construction;
According to the training objective and the input/output format of the model, a deep convolutional neural network suited to the sketch simplification problem is constructed;
4) loss function is defined;
According to the training objective and the architecture of the model, the required loss function is defined;
5) model training;
The parameters of each network layer are initialized and training samples are input iteratively; the loss value of the network is computed according to the loss function, the gradient of each layer's parameters is then computed by backpropagation, and the parameters of each layer are updated by stochastic gradient descent;
6) model is verified;
The trained model is verified on the validation dataset to test its generalization ability.
As a preferred technical solution, in step 2), the data processing specifically comprises:
2.1) converting the images into grayscale;
2.2) applying horizontal, vertical, and diagonal flips to the images in the dataset, with the same flips applied to the corresponding line drawings;
2.3) adding tonal variation, noise, and shadows to the flipped images;
2.4) cropping the processed images to a set size, with the corresponding line drawings cropped to the same size;
2.5) rescaling the images in the dataset from [0, 255] to the range [-1, 1].
As a preferred technical solution, in step 2.4), the processed images are cropped to 384 × 384 pixels.
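The preprocessing steps above can be sketched in a few lines of NumPy. This is an illustrative version under our own naming (the patent does not give an implementation), assuming 8-bit images stored as arrays:

```python
import numpy as np

def to_gray(img):
    # 2.1) convert an H×W×3 RGB image to grayscale (standard luma weights)
    return img @ np.array([0.299, 0.587, 0.114])

def flip_pair(sketch, line, rng):
    # 2.2) apply the same random horizontal/vertical flip to the sketch
    # and its corresponding clean line drawing
    if rng.random() < 0.5:
        sketch, line = sketch[:, ::-1], line[:, ::-1]
    if rng.random() < 0.5:
        sketch, line = sketch[::-1, :], line[::-1, :]
    return sketch, line

def crop_pair(sketch, line, size, rng):
    # 2.4) crop sketch and line drawing to the same size×size window
    y = rng.integers(0, sketch.shape[0] - size + 1)
    x = rng.integers(0, sketch.shape[1] - size + 1)
    return sketch[y:y + size, x:x + size], line[y:y + size, x:x + size]

def rescale(img):
    # 2.5) map pixel values from [0, 255] to [-1, 1]
    return img.astype(np.float32) / 127.5 - 1.0
```

Note that every geometric operation is applied identically to the sketch and its line drawing, so that the pair stays aligned for training.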
As a preferred technical solution, in step 3), the model construction specifically comprises:
3.1) constructing the sketch simplification network model;
The sketch simplification network model is responsible for simplifying the input sketch into a line drawing: its input is a sketch of arbitrary size and its output is a simplified line drawing of the same size. The sketch simplification network model consists of two concatenated parts, an encoder model and a decoder model. The encoder model extracts high-level semantic information from the input sketch and stores it in a low-dimensional code; the decoder model unpacks the low-dimensional code and restores the simplified result image;
3.2) constructing the feature extraction network model;
The feature extraction network model extracts high-level semantic information from its input images so that they can be compared at this semantic level. The inputs of the feature extraction network are the simplified result image produced in step 3.1) and the corresponding hand-labelled line drawing; before being input into the pre-trained VGG16 network, the input images need to be expanded to 3 × 384 × 384. At a chosen high-level semantic layer, the respective feature maps of the result image and the hand-labelled line drawing are compared.
As a preferred technical solution, in step 3.1), the encoder model comprises multiple cascaded down-sampling layers, each composed of a convolutional layer, a batch normalization layer, and a nonlinear activation layer in series. These layers gradually reduce the image size while enlarging the receptive field; using strided convolutional layers lets the network learn the down-sampling itself. The batch normalization layer normalizes the mean and standard deviation of the input samples within a batch, which stabilizes and accelerates model training; the nonlinear activation layer prevents the model from degrading into a simple linear model and improves its expressive power.
The decoder model comprises multiple cascaded up-sampling layers, each composed of a transposed convolution (deconvolution) layer, a batch normalization layer, and a nonlinear activation layer in series, which enlarge the image size from the low-dimensional code. Strided transposed convolution layers let the network learn the up-sampling itself; the batch normalization and nonlinear activation layers play the same roles as in the encoder. Residual blocks are also introduced to speed up training.
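As a quick sanity check on the strided down-sampling and transposed-convolution up-sampling described above, the output sizes follow standard convolution arithmetic. The helpers below (the kernel/stride/padding values are illustrative choices, not taken from the patent) show how a 384-pixel input is halved by each down-sampling layer and restored by the matching deconvolution:

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    # output size of a strided convolution (down-sampling layer)
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    # output size of the matching transposed convolution (up-sampling layer)
    return (size - 1) * stride - 2 * pad + kernel

# three cascaded down-sampling layers on a 384×384 input: 384 → 192 → 96 → 48
sizes = [384]
for _ in range(3):
    sizes.append(conv_out(sizes[-1]))
# the decoder then restores the original size: 48 → 96 → 192 → 384
for _ in range(3):
    sizes.append(deconv_out(sizes[-1]))
```

With matched kernel, stride, and padding, the transposed convolution exactly inverts the size change of the strided convolution, which is what lets the output line drawing match the input sketch's dimensions.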
As a preferred technical solution, in step 3.2),
the feature extraction network model comprises multiple cascaded down-sampling layers, each composed of a convolutional layer, a nonlinear activation function layer, and a pooling layer in series. The convolutional layers have stride 1 and kernel size 3 × 3 and extract the corresponding feature maps; the nonlinear activation layer prevents the model from degrading into a simple linear model and improves its expressive power; the pooling layer reduces the size of the feature maps, thereby enlarging the receptive field of the convolution kernels.
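The role of the pooling layer, shrinking the feature map so that later 3 × 3 kernels cover a larger region of the input, can be illustrated with a minimal 2 × 2 max pooling in NumPy (a sketch under our own naming, not the patent's implementation):

```python
import numpy as np

def max_pool2(x):
    # 2×2 max pooling with stride 2 over a C×H×W feature map;
    # halves H and W, enlarging the effective receptive field of
    # subsequent 3×3 convolutions
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

feat = np.arange(16, dtype=np.float32).reshape(1, 4, 4)
pooled = max_pool2(feat)   # shape (1, 2, 2)
```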
As a preferred technical solution, in step 4), defining the loss function specifically comprises:
The function is computed by feeding the result image restored by the sketch simplification network, together with its corresponding hand-labelled line drawing, into the feature extraction network, and comparing the feature maps of the result image and the line drawing at a chosen high-level semantic layer. The loss function is defined so that the feature map of the result image approaches the feature map of the line drawing as closely as possible. The loss function of sketch simplification can therefore be defined as a perceptual loss, with the following formula:

L_pec = (1 / (C_ij · H_ij · W_ij)) · || φ_ij(Î_y) − φ_ij(I_y) ||₂²

where L_pec is the perceptual loss function, Î_y and I_y denote the result image generated by the simplification network and its corresponding line drawing, W_ij, H_ij, and C_ij denote the width, height, and number of channels of the feature map, φ_ij(I) denotes the feature map function, and i and j denote the layer indices of the feature map.
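On given feature maps, the perceptual loss is just a squared feature distance normalized by the feature map volume C·H·W. The toy NumPy version below is our own sketch, with arbitrary arrays standing in for the VGG feature maps φ:

```python
import numpy as np

def perceptual_loss(feat_gen, feat_ref):
    # squared L2 distance between the feature maps of the generated
    # result image and the hand-labelled line drawing, normalised by
    # the feature map volume C·H·W
    c, h, w = feat_gen.shape
    return float(np.sum((feat_gen - feat_ref) ** 2) / (c * h * w))
```

Because the comparison happens in feature space rather than pixel space, two images that are perceptually similar but not pixel-aligned can still yield a small loss, which is the point of using this loss for sketch simplification.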
As a preferred technical solution, in step 5), the specific method of model training is:
5.1) initializing the parameters of each layer of the model
The parameters of each layer are initialized using methods commonly used in deep convolutional neural networks: the convolutional layer parameters of the feature extraction network take the convolutional layer parameter values of a VGG16 network model pre-trained on ImageNet as initial values, while the convolutional layers of the sketch simplification network are initialized from a Gaussian distribution with mean 0 and a standard deviation computed from scale and fan_in, where scale is the factor by which the result image is enlarged relative to the original and fan_in is the number of input images;
5.2) training the network model
Original images processed by step 2) are input at random, and the sketch simplification network of step 3.1) generates the simplified result image. The result image of step 3.1) and the corresponding hand-labelled line drawing are input into the feature extraction network of step 3.2) to obtain the corresponding feature maps, and the corresponding perceptual loss value is computed by step 4.1). From this perceptual loss value, backpropagation yields the gradient of each layer's parameters in the sketch simplification network model of step 3.1), and the stochastic gradient descent algorithm then uses these gradients to optimize each layer's parameters, completing one round of training of the network model;
5.3) repeating step 5.2) until the network's sketch simplification ability reaches the set goal.
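The update rule of step 5.2) is ordinary stochastic gradient descent. The one-dimensional toy below is purely illustrative, standing in for the full network and perceptual loss; it shows one parameter being driven to its optimum by repeated gradient steps:

```python
def sgd_step(params, grads, lr):
    # one stochastic-gradient-descent update of all layer parameters
    return [p - lr * g for p, g in zip(params, grads)]

# toy stand-in for training: minimise the loss (w - 3)^2
w = 0.0
for _ in range(100):
    grad = 2.0 * (w - 3.0)        # "backpropagation" for this 1-D model
    (w,) = sgd_step([w], [grad], lr=0.1)
```

In the actual method, `params` would be the weights of every layer of the sketch simplification network and `grads` the gradients of the perceptual loss obtained by backpropagation.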
As a preferred technical solution, in step 6), the specific method of model verification is:
Some original images are taken out of the validation dataset at random and, after being processed by step 2), are input into the network model trained in step 5); the network model is made to simplify the input sketches, and the output results are compared with the corresponding hand-labelled line drawings to judge the sketch simplification ability of the trained network model.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention proposes a simple deep convolutional neural network together with a perceptual loss function (Perceptual Loss, realized with a pre-trained VGG feature extraction network), solving the problem of simplifying complex sketches and achieving better visual results and wider applicability.
2. The present invention proposes a method of augmenting the dataset so that the sketch simplification network is more robust to the influence of illumination and shadows, making it suitable for sketches that are scanned or photographed under arbitrary conditions.
Description of the drawings
Fig. 1 is a flow diagram of the method of the present invention;
Fig. 2 (a) and Fig. 2 (b) are a sketch and the corresponding hand-drawn annotation, respectively;
Fig. 3 is a schematic diagram of the sketch simplification network of the present invention;
Fig. 4 is a schematic diagram of the feature extraction network of the present invention.
Detailed description of the embodiments
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment
As shown in Fig. 1, the sketch simplification method for deep convolutional neural networks based on perceptual loss described in this embodiment proceeds as follows:
1) A sketch dataset is obtained, and then its corresponding hand-drawn annotations, i.e. the clean line drawings drawn by hand on the basis of the sketches; the data are divided into a training dataset and a validation dataset.
The acquired sketches and corresponding hand-drawn annotations are as shown in Fig. 2 (a) and Fig. 2 (b):
2) The images and label data of the image dataset are converted by preprocessing into the format required for training the deep convolutional neural network, comprising the following steps:
2.1) converting the images into grayscale;
2.2) applying horizontal, vertical, and diagonal flips to the images in the dataset, with the same flips applied to the corresponding line drawings;
2.3) adding tonal variation, noise, and shadows to the flipped images;
2.4) cropping the processed images to 384 × 384 pixels, with the corresponding line drawings cropped to the same size;
2.5) rescaling the images in the dataset from [0, 255] to the range [-1, 1].
3) The network model, comprising the feature extraction network and the sketch simplification network, is built as follows:
3.1) Constructing the sketch simplification network. The input of the sketch simplification network is a sketch of arbitrary size (during training a fixed input is used, 1 × 384 × 384 after step 2) processing), and its output is a simplified result image of the same size as the original. The network comprises three concatenated structures (up-sampling layers, residual blocks, and down-sampling layers). A specific example of the sketch simplification network model is shown in Fig. 3;
3.2) Constructing the feature extraction network. The input of the feature extraction network is a 3 × 384 × 384 image (the 1 × 384 × 384 result obtained from the sketch simplification network needs to be expanded), and its output is a series of low-dimensional coding feature maps (the present invention uses the feature map of relu5-1, of size 512 × 24 × 24). The network comprises multiple cascaded down-sampling layers, each composed of a convolutional layer, a nonlinear activation function layer, and a pooling layer in series, as shown in Fig. 4.
4) The loss function of the sketch simplification network is defined, comprising the following steps:
4.1) Defining the loss function of the sketch simplification network. The loss function is defined so that the output result image is as close as possible, in visual perception, to the hand-labelled line drawing; here the loss function is defined with Perceptual Loss so that the result image output by the sketch simplification network approaches the hand-labelled line drawing as closely as possible in visual perception, with the following formula:

L_pec = (1 / (C_ij · H_ij · W_ij)) · || φ_ij(Î_y) − φ_ij(I_y) ||₂²

where L_pec is the perceptual loss function, Î_y and I_y denote the result image generated by the simplification network and its corresponding line drawing, W_ij, H_ij, and C_ij denote the width, height, and number of channels of the feature map, φ_ij(I) denotes the feature map function, and i and j denote the layer indices of the feature map;
5) The network model is trained, comprising the following steps:
5.1) initializing the parameters of each layer of the model
The parameters of each layer are initialized using methods commonly used in deep convolutional neural networks: the convolutional layer parameters of the feature extraction network take the convolutional layer parameter values of a VGG16 network model pre-trained on ImageNet as initial values, while the parameters of the convolutional layers of the sketch simplification network are all initialized from a Gaussian distribution with mean 0 and a standard deviation computed from scale and fan_in (scale is the factor by which the result image is enlarged relative to the original, and fan_in is the number of input images);
5.2) training the network model
Original images processed by step 2) are input at random; the sketch simplification network of step 3.1) generates the simplified result image, the feature extraction network of step 3.2) obtains the corresponding feature maps, and the corresponding perceptual loss value is computed by step 4.1). From this perceptual loss value, backpropagation yields the gradient of each layer's parameters in the sketch simplification network model of step 3), and the stochastic gradient descent algorithm then uses these gradients to optimize each layer's parameters, completing one round of training of the network model;
5.3) repeating step 5.2) until the network's sketch simplification ability reaches the set goal.
6) The model is verified, comprising the following steps:
The trained model is verified on the validation dataset to test its line simplification ability. Some original images are taken out of the validation dataset at random and, after being processed by step 2), are input into the network model trained in step 5); the network model is made to simplify the input sketches, and the output results are compared with the corresponding label data to judge the sketch simplification ability of the trained network model.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the present invention are not limited by the above embodiment; any other changes, modifications, substitutions, combinations, and simplifications made without departing from the spirit and principles of the present invention shall be equivalent replacements and are included within the scope of the present invention.
Claims (9)
1. A sketch simplification method for deep convolutional neural networks based on perceptual loss, characterized in that it comprises the following steps:
1) data acquisition;
using rough sketches and corresponding label data, wherein training a deep convolutional neural network requires training data, the label data being the corresponding clean line drawings drawn by hand on the basis of the rough sketches, and then dividing the data into a training dataset and a validation dataset;
2) data processing;
converting the rough sketches of the image dataset and the corresponding line drawings by preprocessing into the format required for training the deep convolutional neural network;
3) model construction;
constructing, according to the training objective and the input/output format of the model, a deep convolutional neural network suited to the sketch simplification problem;
4) loss function definition;
defining the required loss function according to the training objective and the architecture of the model;
5) model training;
initializing the parameters of each network layer, inputting training samples iteratively, computing the loss value of the network according to the loss function, then computing the gradient of each layer's parameters by backpropagation, and updating the parameters of each layer by stochastic gradient descent;
6) model verification;
verifying the trained model on the validation dataset to test its generalization ability.
2. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 1, characterized in that in step 2), the data processing specifically comprises:
2.1) converting the images into grayscale;
2.2) applying horizontal, vertical, and diagonal flips to the images in the dataset, with the same flips applied to the corresponding line drawings;
2.3) adding tonal variation, noise, and shadows to the flipped images;
2.4) cropping the processed images to a set size, with the corresponding line drawings cropped to the same size;
2.5) rescaling the images in the dataset from [0, 255] to the range [-1, 1].
3. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 1, characterized in that in step 2.4), the processed images are cropped to 384 × 384 pixels.
4. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 1, characterized in that in step 3), the model construction specifically comprises:
3.1) constructing the sketch simplification network model;
the sketch simplification network model being responsible for simplifying the input sketch into a line drawing, its input being a sketch of arbitrary size and its output being a simplified line drawing of the same size; the sketch simplification network model consisting of two concatenated parts, an encoder model and a decoder model; the encoder model extracting high-level semantic information from the input sketch and storing it in a low-dimensional code; the decoder model unpacking the low-dimensional code and restoring the simplified result image;
3.2) constructing the feature extraction network model;
the feature extraction network model extracting high-level semantic information from its input images so that they can be compared at this semantic level, the inputs of the feature extraction network being the simplified result image produced in step 3.1) and the corresponding hand-labelled line drawing; before being input into the pre-trained VGG16 network, the input images need to be expanded to 3 × 384 × 384, and at a chosen high-level semantic layer the respective feature maps of the result image and the hand-labelled line drawing are compared.
5. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 4, characterized in that in step 3.1), the encoder model comprises multiple cascaded down-sampling layers, each composed of a convolutional layer, a batch normalization layer, and a nonlinear activation layer in series, which gradually reduce the image size while enlarging the receptive field; strided convolutional layers let the network learn the down-sampling itself; the batch normalization layer normalizes the mean and standard deviation of the input samples within a batch, stabilizing and accelerating model training; the nonlinear activation layer prevents the model from degrading into a simple linear model and improves its expressive power;
the decoder model comprises multiple cascaded up-sampling layers, each composed of a transposed convolution (deconvolution) layer, a batch normalization layer, and a nonlinear activation layer in series, which enlarge the image size from the low-dimensional code; strided transposed convolution layers let the network learn the up-sampling itself, the batch normalization and nonlinear activation layers play the same roles as in the encoder, and residual blocks are also introduced to speed up training.
6. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 4, characterized in that in step 3.2),
the feature extraction network model comprises multiple cascaded down-sampling layers, each composed of a convolutional layer, a nonlinear activation function layer, and a pooling layer in series, wherein the convolutional layers have stride 1 and kernel size 3 × 3 and extract the corresponding feature maps; the nonlinear activation layer prevents the model from degrading into a simple linear model and improves its expressive power, and the pooling layer reduces the size of the feature maps, thereby enlarging the receptive field of the convolution kernels.
7. The sketch simplification method for deep convolutional neural networks based on perceptual loss according to claim 1, characterized in that in step 4), defining the loss function specifically comprises:
computing the function by feeding the result image restored by the sketch simplification network, together with its corresponding hand-labelled line drawing, into the feature extraction network, and comparing the feature maps of the result image and the line drawing at a chosen high-level semantic layer; the loss function is defined so that the feature map of the result image approaches the feature map of the line drawing as closely as possible; the loss function of sketch simplification can therefore be defined as a perceptual loss, with the following formula:

L_pec = (1 / (C_ij · H_ij · W_ij)) · || φ_ij(Î_y) − φ_ij(I_y) ||₂²

where L_pec is the perceptual loss function, Î_y and I_y denote the result image generated by the simplification network and its corresponding line drawing, W_ij, H_ij, and C_ij denote the width, height, and number of channels of the feature map, φ_ij(I) denotes the feature map function, and i and j denote the layer indices of the feature map.
8. The sketch simplification method of a deep convolutional neural network based on perception loss according to claim 1, wherein in step 5) the model is trained as follows:
5.1) Initialize the parameters of each layer
Initialization follows common practice for deep convolutional neural networks: the convolutional-layer parameters of the feature extraction network take as initial values the convolutional-layer parameter values of a VGG16 network model pre-trained on ImageNet, while the convolutional layers of the sketch simplification network are initialized from a Gaussian distribution with mean 0 and a standard deviation computed from scale and fan_in, where scale is the factor by which the result figure is magnified relative to the original image and fan_in is the number of inputs;
5.2) Train the network model
Randomly input original images processed by step 2); the sketch simplification network of step 3.1) generates the simplified result figures. Input each result figure from step 3.1) together with its corresponding manually annotated line drawing into the feature extraction network of step 3.2) to obtain the corresponding feature maps, and compute the corresponding perception loss via step 4.1). Back-propagating this perception loss yields the gradient of each layer parameter of the sketch simplification network model of step 3.1); the stochastic gradient descent algorithm then uses these gradients to optimize each layer's parameters, completing one round of training of the network model;
5.3) Repeat step 5.2) until the network's sketch simplification ability reaches the set goal.
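The structure of one training round in step 5.2) — forward pass, loss, back-propagated gradient, SGD update — can be illustrated with a toy model. Everything here is a stand-in: the "network" is a single linear map, the loss is a plain squared error rather than the perception loss, and the initialization standard deviation 1/sqrt(fan_in) is a common placeholder, not the patent's exact formula (which also involves scale).

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 5.1 analogue: Gaussian initialization, mean 0.
# 1/sqrt(fan_in) is an assumed placeholder std, not the patent's formula.
fan_in = 16
w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=fan_in)

x = rng.standard_normal(fan_in)  # one "input image"
target = 3.0                     # its "annotated line drawing"

def one_training_round(w, x, target, lr=0.01):
    """One round of step 5.2: forward pass, loss, analytic gradient, SGD update."""
    pred = w @ x                          # forward pass through the toy network
    loss = (pred - target) ** 2           # stand-in for the perception loss
    grad = 2.0 * (pred - target) * x      # gradient of the loss w.r.t. w
    return w - lr * grad, loss            # stochastic gradient descent step

w1, loss0 = one_training_round(w, x, target)
w2, loss1 = one_training_round(w1, x, target)  # step 5.3: repeat the round
```

Each repetition of the round (step 5.3) shrinks the loss for this convex toy problem; the patent's setting replaces the linear map with the simplification network and the squared error with the perception loss of claim 7.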
9. The sketch simplification method of a deep convolutional neural network based on perception loss according to claim 1, wherein in step 6) the model is verified as follows:
Some original images are randomly drawn from the validation set and, after processing by step 2), input to the network model trained in step 5); the network model simplifies the input sketches, and the output results are compared with the corresponding manually annotated line drawings to judge the sketch simplification ability of the trained network model.
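The verification loop of step 6) amounts to sampling validation pairs, running the model, and scoring its outputs against the annotations. A small sketch, with every concrete piece assumed: the validation pairs are random arrays, the "model" is an identity placeholder, and mean squared error stands in for whatever comparison the practitioner chooses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation set: (original image, annotated line drawing) pairs.
val_set = [(rng.standard_normal((8, 8)), rng.standard_normal((8, 8)))
           for _ in range(10)]

def simplify(img):
    """Stand-in for the trained sketch simplification network."""
    return img  # identity placeholder

def validate(model, pairs, n_samples=4):
    """Step 6: randomly draw validation images, simplify them, and score the
    outputs against the annotated line drawings (here: mean squared error)."""
    idx = rng.choice(len(pairs), size=n_samples, replace=False)
    scores = [np.mean((model(pairs[i][0]) - pairs[i][1]) ** 2) for i in idx]
    return float(np.mean(scores))

score = validate(simplify, val_set)  # lower is better under this metric
```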
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810259452.XA CN108510456B (en) | 2018-03-27 | 2018-03-27 | Sketch simplification method of deep convolutional neural network based on perception loss |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108510456A true CN108510456A (en) | 2018-09-07 |
CN108510456B CN108510456B (en) | 2021-12-21 |
Family
ID=63378698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810259452.XA Active CN108510456B (en) | 2018-03-27 | 2018-03-27 | Sketch simplification method of deep convolutional neural network based on perception loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108510456B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902988A (en) * | 2014-04-21 | 2014-07-02 | 梁爽 | Method for rough draft shape matching based on Modular product graph and maximum clique |
CN106126581A (en) * | 2016-06-20 | 2016-11-16 | 复旦大学 | Cartographical sketching image search method based on degree of depth study |
CN107103590A (en) * | 2017-03-22 | 2017-08-29 | 华南理工大学 | A kind of image for resisting generation network based on depth convolution reflects minimizing technology |
CN107220277A (en) * | 2017-04-14 | 2017-09-29 | 西北大学 | Image retrieval algorithm based on cartographical sketching |
Non-Patent Citations (2)
Title |
---|
EDGAR SIMO-SERRA.ET AL: "Mastering Sketching: Adversarial Augmentation for Structured Prediction", 《ARXIV》 * |
JUSTIN JOHNSON.ET AL: "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", 《ARXIV》 * |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109394205B (en) * | 2018-09-30 | 2022-06-17 | 安徽心之声医疗科技有限公司 | Electrocardiosignal analysis method based on deep neural network |
CN109394205A (en) * | 2018-09-30 | 2019-03-01 | 安徽心之声医疗科技有限公司 | A kind of more illness analysis methods of electrocardiosignal based on deep neural network |
US11972543B2 (en) | 2018-11-08 | 2024-04-30 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and terminal for improving color quality of images |
CN112889084A (en) * | 2018-11-08 | 2021-06-01 | Oppo广东移动通信有限公司 | Method, system and computer readable medium for improving color quality of image |
CN109727253A (en) * | 2018-11-14 | 2019-05-07 | 西安大数据与人工智能研究院 | Divide the aided detection method of Lung neoplasm automatically based on depth convolutional neural networks |
CN109801346A (en) * | 2018-12-20 | 2019-05-24 | 武汉西山艺创文化有限公司 | A kind of original painting neural network based auxiliary painting methods and device |
CN109801346B (en) * | 2018-12-20 | 2023-06-30 | 武汉西山艺创文化有限公司 | Original painting auxiliary coloring method and device based on neural network |
CN109886986A (en) * | 2019-01-23 | 2019-06-14 | 北京航空航天大学 | A kind of skin lens image dividing method based on multiple-limb convolutional neural networks |
US11645537B2 (en) | 2019-02-02 | 2023-05-09 | Beijing Horizon Robotics Technology Research And Development Co., Ltd. | Neural network training method, neural network training apparatus and electronic device |
CN109784490B (en) * | 2019-02-02 | 2020-07-03 | 北京地平线机器人技术研发有限公司 | Neural network training method and device and electronic equipment |
CN109784490A (en) * | 2019-02-02 | 2019-05-21 | 北京地平线机器人技术研发有限公司 | Training method, device and the electronic equipment of neural network |
US11900567B2 (en) | 2019-03-07 | 2024-02-13 | Tencent Technology (Shenzhen) Company Limited | Image processing method and apparatus, computer device, and storage medium |
KR102509817B1 (en) | 2019-03-07 | 2023-03-14 | 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 | Image processing method and apparatus, computer device, and storage medium |
WO2020177701A1 (en) * | 2019-03-07 | 2020-09-10 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, and computer device and storage medium |
KR20210107084A (en) * | 2019-03-07 | 2021-08-31 | 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 | Image processing method and apparatus, computer device, and storage medium |
CN110298899B (en) * | 2019-06-10 | 2023-04-07 | 天津大学 | Image texture synthesis method based on convolutional neural network feature map matching |
CN110298899A (en) * | 2019-06-10 | 2019-10-01 | 天津大学 | One kind being based on the matched image texture synthetic method of convolutional neural networks characteristic pattern |
CN110866469A (en) * | 2019-10-30 | 2020-03-06 | 腾讯科技(深圳)有限公司 | Human face facial features recognition method, device, equipment and medium |
CN110930472A (en) * | 2019-11-14 | 2020-03-27 | 三星电子(中国)研发中心 | Picture generation method and device |
CN111080605A (en) * | 2019-12-12 | 2020-04-28 | 哈尔滨市科佳通用机电股份有限公司 | Method for identifying railway wagon manual brake shaft chain falling fault image |
CN111222519A (en) * | 2020-01-16 | 2020-06-02 | 西北大学 | Construction method, method and device of hierarchical colored drawing manuscript line extraction model |
CN111222519B (en) * | 2020-01-16 | 2023-03-24 | 西北大学 | Construction method, method and device of hierarchical colored drawing manuscript line extraction model |
CN111311707A (en) * | 2020-03-05 | 2020-06-19 | 云知声智能科技股份有限公司 | Drawing method and device |
CN111311707B (en) * | 2020-03-05 | 2023-05-05 | 云知声智能科技股份有限公司 | Painting method and device |
CN112101449A (en) * | 2020-09-11 | 2020-12-18 | 南京邮电大学 | Method for inverting radar data by satellite cloud picture based on semantic loss |
CN112101449B (en) * | 2020-09-11 | 2022-07-15 | 南京邮电大学 | Method for inverting radar data by satellite cloud picture based on semantic loss |
CN112418197A (en) * | 2021-01-22 | 2021-02-26 | 北京世纪好未来教育科技有限公司 | Simplified image acquisition model training method, image text recognition method and related device |
CN112837396A (en) * | 2021-01-29 | 2021-05-25 | 深圳市天耀创想网络科技有限公司 | Line draft generation method and device based on machine learning |
CN112837396B (en) * | 2021-01-29 | 2024-05-07 | 深圳市天耀创想网络科技有限公司 | Line manuscript generation method and device based on machine learning |
CN112800998B (en) * | 2021-02-05 | 2022-07-29 | 南京邮电大学 | Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA |
CN112800998A (en) * | 2021-02-05 | 2021-05-14 | 南京邮电大学 | Multi-mode emotion recognition method and system integrating attention mechanism and DMCCA |
CN113393546A (en) * | 2021-05-17 | 2021-09-14 | 杭州电子科技大学 | Fashion clothing image generation method based on clothing category and texture pattern control |
CN113393546B (en) * | 2021-05-17 | 2024-02-02 | 杭州电子科技大学 | Fashion clothing image generation method based on clothing type and texture pattern control |
CN114299184B (en) * | 2021-12-30 | 2022-09-06 | 青海师范大学 | Hidden building colored drawing line manuscript painting method and device based on semantic matching |
CN114299184A (en) * | 2021-12-30 | 2022-04-08 | 青海师范大学 | Hidden building colored drawing line manuscript graph coloring method and device based on semantic matching |
CN117649365A (en) * | 2023-11-16 | 2024-03-05 | 西南交通大学 | Paper book graph digital restoration method based on convolutional neural network and diffusion model |
Also Published As
Publication number | Publication date |
---|---|
CN108510456B (en) | 2021-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108510456A (en) | Sketch simplification method of deep convolutional neural network based on perception loss | |
CN109191382B (en) | Image processing method, device, electronic equipment and computer readable storage medium | |
CN105894045B (en) | A kind of model recognizing method of the depth network model based on spatial pyramid pond | |
CN109064396A (en) | A kind of single image super resolution ratio reconstruction method based on depth ingredient learning network | |
CN105426919B (en) | The image classification method of non-supervisory feature learning is instructed based on conspicuousness | |
CN106845529A (en) | Image feature recognition methods based on many visual field convolutional neural networks | |
US9449253B2 (en) | Learning painting styles for painterly rendering | |
CN108304826A (en) | Facial expression recognizing method based on convolutional neural networks | |
CN107464210A (en) | A kind of image Style Transfer method based on production confrontation network | |
CN107644426A (en) | Image, semantic dividing method based on pyramid pond encoding and decoding structure | |
CN108009629A (en) | A kind of station symbol dividing method based on full convolution station symbol segmentation network | |
CN111242841A (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN108898138A (en) | Scene text recognition methods based on deep learning | |
CN110321967A (en) | Image classification innovatory algorithm based on convolutional neural networks | |
CN107679572A (en) | A kind of image discriminating method, storage device and mobile terminal | |
CN110992374B (en) | Hair refinement segmentation method and system based on deep learning | |
CN108537120A (en) | A kind of face identification method and system based on deep learning | |
CN109598676A (en) | A kind of single image super-resolution method based on Hadamard transform | |
CN106022355A (en) | 3DCNN (three-dimensional convolutional neural network)-based high-spectral image space spectrum combined classification method | |
CN114066871B (en) | Method for training new coronal pneumonia focus area segmentation model | |
CN107967497A (en) | Manuscripted Characters Identification Method based on convolutional neural networks and extreme learning machine | |
CN110175248A (en) | A kind of Research on face image retrieval and device encoded based on deep learning and Hash | |
CN109447897B (en) | Real scene image synthesis method and system | |
CN107679539A (en) | A kind of single convolutional neural networks local message wild based on local sensing and global information integration method | |
CN107958219A (en) | Image scene classification method based on multi-model and Analysis On Multi-scale Features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||