CN106570911B - Method for synthesizing facial cartoon based on daisy descriptor - Google Patents


Info

Publication number
CN106570911B
CN106570911B CN201610753192.2A CN201610753192A
Authority
CN
China
Prior art keywords
image
pixel point
cartoon
synthesized
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610753192.2A
Other languages
Chinese (zh)
Other versions
CN106570911A (en)
Inventor
盛斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610753192.2A priority Critical patent/CN106570911B/en
Publication of CN106570911A publication Critical patent/CN106570911A/en
Application granted granted Critical
Publication of CN106570911B publication Critical patent/CN106570911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method for synthesizing facial cartoon based on daisy descriptor, which comprises the following steps: 1) acquiring the gray value of each pixel point in the face image to be synthesized; 2) establishing a daisy descriptor for each pixel point in the face image to be synthesized, extracting image block features, and establishing a daisy descriptor for each pixel point of each face cartoon image in a training set; 3) adopting a patchmatch algorithm over the pixel points of the face cartoon images in the training set to obtain the K candidate pixel positions most similar to each pixel point of the face image to be synthesized; 4) obtaining K candidate values according to the corresponding displacement vectors, and giving weights to the K candidate values; 5) acquiring the weighted values with a conjugate gradient solver, and synthesizing the cartoon image of the face to be synthesized from the RGB values of the cartoon images in the training set with an SSD noise reduction method according to the weighted values. Compared with the prior art, the method has the advantages of high similarity, accurate synthesis and the like.

Description

Method for synthesizing facial cartoon based on daisy descriptor
Technical Field
The invention relates to the technical field of image processing and analysis, in particular to a method for synthesizing facial cartoon based on a daisy descriptor.
Background
The synthesis of facial cartoons has found widespread use in digital entertainment, and a great deal of research and many commercial products have been devoted to it. Although their styles differ, all of these works pursue the generation of high-quality, high-similarity facial cartoon images.
Currently, the synthesis of face sketch images has achieved good results. There are basically two general approaches to sketch synthesis: image-based methods and example-based methods. Image-based sketch synthesis methods generally cannot capture important facial details, while example-based methods reconstruct new sketch images from existing sketches but require many examples, offer poor noise reduction, and produce inaccurate synthesized images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for synthesizing facial cartoon based on daisy descriptor, which has high similarity and accurate synthesis.
The purpose of the invention can be realized by the following technical scheme:
a method for synthesizing facial cartoon based on daisy descriptor includes the following steps:
1) acquiring the gray value of each pixel point in the face image to be synthesized, and storing the gray value in a two-dimensional matrix;
2) establishing a daisy descriptor for each pixel point in the face image to be synthesized, extracting image block characteristics, and establishing a daisy descriptor for each pixel point of each face cartoon image in a training set;
3) adopting a patchmatch algorithm over the pixel points of the face cartoon images in the training set to obtain the K candidate pixel positions most similar to each pixel point of the face image to be synthesized, and obtaining the corresponding displacement vectors;
4) obtaining K candidate values according to the corresponding displacement vectors, and giving weights to the K candidate values to enable the K candidate values to be linearly combined into an input image block;
5) acquiring weighted values by adopting a conjugate gradient solver, and synthesizing the cartoon image of the face to be synthesized from the RGB values of the cartoon images in the training set by adopting an SSD noise reduction method according to the weighted values.
The face image to be synthesized has the same size and resolution as each face cartoon image in the training set.
In the step 2), the construction method of the daisy descriptor comprises the following steps:
21) selecting the parameters of the daisy descriptor, including the farthest radius from the central pixel point, the number of convolution layers in each direction, the number of gradient directions in each layer, and the number of bins of the gradient histogram;
22) calculating a plurality of orientation maps of the face image, and obtaining the corresponding convolved orientation maps by several Gaussian kernel convolutions;
23) combining the convolved orientation maps into a vector h_Σ(u, v), and obtaining the daisy descriptor.
In the step 5), the calculation formula for obtaining the weighted value is:

$$w_p^* = \arg\min_{w_p} \left\| T_p - \sum_{i=1}^{K} w_{p,i}\, C_p^{(i)} \right\|^2$$

wherein $w_p^*$ is the weighted value, $T_p$ is a vector containing the pixel values of the face image to be synthesized, and $C_p^{(i)}$ is a vector containing the pixel values of the $i$-th candidate picture.
In said step 5), the resultant value $\hat{c}(p)$ at the pixel position $p$ is calculated by the following formula:

$$\hat{c}(p) = \frac{1}{|\Psi_p|} \sum_{q \in \Psi_p} \hat{c}_q(p)$$

wherein $|\Psi_p|$ is the number of pixels in the image block $\Psi_p$, and $\hat{c}_q(p)$ is the estimate of pixel point $p$ given by pixel point $q$.
Compared with the prior art, the invention has the following advantages:
firstly, the similarity is high, and the synthesis is accurate: according to the method, the daisy descriptor is used for selecting the pixel point with the highest similarity with the picture to be synthesized, the image block is synthesized in a weighting mode, and then noise reduction synthesis is carried out through an SSD algorithm, so that the accuracy of the synthesized image is high.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 shows the original and the synthesized cartoon image of the embodiment, wherein the left image is the original and the right image is the synthesized cartoon image.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Example (b):
as shown in fig. 1, the present embodiment includes the following steps:
First, the input image is loaded and its grayscale matrix is saved. In this example, both the training-set pictures and the input picture are 500 × 360 pixels, and the training set contains 68 photo-cartoon pairs in total.
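As a minimal illustration of this loading step (our own sketch, not the patent's code; the luminance weights are the standard BT.601 coefficients, which the patent does not specify):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array into the 2-D grayscale matrix
    that the method stores, using standard luminance weights."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return rgb @ np.array([0.299, 0.587, 0.114])

# For the example in the text, the resulting matrix would be 500 x 360.
```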
Secondly, the Daisy descriptors are created, and for each pixel point of the input image the K nearest neighbours are computed with PatchMatch to obtain K candidate pixels and the corresponding displacement vectors. In this example, the value of K is 5, and the parameters of the Daisy descriptor are chosen as follows: the radius R farthest from the center pixel point is taken as 15, the number Q of convolution layers in each direction is taken as 3, the number T of gradient directions in each layer is taken as 8, and the number H of bins of the gradient histogram is taken as 8. The adopted parameter values are a set of values, obtained after a number of experiments, that give a good trade-off between average effect and average cost.
For an input image, we first calculate H orientation maps $G_o$ by the formula

$$G_o = \left( \frac{\partial I}{\partial o} \right)^{+}$$

where $I$ denotes the input image, $o$ denotes the calculated orientation (a total of 8 orientations), and $(a)^{+}$ denotes $\max(a, 0)$. These maps are then convolved several times with Gaussian kernels to obtain the convolved orientation maps

$$G_o^{\Sigma} = G_{\Sigma} * G_o$$

where $G_{\Sigma}$ is a Gaussian kernel with standard deviation $\Sigma$. Then the convolved orientation maps of all directions are combined into a vector as follows:

$$h_{\Sigma}(u, v) = \left[ G_1^{\Sigma}(u, v), G_2^{\Sigma}(u, v), \ldots, G_H^{\Sigma}(u, v) \right]^{T}$$

where $(u, v)$ represents the location of the pixel. The whole Daisy descriptor $D(u_0, v_0)$ is then obtained by concatenating the unit-normalized vectors $\tilde{h}_{\Sigma}$ sampled at the center and on the surrounding rings:

$$D(u_0, v_0) = \left[ \tilde{h}_{\Sigma_1}^{T}(u_0, v_0),\ \tilde{h}_{\Sigma_1}^{T}(l_1(u_0, v_0, R_1)), \ldots,\ \tilde{h}_{\Sigma_Q}^{T}(l_T(u_0, v_0, R_Q)) \right]^{T}$$

where $l_j(u, v, R)$ represents the position in the $j$-th direction at a distance $R$ from $(u, v)$.
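The orientation-map and smoothing stages above can be sketched in Python as follows (a simplified sketch of our own, using scipy; function names are illustrative, and the full ring-sampling concatenation of D(u0, v0) is omitted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def orientation_maps(image, H=8):
    """G_o = (dI/do)^+ : directional derivative of the image along each
    of H orientations, clipped at zero."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    maps = []
    for k in range(H):
        theta = 2.0 * np.pi * k / H
        d = gx * np.cos(theta) + gy * np.sin(theta)
        maps.append(np.maximum(d, 0.0))       # (a)^+ = max(a, 0)
    return np.stack(maps)                     # shape (H, rows, cols)

def convolved_maps(maps, sigma):
    """G_o^Sigma : each orientation map smoothed by a Gaussian kernel."""
    return np.stack([gaussian_filter(m, sigma) for m in maps])

def h_vector(conv_maps, u, v):
    """h_Sigma(u, v): the H smoothed responses at one pixel,
    unit-normalized as in the DAISY construction."""
    h = conv_maps[:, u, v]
    n = np.linalg.norm(h)
    return h / n if n > 0 else h
```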
Then, for the Daisy local feature descriptors of the input picture, PatchMatch matching is performed against the Daisy local feature descriptors of the training set pictures one by one to obtain the pixel positions with high similarity in each training set picture; the K most similar ones are selected, and the displacements are stored in a displacement vector to record the K candidate pixel positions.
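This candidate search can be illustrated with a brute-force K-nearest-neighbour lookup (a stand-in of our own for PatchMatch, which finds approximate matches much faster; names are illustrative):

```python
import numpy as np

def k_nearest_candidates(query_desc, train_descs, k=5):
    """Return the k training-pixel indices whose descriptors are closest
    (squared L2 distance) to one query descriptor.

    query_desc : (D,) descriptor of one input pixel
    train_descs: (N, D) descriptors of all training-set pixels
    """
    d2 = np.sum((train_descs - query_desc) ** 2, axis=1)
    idx = np.argpartition(d2, k)[:k]          # k smallest, unordered
    return idx[np.argsort(d2[idx])]           # sorted most- to least-similar
```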
Thirdly, K candidate values are obtained based on the calculated displacement vectors and given weights so that they linearly combine into the input image block; the weight values are then computed with a conjugate gradient solver. We solve the following least-squares problem to obtain the weighted values:

$$w_p^* = \arg\min_{w_p} \left\| T_p - \sum_{i=1}^{K} w_{p,i}\, C_p^{(i)} \right\|^2$$

where $T_p$ represents a vector containing the input picture pixel values and $C_p^{(i)}$ represents a vector containing the pixel values of the $i$-th candidate picture.
We can solve this system of equations for the desired weight values very efficiently using the conjugate gradient method.
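A minimal conjugate-gradient solve of this least-squares step might look like the following (our own sketch on the normal equations; in practice a library routine such as scipy.sparse.linalg.cg could be used instead):

```python
import numpy as np

def solve_weights(T_p, C, n_iter=100, tol=1e-10):
    """Solve min_w || T_p - C w ||^2 for the K candidate weights with a
    plain conjugate-gradient iteration on the normal equations
    (C^T C) w = C^T T_p.  C is (m, K): one column per candidate patch."""
    A = C.T @ C
    b = C.T @ T_p
    w = np.zeros(C.shape[1])
    r = b - A @ w                 # residual
    p = r.copy()                  # search direction
    for _ in range(n_iter):
        rr = r @ r
        if rr < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)     # step length along p
        w += alpha * p
        r -= alpha * Ap
        p = r + (r @ r / rr) * p  # Fletcher-Reeves update
    return w
```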
Fourthly, the target cartoon image is synthesized from the RGB values of the cartoon images in the training set with an SSD noise reduction method, according to the computed coefficients. The resultant value $\hat{c}(p)$ at pixel position $p$ is calculated by the following formula:

$$\hat{c}(p) = \frac{1}{|\Psi_p|} \sum_{q \in \Psi_p} \hat{c}_q(p)$$

where $|\Psi_p|$ denotes the number of pixels in the image block $\Psi_p$ and $\hat{c}_q(p)$ denotes the estimate of pixel point $p$ given by pixel point $q$; the average of these estimates is the final result. Each estimate is computed by weighting the pixel values of the K candidate pictures at pixel point $q$ with the weight values obtained in the previous step:

$$\hat{c}_q(p) = \sum_{i=1}^{K} w_{q,i}\, C_q^{(i)}(p)$$
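The per-pixel averaging over overlapping image blocks can be sketched as follows (illustrative names, our own sketch; each entry of `patches` stands for the already-weighted candidate combination estimated from one pixel q):

```python
import numpy as np

def average_patch_estimates(patches, positions, out_shape, patch=5):
    """Average all overlapping patch estimates: the synthesized value at a
    pixel is the mean of every estimate whose block covers that pixel.

    patches  : list of (patch, patch) arrays (weighted candidate patches)
    positions: list of (row, col) top-left corners, one per patch
    """
    acc = np.zeros(out_shape)
    cnt = np.zeros(out_shape)
    for est, (r, c) in zip(patches, positions):
        acc[r:r + patch, c:c + patch] += est
        cnt[r:r + patch, c:c + patch] += 1
    cnt[cnt == 0] = 1            # avoid division by zero where nothing lands
    return acc / cnt
```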
Effects of the implementation
Test pictures were processed according to the above steps, and cartoon images with good effect were finally synthesized. The results show that the algorithm achieves a reasonable effect when synthesizing from face photos. A comparison of the test results is shown in fig. 2. It can be seen that the image synthesized by our algorithm is, as a whole, quite similar to the original, although some details remain coarse. We believe that better results can be expected from further tuning of some parameters; moreover, because the data set is relatively small and single in style, we intend to try a larger cartoon image data set with more styles and expect a good improvement in the results.

Claims (4)

1. A method for synthesizing facial cartoon based on daisy descriptor is characterized by comprising the following steps:
1) acquiring the gray value of each pixel point in the face image to be synthesized, and storing the gray value in a two-dimensional matrix;
2) establishing a daisy descriptor for each pixel point in a face image to be synthesized, extracting image block features, and establishing the daisy descriptor for each pixel point of each face cartoon image in a training set, wherein the method for establishing the daisy descriptor comprises the following steps:
21) selecting parameters of the daisy descriptor, including the farthest radius from the central pixel point, the number of convolution layers in each direction, the number of gradient directions in each layer and the number of bins of the gradient histogram;
22) calculating a plurality of orientation maps of the face image, and obtaining the corresponding convolved orientation maps by several Gaussian kernel convolutions;
23) combining the convolved orientation maps into a vector h_Σ(u, v), wherein (u, v) represents the position of the pixel point, and obtaining the Daisy descriptor, wherein the parameters of the Daisy descriptor are selected as follows:
the radius R farthest from the central pixel point is taken as 15, the number Q of convolution layers in each direction is taken as 3, the number T of gradient directions in each layer is taken as 8, and the number H of bins of the gradient histogram is taken as 8;
for an input image, the H orientation maps $G_o$ are calculated first by the formula

$$G_o = \left( \frac{\partial I}{\partial o} \right)^{+}$$

wherein $I$ represents the input image, $o$ represents the calculated orientation, a total of 8 orientations, and $(a)^{+}$ denotes $\max(a, 0)$;
3) adopting a patchmatch algorithm in pixel points of the face cartoon image in the training set to obtain K candidate pixel positions most similar to each pixel point of the face image to be synthesized, and obtaining corresponding displacement vectors;
4) obtaining K candidate values according to the corresponding displacement vectors, and giving weights to the K candidate values to enable the K candidate values to be linearly combined into an input image block;
5) acquiring weighted values by adopting a conjugate gradient solver, and synthesizing the cartoon image of the face to be synthesized from the RGB values of the cartoon images in the training set by adopting an SSD noise reduction method according to the weighted values.
2. The method as claimed in claim 1, wherein the face image to be synthesized has the same size and resolution as each face cartoon image in the training set.
3. The method for synthesizing facial cartoon based on daisy descriptor as claimed in claim 1, wherein in the step 5), the calculation formula for obtaining the weighted value is:

$$w_p^* = \arg\min_{w_p} \left\| T_p - \sum_{i=1}^{K} w_{p,i}\, C_p^{(i)} \right\|^2$$

wherein $w_p^*$ is the weighted value, $T_p$ is a vector containing the pixel values of the face image to be synthesized, $C_p^{(i)}$ is a vector containing the pixel values of the $i$-th candidate picture, $i \in [1, K]$ represents the $i$-th candidate picture, and $p$ represents a pixel point.
4. The method as claimed in claim 1, wherein in the step 5), the resultant value $\hat{c}(p)$ at the pixel position $p$ is calculated by the following formula:

$$\hat{c}(p) = \frac{1}{|\Psi_p|} \sum_{q \in \Psi_p} \hat{c}_q(p)$$

wherein $|\Psi_p|$ is the number of pixels in the image block $\Psi_p$, and $\hat{c}_q(p)$ is the estimate of pixel point $p$ given by pixel point $q$.
CN201610753192.2A 2016-08-29 2016-08-29 Method for synthesizing facial cartoon based on daisy descriptor Active CN106570911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610753192.2A CN106570911B (en) 2016-08-29 2016-08-29 Method for synthesizing facial cartoon based on daisy descriptor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610753192.2A CN106570911B (en) 2016-08-29 2016-08-29 Method for synthesizing facial cartoon based on daisy descriptor

Publications (2)

Publication Number Publication Date
CN106570911A CN106570911A (en) 2017-04-19
CN106570911B true CN106570911B (en) 2020-04-10

Family

ID=58532363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610753192.2A Active CN106570911B (en) 2016-08-29 2016-08-29 Method for synthesizing facial cartoon based on daisy descriptor

Country Status (1)

Country Link
CN (1) CN106570911B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108428232B (en) * 2018-03-20 2019-07-19 合肥工业大学 A kind of blind appraisal procedure of cartoon image quality
CN109920021B (en) * 2019-03-07 2023-05-23 华东理工大学 Face sketch synthesis method based on regularized width learning network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870049A (en) * 2006-06-15 2006-11-29 西安交通大学 Human face countenance synthesis method based on dense characteristic corresponding and morphology
CN102682420A (en) * 2012-03-31 2012-09-19 北京百舜华年文化传播有限公司 Method and device for converting real character image to cartoon-style image
CN103218427A (en) * 2013-04-08 2013-07-24 北京大学 Local descriptor extracting method, image searching method and image matching method
CN103559488A (en) * 2013-11-13 2014-02-05 中南大学 Cartoon figure facial feature extraction method based on qualitative space relation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW577227B (en) * 2002-04-23 2004-02-21 Ind Tech Res Inst Method and apparatus for removing background of visual content

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870049A (en) * 2006-06-15 2006-11-29 西安交通大学 Human face countenance synthesis method based on dense characteristic corresponding and morphology
CN102682420A (en) * 2012-03-31 2012-09-19 北京百舜华年文化传播有限公司 Method and device for converting real character image to cartoon-style image
CN103218427A (en) * 2013-04-08 2013-07-24 北京大学 Local descriptor extracting method, image searching method and image matching method
CN103559488A (en) * 2013-11-13 2014-02-05 中南大学 Cartoon figure facial feature extraction method based on qualitative space relation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fast local stereo matching based on DAISY descriptor and improved weight kernel; Liu Tianliang et al.; Journal of Nanjing University of Posts and Telecommunications; 20120831; Vol. 32, No. 4; pp. 70-76 *

Also Published As

Publication number Publication date
CN106570911A (en) 2017-04-19

Similar Documents

Publication Publication Date Title
Zhang et al. Scene-adaptive remote sensing image super-resolution using a multiscale attention network
CN107330439B (en) Method for determining posture of object in image, client and server
Zhang et al. Learning multiple linear mappings for efficient single image super-resolution
Ren et al. Single image super-resolution via adaptive high-dimensional non-local total variation and adaptive geometric feature
JP3837575B2 (en) Speeding up of super-resolution processing
CN108073857A (en) The method and device of dynamic visual sensor DVS event handlings
CN107749987B (en) Digital video image stabilization method based on block motion estimation
US9247139B2 (en) Method for video background subtraction using factorized matrix completion
CN104933678B (en) A kind of image super-resolution rebuilding method based on image pixel intensities
US11915350B2 (en) Training one-shot instance segmenters using synthesized images
CN109376641B (en) Moving vehicle detection method based on unmanned aerial vehicle aerial video
CN105513033B (en) A kind of super resolution ratio reconstruction method that non local joint sparse indicates
CN111709307B (en) Resolution enhancement-based remote sensing image small target detection method
CN106934398B (en) Image de-noising method based on super-pixel cluster and rarefaction representation
CN104504672B (en) Low-rank sparse neighborhood insertion ultra-resolution method based on NormLV features
CN107862680A (en) A kind of target following optimization method based on correlation filter
CN116883588A (en) Method and system for quickly reconstructing three-dimensional point cloud under large scene
CN112598604A (en) Blind face restoration method and system
CN107392211B (en) Salient target detection method based on visual sparse cognition
Wan et al. Drone image stitching using local mesh-based bundle adjustment and shape-preserving transform
Zhou et al. PADENet: An efficient and robust panoramic monocular depth estimation network for outdoor scenes
Wang et al. Group shuffle and spectral-spatial fusion for hyperspectral image super-resolution
CN106570911B (en) Method for synthesizing facial cartoon based on daisy descriptor
CN115564639A (en) Background blurring method and device, computer equipment and storage medium
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant