CN110023989A - A kind of generation method and device of sketch image - Google Patents


Info

Publication number
CN110023989A
CN110023989A (application number CN201780073000.6A)
Authority
CN
China
Prior art keywords
image
face
sketch
hair
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201780073000.6A
Other languages
Chinese (zh)
Other versions
CN110023989B (en)
Inventor
谭文伟
林倞
张冬雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN110023989A publication Critical patent/CN110023989A/en
Application granted granted Critical
Publication of CN110023989B publication Critical patent/CN110023989B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Abstract

A method and apparatus for generating a sketch image, for solving the problems in the prior art of low accuracy, poor generalization capability, and slow generation speed in the automatic generation of face sketch images. The method includes: obtaining a face image to be processed; obtaining facial sketch features in the face image through P convolutional layers of a first network branch in a pre-trained deep convolutional neural network model to obtain a facial structure sketch; obtaining hair sketch features in the face image through P convolutional layers of a second network branch in the deep convolutional neural network model to obtain a hair texture sketch; and synthesizing the facial structure sketch and the hair texture sketch to obtain the sketch image of the face image.

Description

Sketch image generation method and device

Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a sketch image.
Background
The automatic generation of a face sketch image refers to the process of automatically generating a face image with a sketch style from an input face image.
The automatic generation technology of the human face sketch image has important application in many fields. For example, in the public security field, a sketch image generated based on an identity card photo of a criminal suspect may be compared with a sketch image drawn according to the description of a witness, so as to assist a public security organization in determining the identity of the criminal suspect; in the fields of animation industry and social network, the method is mainly used for rendering the character photos in a sketch style.
The current automatic generation technology of face sketch images is mainly based on synthesis: a complete sketch image is composed from the parts of sample images that are similar to the input image.
Specifically, a database is first established that contains a plurality of sample image blocks and the sketch image block corresponding to each sample image block, where the sample image blocks cover different face-related feature information, such as facial features (eyes, nose, mouth, ears, and eyebrows), face ornaments, hair, and beard. The input image is then divided into a plurality of image blocks; for each image block, a similar sample image block is searched for in the database, the sketch image block corresponding to that similar sample image block is obtained, and all of the obtained sketch image blocks are combined into a sketch image. Finally, the edges between adjacent blocks in the synthesized sketch image are removed through a multi-scale Markov Random Field (MRF) algorithm model to obtain a relatively natural sketch image.
However, because the synthesis-based technology smooths the synthesized sketch image through the MRF algorithm model, detail features on the face such as moles and scars are smoothed away, and the synthesized sketch image cannot well retain the texture detail information of the face photograph. Moreover, the synthesis-based technology usually requires a sample database whose sample data carry face-related feature information, but the number of samples in such a database is limited and cannot cover all cases; when the face image contains elements not present in the sample data, the sketch image cannot be generated accurately. The synthesis-based technology therefore has low accuracy and poor generalization capability. In addition, when generating a sketch image from an original image, the technology must search and compare the original data against all sample image blocks and then synthesize all of the obtained sketch image blocks, and this large workload makes sketch image generation slow.
Disclosure of Invention
The embodiment of the application provides a sketch image generation method and device, which are used for solving the problems of low accuracy, poor generalization capability and low sketch image generation speed of a face sketch image automatic generation technology in the prior art.
In a first aspect, an embodiment of the present application provides a method for generating a sketch image, where the method may be applied to an electronic device, and the method includes:
after the electronic equipment acquires a face image to be processed, acquiring facial sketch features in the face image through P convolutional layers of a first network branch in a pre-trained deep convolutional neural network model to obtain a facial structure sketch, and acquiring hair sketch features in the face image through P convolutional layers of a second network branch in the deep convolutional neural network model to obtain a hair texture sketch, wherein P is an integer larger than 0.
And then synthesizing the facial structure sketch map and the hair texture sketch map to obtain a sketch image of the face image.
In the embodiment of the application, based on a deep convolutional neural network, by designing a structure including a first network branch for generating human face features and a second network branch for generating hair features, effective feature expressions are learned from a large number of training samples, a network model capable of generating accurate and natural human face sketch images from original images is trained, and automatic generation of the human face sketch images is realized. Compared with the technology for automatically generating the face sketch image based on synthesis in the prior art, the technology for generating the face sketch image based on the deep convolutional neural network does not depend on a sample database, but generates a structural sketch image comprising face features through a first network branch in the deep convolutional neural network, generates a structural sketch image comprising hair features through a second network branch in the deep convolutional neural network, and then synthesizes the structural sketch image and the texture sketch image to obtain a final face sketch image.
Optionally, each convolutional layer in the deep convolutional neural network model uses a Rectified Linear Unit (ReLU) as its activation function. The convolution kernel used by each convolutional layer in the deep convolutional neural network model has a size of r × r.
In one possible design, the first N convolutional layers of the first network branch are the same as or coincide with the first N convolutional layers of the second network branch, where N is an integer greater than 0 and smaller than P.
Specifically, the first N convolutional layers of the first network branch are the same as the first N convolutional layers of the second network branch, or the first N convolutional layers of the first network branch and the first N convolutional layers of the second network branch share the first N convolutional layers in the deep convolutional neural network model.
In the embodiment of the application, the first N convolutional layers of the first network branch are the same as or coincide with the first N convolutional layers of the second network branch, so that the calculation efficiency of the deep convolutional neural network model is improved.
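The shared-trunk, two-branch layout described above can be sketched in a few lines. The toy NumPy implementation below is an illustrative assumption, not the patented network: the layer counts N = 4 and M = 2 (so P = 6), the 3 × 3 kernel size, the single channel, and the random untrained weights are all placeholders. It only shows how the first N layers are computed once and their output fed to both branch tails.

```python
import numpy as np

def conv2d(x, kernel):
    """Naive 'same'-padded single-channel cross-correlation (CNN-style 'convolution')."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)  # activation function used by every layer

# Hypothetical layer counts: N = 4 shared layers, M = 2 per-branch layers, P = 6.
rng = np.random.default_rng(0)
N, M, r = 4, 2, 3
shared_kernels = [rng.standard_normal((r, r)) * 0.1 for _ in range(N)]
face_kernels   = [rng.standard_normal((r, r)) * 0.1 for _ in range(M)]
hair_kernels   = [rng.standard_normal((r, r)) * 0.1 for _ in range(M)]

def run_layers(feature_map, kernels):
    for k in kernels:
        feature_map = relu(conv2d(feature_map, k))
    return feature_map

face_image = rng.random((16, 16))                    # toy grayscale input
trunk = run_layers(face_image, shared_kernels)       # first N layers, computed once
structure_sketch = run_layers(trunk, face_kernels)   # first-branch last M layers
texture_sketch   = run_layers(trunk, hair_kernels)   # second-branch last M layers
```

Because the trunk is evaluated once and reused by both branches, the first N layers cost half of what two fully separate branches would, which is the efficiency gain the design claims.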
In one possible design, the obtaining facial sketch features in the face image through P convolutional layers of a first network branch in the deep convolutional neural network model includes:
filtering background features in the face image through the first N convolutional layers of a first network branch in the deep convolutional neural network model to obtain a face feature image;
and acquiring the face sketch features in the face feature map through the last M convolutional layers of the first network branch.
The acquiring of hair sketch features in the face image through the P convolutional layers of the second network branch in the deep convolutional neural network model includes:
filtering the background features in the face image through the first N convolutional layers of the second network branch in the deep convolutional neural network model to obtain a face feature map;
acquiring hair sketch features in the face feature map through the last M convolutional layers of the second network branch;
where P = M + N.
In the design, the first N convolutional layers of the first network branch are used for filtering background features in a human face image to be processed, and the last M convolutional layers are used for acquiring a face structure sketch; the first N convolution layers of the second network branch are used for filtering background features in a face image to be processed, and the last M convolution layers are used for acquiring a hair texture sketch pattern, so that the accuracy of a face sketch image generation technology is improved, and the speed of generating the face sketch image is increased.
In one possible design, the convolution kernel sizes of the last M convolutional layers of the first network branch are correspondingly equal to the convolution kernel sizes of the last M convolutional layers of the second network branch.
In the design, the sizes of convolution kernels of the last M convolution layers of the first network branch are correspondingly equal to the sizes of convolution kernels of the last M convolution layers of the second network branch, so that the accuracy of the face sketch image generation technology is improved.
In one possible design, where M is 2, the convolution kernel sizes of the last two convolutional layers of the first network branch are equal, and the convolution kernel sizes of the last two convolutional layers of the second network branch are equal.
In one possible design, where N is 4, the filtering the background features in the face image through the first N convolutional layers of the first network branch in the deep convolutional neural network model includes:
and filtering the background features of the horizontal direction and the vertical direction of the face image through a first convolution layer and a second convolution layer in the first N convolution layers of the first network branch in the deep convolutional neural network model.
And smoothing the face image with the background features filtered in the horizontal direction and the vertical direction through a third convolution layer and a fourth convolution layer in the first N convolution layers of the first network branch in the deep convolutional neural network model.
In the above design, the first convolution layer and the second convolution layer of the first N convolution layers of the first network branch are used for filtering the background features in the horizontal direction and the vertical direction of the face image to be processed, and the third convolution layer and the fourth convolution layer are used for performing smoothing processing in the horizontal direction and the vertical direction with respect to the face image with the filtered background features, so that the accuracy of the face sketch image generation technology is improved, and the generated sketch image is more natural.
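As an illustration of the roles described for these four layers, the toy NumPy fragment below applies fixed, hand-picked horizontal and vertical difference kernels followed by horizontal and vertical box blurs. The patent's layers learn their kernels during training, so these specific filters are assumptions that only mimic the stated horizontal/vertical filtering and smoothing roles.

```python
import numpy as np

def conv_same(x, k):
    """'Same'-padded single-channel filtering."""
    kh, kw = k.shape
    p = np.pad(x, ((kh // 2,) * 2, (kw // 2,) * 2))
    return np.array([[np.sum(p[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1])]
                     for i in range(x.shape[0])])

# Illustrative stand-ins for the four layers (not the learned weights):
horiz_edge = np.array([[-1.0, 0.0, 1.0]])   # layer 1: horizontal-direction filtering
vert_edge  = horiz_edge.T                   # layer 2: vertical-direction filtering
smooth_h   = np.full((1, 3), 1.0 / 3.0)     # layer 3: horizontal smoothing
smooth_v   = smooth_h.T                     # layer 4: vertical smoothing

img = np.zeros((8, 8))
img[:, 4] = 1.0                             # toy image: one vertical stripe
filtered = conv_same(conv_same(img, horiz_edge), vert_edge)
smoothed = conv_same(conv_same(filtered, smooth_h), smooth_v)
```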
In one possible design, the convolution kernel size of the first convolution layer is equal to the convolution kernel size of the second convolution layer, and the convolution kernel size of the third convolution layer is the same as the convolution kernel size of the fourth convolution layer.
In the design, the convolution kernel size of the first convolution layer is equal to that of the second convolution layer, and the convolution kernel size of the third convolution layer is equal to that of the fourth convolution layer, so that the accuracy of the face sketch image generation technology is improved.
In one possible design, the method further includes:
acquiring hair probability that each pixel point in the face image is a hair feature point;
the sketch image of the face image is obtained by synthesizing the face structure sketch and the hair texture sketch, and meets the following formula requirements:
S(i,j) = (1 − P_h(i,j)) × S_s(i,j) + P_h(i,j) × S_t(i,j)
where S(i,j) is the pixel value of the pixel in the ith row and jth column of the sketch image of the face image, P_h(i,j) is the hair probability of the pixel in the ith row and jth column, S_s(i,j) is the pixel value of the pixel in the ith row and jth column of the facial structure sketch, S_t(i,j) is the pixel value of the pixel in the ith row and jth column of the hair texture sketch, and i and j are integers greater than 0.
In the design, the face structure sketch and the hair texture sketch are synthesized based on the hair probability to obtain the sketch image of the face image, so that the synthesized sketch image can well retain face structure information and hair texture information.
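The blending rule above is a per-pixel convex combination weighted by the hair probability; a minimal NumPy sketch with toy 2 × 2 values:

```python
import numpy as np

def synthesize(structure, texture, hair_prob):
    """Per-pixel blend: S = (1 - Ph) * Ss + Ph * St."""
    return (1.0 - hair_prob) * structure + hair_prob * texture

ss = np.array([[0.2, 0.8], [0.4, 0.6]])   # facial structure sketch values
st = np.array([[0.9, 0.1], [0.5, 0.3]])   # hair texture sketch values
ph = np.array([[0.0, 1.0], [0.5, 0.25]])  # hair probability per pixel
s = synthesize(ss, st, ph)
# pixels with Ph = 0 keep the structure value; pixels with Ph = 1 take the texture value
```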
In one possible design, the deep convolutional neural network model is trained by:
inputting a plurality of face sample images in a training sample database into an initialized deep convolutional neural network model for training; the training sample database comprises a plurality of face sample images and sketch sample images corresponding to the face sample images, and the initialized depth convolution neural network model comprises weight and bias;
in the K training process, filtering the background features in the face sample image through the first N convolutional layers of the depth convolutional neural network model which is adjusted for K-1 times to obtain a face feature map of the face sample image, wherein K is an integer greater than 0;
acquiring facial sketch features in a facial feature map of the facial sample image through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model to obtain a facial structure sketch of the facial sample image;
obtaining hair sketch features in a face feature image of the face sample image through the last M convolution layers of the second network branch of the K-1-time adjusted deep convolution neural network model to obtain a hair texture sketch image of the face sample image;
synthesizing a face structure sketch of the face sample image and a hair texture sketch of the face sample image to obtain a sketch image of the face sample image;
after the K training, obtaining an error value between a sketch image of the face sample image and a sketch sample image corresponding to the face sample image;
and adjusting the weight and the bias used in the K +1 training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
In the design, the deep convolutional neural network model is trained by adopting a large number of face sample images, so that the sketch image of the face image can be generated directly through the trained deep convolutional neural network model without depending on a sample database when the face image to be processed is generated into the sketch image, the accuracy and the generalization capability of the face sketch image generation technology are improved, the workload in the face sketch image generation process is reduced, and the speed of generating the face sketch image is increased.
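The error-and-adjust cycle described above can be illustrated with a stand-in model. The patent does not name the loss function or the optimizer; this sketch assumes a mean squared error and plain gradient descent, and replaces the CNN with a toy linear map so the weight and bias updates between pass K and pass K+1 are visible.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((100, 4))                              # flattened toy "image" features
target = x @ np.array([0.5, -1.0, 2.0, 0.25]) + 3.0   # toy ground-truth sketch values

w, b, lr = np.zeros(4), 0.0, 0.1
for k in range(1000):                # K training passes
    pred = x @ w + b                 # forward pass (stand-in for the CNN)
    err = pred - target              # error value after the K-th pass
    w -= lr * (x.T @ err) / len(x)   # adjust weights for pass K+1
    b -= lr * err.mean()             # adjust bias for pass K+1
mse = float(np.mean((x @ w + b - target) ** 2))
```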
In one possible design, in the K-th training process, filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model, including:
adding the pixel values of the pixel points at the same position in the face sample image and the sketch average image to obtain a face enhancement image;
wherein, the pixel value of any pixel point in the sketch average map is: the average value of pixel values of pixel points which are in the same position with any pixel point in all sketch sample images in the training sample database;
and filtering the background features in the face enhancement image through the first N convolution layers of the depth convolution neural network model which is adjusted for K-1 times.
In the design, the face enhancement image is obtained by adding the pixel values of the pixel points at the same position in the face sample image and the sketch average image, so that the face characteristic information and the hair characteristic information of the face sample image are enhanced, and the accuracy of the face sketch image generation technology is improved.
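The enhancement step reduces to an element-wise addition of the face sample image and the per-pixel average of all sketch sample images in the training database; a minimal NumPy sketch with toy random data (the image sizes and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
sketch_samples = rng.random((50, 8, 8))    # 50 toy sketch sample images
sketch_mean = sketch_samples.mean(axis=0)  # sketch average map: per-pixel mean
face_sample = rng.random((8, 8))           # one toy face sample image
enhanced = face_sample + sketch_mean       # pixel values added at the same position
```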
In one possible design, acquiring the face sketch features in the face feature map of the face sample image through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model includes:
dividing the face feature map of the face sample image into a plurality of mutually overlapped image blocks, and acquiring an image block comprising face feature information from the plurality of mutually overlapped image blocks;
for each image block comprising facial feature information, determining a target area corresponding to each image block comprising facial feature information in a facial feature map of the face sample image, and adding pixel values of pixel points at the same position in the image block in the target area and each image block comprising facial feature information to obtain a face enhancement feature map;
and for each face enhancement feature map, acquiring face sketch features in the face enhancement feature map through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model.
In the design, the facial feature information of the face sample image is strengthened by adding the pixel values of the pixel points at the same position of the image block in the image block comprising the facial feature information and the corresponding target region in the face feature image, so that the synthesized sketch image can well retain the facial structure information.
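A minimal NumPy sketch of the split-and-strengthen step above. The block size 4 and stride 2 are illustrative assumptions, and the selection of which blocks actually contain facial feature information is omitted; the fragment only shows dividing the feature map into mutually overlapping blocks and adding one block back onto its target area.

```python
import numpy as np

def overlapping_blocks(img, size, stride):
    """Split a feature map into mutually overlapping square blocks."""
    blocks = []
    for i in range(0, img.shape[0] - size + 1, stride):
        for j in range(0, img.shape[1] - size + 1, stride):
            blocks.append(((i, j), img[i:i + size, j:j + size]))
    return blocks

feature_map = np.arange(64, dtype=float).reshape(8, 8)   # toy face feature map
blocks = overlapping_blocks(feature_map, size=4, stride=2)
(i, j), block = blocks[0]          # suppose this block holds facial feature info
enhanced = feature_map.copy()
enhanced[i:i + 4, j:j + 4] += block  # strengthen the corresponding target area
```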
In one possible design, acquiring hair sketch features in a face feature map of the face sample image through the last M convolutional layers of the second network branch of the K-1-time adjusted deep convolutional neural network model includes:
dividing the face feature map of the face sample image into a plurality of mutually overlapped image blocks, and acquiring an image block comprising hair feature information from the plurality of mutually overlapped image blocks;
determining, for each image block including hair feature information, a target area corresponding to the image block in the face feature map of the face sample image, and adding the pixel values of the pixel points at the same position in the target area and the image block to obtain a hair strengthening feature map;
and for each hair strengthening characteristic graph, obtaining hair sketch characteristics in the hair strengthening characteristic graph through the last M convolutional layers of the second network branch of the K-1 times adjusted deep convolutional neural network model.
In the design, the hair characteristic information of the face sample image is strengthened by adding the pixel values of the pixel points at the same position of the image block in the target area corresponding to the image block comprising the hair characteristic information in the face characteristic image, so that the hair texture information of the synthesized sketch image can be well kept.
In one possible design, the obtaining an image block including facial feature information from the plurality of overlapping image blocks includes:
determining the face probability that each pixel point in each image block is a face feature point aiming at each image block in the plurality of mutually overlapped image blocks; and when the number of the pixel points with the face probability not being 0 is determined to be larger than a preset threshold, determining each image block as an image block comprising face feature information.
In the design, the face probability that each pixel point in each image block is a face feature point is determined, and then when the number of the pixel points of which the face probability is not 0 is determined to be greater than a preset threshold, each image block is determined to be an image block including face feature information, so that the accuracy of obtaining the image block including the face feature information is improved.
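The selection rule reduces to counting the pixels whose face probability is nonzero and comparing the count against the preset threshold; a minimal NumPy sketch (the 4 × 4 probability map and the threshold values are toy assumptions):

```python
import numpy as np

def has_face_features(prob_block, threshold):
    """A block includes facial feature information when its count of
    pixels with nonzero face probability exceeds the preset threshold."""
    return int(np.count_nonzero(prob_block)) > threshold

face_prob = np.zeros((4, 4))
face_prob[1:3, 1:3] = 0.8   # 4 pixels with nonzero face probability
selected = has_face_features(face_prob, threshold=3)   # 4 > 3, block is kept
rejected = has_face_features(face_prob, threshold=4)   # 4 > 4 is false
```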
In one possible design, the obtaining an image block including hair feature information from the plurality of overlapping image blocks includes:
determining the hair probability that each pixel point in each image block is a hair feature point aiming at each image block in the plurality of mutually overlapped image blocks; and when the number of the pixel points with the hair probability not being 0 is determined to be larger than a preset threshold, determining that each image block is an image block comprising hair characteristic information.
In this design, the hair probability that each pixel point in each image block is a hair feature point is determined, and each image block is determined to be an image block including hair feature information when the number of pixel points whose hair probability is not 0 is greater than a preset threshold, thereby improving the accuracy of acquiring image blocks that include hair feature information.
In a second aspect, an embodiment of the present application provides an apparatus for generating a sketch image, including:
the acquisition module is used for acquiring a face image to be processed;
the deep convolutional neural network model is used for acquiring a face structure sketch map and a hair texture sketch map in the face image acquired by the acquisition module; the deep convolutional neural network model is pre-trained and comprises a first network branch module and a second network branch module;
the first network branch module is used for acquiring the face sketch features in the face image acquired by the acquisition module to obtain a face structure sketch; the first network branch module comprises P convolutional layers, wherein P is an integer greater than 0;
the second network branch module is used for acquiring hair sketch features in the face image acquired by the acquisition module to obtain a hair texture sketch; the second network branch module comprises P convolutional layers;
and the synthesis module is used for synthesizing the facial structure sketch obtained by the first network branch module and the hair texture sketch obtained by the second network branch module to obtain a sketch image of the face image.
In one possible design, the first N convolutional layers of the P convolutional layers included in the first network branch module are the same as or coincide with the first N convolutional layers of the P convolutional layers included in the second network branch module, where N is an integer greater than 0 and smaller than P.
In one possible design, the first network branching module is specifically configured to:
filtering background features in the face image through the first N convolutional layers of the first network branch module to obtain a face feature map;
acquiring facial sketch features in the face feature map through the last M convolution layers of the first network branch module;
the second network branch module is specifically configured to:
filtering the background features in the face image through the first N convolutional layers of the second network branch in the deep convolutional neural network model to obtain a face feature map;
acquiring hair sketch features in the face feature map through the last M convolutional layers of the second network branch;
where P = M + N.
In one possible design, the convolution kernel sizes of the last M convolutional layers of the first network branching module are correspondingly equal to the convolution kernel sizes of the last M convolutional layers of the second network branching module.
In a possible design, where N is 4, the first network branching module, when filtering the background features in the face image through the first N convolutional layers of the first network branching module, is specifically configured to:
filtering the background features of the face image in the horizontal direction and the vertical direction through a first convolution layer and a second convolution layer of the first N convolution layers of the first network branch module;
and smoothing the face image with the filtered background features in the horizontal direction and the vertical direction through a third convolution layer and a fourth convolution layer in the first N convolution layers of the first network branch module.
In one possible design, the convolution kernel size of the first convolution layer is equal to the convolution kernel size of the second convolution layer, and the convolution kernel size of the third convolution layer is the same as the convolution kernel size of the fourth convolution layer.
In one possible design, the obtaining module is further configured to obtain a hair probability that each pixel point in the face image is a hair feature point;
the synthesis module is specifically configured to:
synthesizing the facial structure sketch obtained by the first network branch module and the hair texture sketch obtained by the second network branch module to obtain a sketch image of the face image, wherein the sketch image meets the requirements of the following formula:
S(i,j) = (1 − P_h(i,j)) × S_s(i,j) + P_h(i,j) × S_t(i,j)
where S(i,j) is the pixel value of the pixel in the ith row and jth column of the sketch image of the face image, P_h(i,j) is the hair probability of the pixel in the ith row and jth column, S_s(i,j) is the pixel value of the pixel in the ith row and jth column of the facial structure sketch, S_t(i,j) is the pixel value of the pixel in the ith row and jth column of the hair texture sketch, and i and j are integers greater than 0.
In one possible design, the apparatus further includes:
the training module is used for obtaining the deep convolutional neural network model through training in the following mode:
inputting a plurality of face sample images in a training sample database into an initialized deep convolutional neural network model for training; the training sample database comprises a plurality of face sample images and sketch sample images corresponding to the face sample images, and the initialized depth convolution neural network model comprises weight and bias;
in the K training process, filtering the background features in the face sample image through the first N convolutional layers of the depth convolutional neural network model which is adjusted for K-1 times to obtain a face feature map of the face sample image, wherein K is an integer greater than 0;
acquiring facial sketch features in a facial feature map of the facial sample image through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model to obtain a facial structure sketch map of the facial sample image;
acquiring hair sketch features in a face feature map of the face sample image through the last M convolution layers of the second network branch module of the depth convolution neural network model which is adjusted for K-1 times to obtain a hair texture sketch map of the face sample image;
synthesizing a face structure sketch of the face sample image and a hair texture sketch of the face sample image to obtain a sketch image of the face sample image;
after the K training, obtaining an error value between a sketch image of the face sample image and a sketch sample image corresponding to the face sample image;
and adjusting the weight and the bias used in the K +1 training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
In a possible design, when the training module filters the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model in the K-th training process, the training module is specifically configured to:
adding the pixel values of the pixel points at the same position in the face sample image and the sketch average image to obtain a face enhancement image;
wherein, the pixel value of any pixel point in the sketch average map is: the average value of pixel values of pixel points which are in the same position with any pixel point in all sketch sample images in the training sample database;
and filtering the background features in the face enhancement image through the first N convolution layers of the depth convolution neural network model which is adjusted for K-1 times.
In a possible design, the obtaining module is further configured to divide the face sample image into a plurality of mutually overlapped image blocks, and obtain an image block including facial feature information from the plurality of mutually overlapped image blocks;
the training module is specifically configured to, when obtaining the face sketch features in the face feature map of the face sample image through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model:
for each image block which is acquired by the acquisition module and comprises facial feature information, determining a target area corresponding to each image block comprising the facial feature information in a face feature map of the face sample image, and adding pixel values of pixel points at the same position in the image block in the target area and each image block comprising the facial feature information to obtain a face enhancement feature map;
and for each face enhancement feature map, acquiring face sketch features in the face enhancement feature map through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model.
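The target-region addition can be sketched as follows; the helper name and the (top, left) region coordinates are hypothetical, since the document does not state how the target area corresponding to an image block is located:

```python
def add_block_to_region(feature_map, block, top, left):
    """Add an image block, pixel by pixel, onto the target region of the face
    feature map starting at (top, left), yielding a face enhancement feature
    map. Pixels outside the target region are left unchanged."""
    enhanced = [row[:] for row in feature_map]  # copy; do not mutate the input
    for di, block_row in enumerate(block):
        for dj, value in enumerate(block_row):
            enhanced[top + di][left + dj] += value
    return enhanced
```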
In a possible design, the obtaining module is further configured to divide the face sample image into a plurality of mutually overlapped image blocks, and obtain an image block including hair feature information from the plurality of mutually overlapped image blocks;
the training module is specifically configured to, when obtaining hair sketch features in the face feature map of the face sample image through the last M convolutional layers of the second network branch module of the K-1-time adjusted deep convolutional neural network model:
for each image block including hair feature information acquired by the acquisition module, adding the pixel values of the pixel points at the same position in the face sample image and the image block to obtain a hair enhancement feature map;
and for each hair enhancement feature map, acquiring hair sketch features in the hair enhancement feature map through the last M convolutional layers of the second network branch module of the K-1-time adjusted deep convolutional neural network model.
In a possible design, when an image block including facial feature information is obtained from the plurality of mutually overlapped image blocks, the obtaining module is specifically configured to:
for each image block in the plurality of mutually overlapped image blocks, determining the face probability that each pixel point in the image block is a face feature point; and when the number of pixel points whose face probability is not 0 is determined to be larger than a preset threshold, determining the image block to be an image block including face feature information.
In a possible design, when the image block including hair feature information is obtained from the plurality of mutually overlapped image blocks, the obtaining module is specifically configured to:
for each image block in the plurality of mutually overlapped image blocks, determining the hair probability that each pixel point in the image block is a hair feature point; and when the number of pixel points whose hair probability is not 0 is determined to be larger than a preset threshold, determining the image block to be an image block including hair feature information.
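Both selection rules above reduce to counting the pixels in a block whose (face or hair) probability is non-zero and comparing the count with the preset threshold:

```python
def contains_feature(prob_block, threshold):
    """Return True when the image block includes face (or hair) feature
    information: the number of pixels with non-zero face (or hair)
    probability exceeds the preset threshold."""
    nonzero = sum(1 for row in prob_block for p in row if p != 0)
    return nonzero > threshold
```

The same function serves both the face and the hair test; only the probability map passed in differs.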
In the embodiment of the application, based on a deep convolutional neural network, a structure is designed that includes a first network branch for generating face features and a second network branch for generating hair features. Effective feature expressions are learned from a large number of training samples, a network model capable of generating accurate and natural face sketch images from original images is trained, and automatic generation of face sketch images is realized. Compared with the prior-art technology for automatically generating a face sketch image based on synthesis, the technology for generating a face sketch image based on the deep convolutional neural network does not depend on a sample database; instead, it generates a structure sketch image including facial features through the first network branch in the deep convolutional neural network, generates a texture sketch image including hair features through the second network branch in the deep convolutional neural network, and then synthesizes the structure sketch image and the texture sketch image to obtain the final face sketch image.
In a third aspect, an embodiment of the present invention further provides a deep convolutional neural network model, where the model includes a first network branch module and a second network branch module;
the first network branch module comprises P convolution layers and is used for acquiring the facial sketch features in the face image acquired by the acquisition module to obtain a facial structure sketch, where P is an integer larger than 0.
The second network branch module comprises P convolution layers and is used for obtaining hair sketch characteristics in the face image obtained by the obtaining module to obtain a hair texture sketch.
In a fourth aspect, an embodiment of the present application further provides a terminal, where the terminal includes a processor and a memory, where the memory is used to store a software program, and the processor is used to read the software program stored in the memory and implement the method provided in the first aspect or any one of the designs of the first aspect. The electronic device may be a mobile terminal, a computer, etc.
In a fifth aspect, this embodiment of the present application further provides a computer storage medium, where a software program is stored, and the software program can implement the method provided by the first aspect or any one of the designs of the first aspect when being read and executed by one or more processors.
Drawings
Fig. 1 is a schematic flowchart of a method for generating a sketch image according to an embodiment of the present disclosure;
fig. 2A is a schematic structural diagram of a first deep convolutional neural network model according to an embodiment of the present disclosure;
fig. 2B is a schematic structural diagram of another first deep convolutional neural network model according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for filtering background features in a face image according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a second deep convolutional neural network model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a process for generating a sketch image according to an embodiment of the present application;
fig. 6A shows four face images to be processed according to an embodiment of the present application;
fig. 6B shows an effect diagram of generating sketch images from the four face images to be processed according to the embodiment of the present application;
FIG. 7 is a schematic structural diagram of a first deep convolutional neural network model provided in an embodiment of the present application;
FIG. 8 is a schematic flowchart of a first deep convolutional neural network model training process according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram of an image block adding method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an apparatus for generating a sketch image according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a deep convolutional neural network model according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a terminal implementation manner provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiment of the application provides a sketch image generation method and device, which are used for solving the problems of low accuracy, poor generalization capability and low sketch image generation speed of a face sketch image automatic generation technology in the prior art. The method and the device are based on the same inventive concept, and because the principles of solving the problems of the method and the device are similar, the implementation of the device and the method can be mutually referred, and repeated parts are not repeated.
The embodiment of the application can be applied to electronic equipment, such as a computer, a tablet computer, a notebook, a smart phone, a server and the like.
The application fields of the embodiments of the present application include, but are not limited to: a face image field, a vehicle image field, a plant image field, or other types of image fields.
Correspondingly, the embodiment of the application is applied to the field of face images, and when the face sketch image is generated, a plurality of face sample images are adopted in advance for training; the method is applied to the field of vehicle images, and when the vehicle sketch images are generated, a plurality of vehicle sample images are adopted in advance for training; the method is applied to the field of plant images, and when the plant sketch images are generated, a plurality of plant sample images are adopted in advance for training; the method is applied to the field of other types of images, and when other types of sketch images are generated, a plurality of other types of sample images are adopted in advance for training.
The embodiment of the application can be used for generating a sketch image and can also be used for generating a gray level image.
Correspondingly, the embodiment of the application is applied to the field of face images, and when the face gray level images are generated, a plurality of face sketch sample images are adopted in advance for training; the method is applied to the field of vehicle images, and when a vehicle gray level image is generated, a plurality of vehicle sketch sample images are adopted in advance for training; the method is applied to the field of plant images, and when the plant gray level images are generated, a plurality of plant sketch sample images are adopted in advance for training; the method is applied to the field of other types of images, and when other types of gray level images are generated, a plurality of sketch sample images of other types are adopted in advance for training.
In order that the embodiments of the present application may be more readily understood, some of the descriptions referred to in the embodiments of the present application are first set forth below in order to provide an understanding to those skilled in the art, and such descriptions should not be taken as limiting the scope of the claimed application.
A convolutional neural network is a multi-layered neural network, each layer consisting of a plurality of two-dimensional planes, and each plane consisting of a plurality of individual neurons. In the present embodiment, a neuron can be regarded as one pixel.
The term "several" refers to two or more.
In addition, it is to be understood that the terms first, second, etc. in the description of the present application are used for distinguishing between the descriptions and not necessarily for describing a sequential or chronological order.
Referring to fig. 1, a flowchart of a method for generating a sketch image according to an embodiment of the present application is shown, where the method is executed by an electronic device, and specifically includes the following steps:
and step S101, acquiring a face image to be processed.
It should be noted that, in step S101, the manner of acquiring the face image to be processed includes, but is not limited to: collecting the face image to be processed through a sensing device, acquiring the face image to be processed from a database, and the like.
The sensing devices include, but are not limited to: light sensing equipment, camera equipment, collection equipment and the like.
The database includes, but is not limited to: a local database, a cloud database, a U disk, a hard disk and the like.
Step S102, obtaining face sketch characteristics in the face image through P convolutional layers of a first network branch in a pre-trained first deep convolutional neural network model, and obtaining a face structure sketch, wherein P is an integer larger than 0.
Step S103, acquiring hair sketch characteristics in the face image through P convolution layers of a second network branch in the first deep convolution neural network model to obtain a hair texture sketch.
And step S104, synthesizing the facial structure sketch and the hair texture sketch to obtain a sketch image of the face image.
It should be noted that the order of step S102 and step S103 is not strictly limited: step S102 may be executed first and then step S103, or step S102 and step S103 may be executed simultaneously; this is not specifically limited in the embodiment of the present application.
In the embodiment of the application, based on a deep convolutional neural network, a structure is designed that includes a first network branch for generating face features and a second network branch for generating hair features. Effective feature expressions are learned from a large number of training samples, a network model capable of generating accurate and natural face sketch images from original images is trained, and automatic generation of face sketch images is realized. Compared with the prior-art technology for automatically generating a face sketch image based on synthesis, the technology for generating a face sketch image based on the deep convolutional neural network does not depend on a sample database; instead, it generates a structure sketch image including facial features through the first network branch in the deep convolutional neural network, generates a texture sketch image including hair features through the second network branch in the deep convolutional neural network, and then synthesizes the structure sketch image and the texture sketch image to obtain the final face sketch image.
In this embodiment of the application, the first deep convolutional neural network model may further include an input layer before the P convolutional layers of the first network branch and the P convolutional layers of the second network branch, and the number of filter channels of the input layer is 3. After the electronic device acquires the face image to be processed, the face image is processed through the input layer to obtain 3 images, which respectively include the red (R) element, the green (G) element, and the blue (B) element. Then, the image of the R element, the image of the G element, and the image of the B element are input to the first convolutional layer. Alternatively, the first deep convolutional neural network model may extract element features separately for the luminance and chrominance (YUV) elements to generate an image.
Each convolutional layer in the first deep convolutional neural network model may use a rectified linear unit (ReLU) as an activation function.
In this embodiment of the present application, the size of the convolution kernel (Conv) used by each convolutional layer in the first deep convolutional neural network model may be A × B, where A and B are positive integers, and A and B may be equal or unequal, which is not specifically limited in this embodiment of the present application.
It should be noted that, the input and the output of each convolutional layer in the first deep convolutional neural network model both have one or more feature maps, and the number of output feature maps is related to the number of input feature maps and the number of filtering channels, for example, a face image is input, and after passing through 3 filtering channels of the input layer, 3 feature maps are obtained.
The first network branch and the second network branch in the embodiment of the present application may be two independent branches, as shown in fig. 2A; of course, the first network branch and the second network branch may also share the first N convolutional layers in the first deep convolutional neural network model, as shown in fig. 2B. When the first network branch and the second network branch are two independent branches, the first N convolutional layers of the first network branch and the first N convolutional layers of the second network branch are the same, and N is an integer greater than 0 and smaller than P.
No matter the first network branch and the second network branch are two independent branches, or the first N convolutional layers are shared by the first network branch and the second network branch, the first N convolutional layers of the first network branch and the first N convolutional layers of the second network branch are all used for filtering background features in the face image to obtain a face feature image. Next, referring to fig. 3, taking N as 4 as an example, a process of obtaining a face feature map by filtering background features in the face image through the first N convolutional layers is specifically described:
S301, filtering the background features in the horizontal direction and the vertical direction of the face image through the first convolution layer and the second convolution layer of the first N convolution layers of the first network branch in the first deep convolutional neural network model.
The convolution kernel size of the first convolution layer is equal to the convolution kernel size of the second convolution layer.
In step S301, the first convolution layer may be a convolution layer for filtering a background feature in a horizontal direction of the face image, and the second convolution layer is a convolution layer for filtering a background feature in a vertical direction of the face image; the first convolution layer may also be a convolution layer for filtering a background feature in the vertical direction of the face image, and the second convolution layer is a convolution layer for filtering a background feature in the horizontal direction of the face image, which is not specifically limited herein. That is, in the embodiment of the present application, the order of filtering the background features in the horizontal direction of the face image or filtering the background features in the vertical direction of the face image is not specifically limited.
And S302, smoothing the face image with the background features filtered in the horizontal direction and the vertical direction through a third convolutional layer and a fourth convolutional layer in the first N convolutional layers of the first network branch in the first deep convolutional neural network model.
In step S302, the convolution kernel size of the third convolution layer is equal to the convolution kernel size of the fourth convolution layer. The third convolution layer may be a convolution layer for performing smoothing processing in the horizontal direction with respect to the face image with the filtered background feature, and the fourth convolution layer may be a convolution layer for performing smoothing processing in the vertical direction with respect to the face image with the filtered background feature; the third convolution layer may also be a convolution layer for performing smoothing processing on the face image with the filtered background features in the vertical direction, and the fourth convolution layer is a convolution layer for performing smoothing processing on the face image with the filtered background features in the horizontal direction, which is not specifically limited herein in this embodiment of the application. That is, in the embodiment of the present application, there is no specific limitation on whether the smoothing process is performed in the horizontal direction or the smoothing process is performed in the vertical direction.
In the embodiment of the present application, the first N convolutional layers of the first network branch and the first N convolutional layers of the second network branch are the same or share the first N convolutional layers in the first deep convolutional neural network model, so that the computation efficiency of the first deep convolutional neural network model is improved, and the first convolutional layer and the second convolutional layer of the first N convolutional layers of the first network branch are used for filtering the background features of the to-be-processed face image in the horizontal direction and the vertical direction, and the third convolutional layer and the fourth convolutional layer are used for performing smoothing processing on the face image with the filtered background features in the horizontal direction and the vertical direction, so that the accuracy of the face sketch image generation technology is improved, and the generated sketch image is more natural.
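The horizontal-then-vertical filtering pattern used throughout these layers can be illustrated, in a highly simplified form, with one-dimensional convolutions applied along each row and then along each column (zero padding at the borders). The real layers learn many 2-D filter kernels per channel; the identity kernel in the test below is only a placeholder, not a learned filter:

```python
def conv1d_rows(image, kernel):
    """Convolve each row of the image with a 1-D kernel: filtering along the
    horizontal direction, analogous to what the first convolution layer does
    (with learned weights). Out-of-range positions contribute zero."""
    k, half = len(kernel), len(kernel) // 2
    out = []
    for row in image:
        new_row = []
        for j in range(len(row)):
            acc = 0.0
            for t in range(k):
                jj = j + t - half
                if 0 <= jj < len(row):
                    acc += kernel[t] * row[jj]
            new_row.append(acc)
        out.append(new_row)
    return out

def conv1d_cols(image, kernel):
    """The same filtering along the vertical direction (second convolution
    layer): transpose, filter the rows, transpose back."""
    transposed = [list(col) for col in zip(*image)]
    return [list(col) for col in zip(*conv1d_rows(transposed, kernel))]
```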
Optionally, the number of filter channels of the first convolutional layer is a, the number of filter channels of the second convolutional layer is b, the number of filter channels of the third convolutional layer is c, the number of filter channels of the fourth convolutional layer is d, a and b may be positive integers greater than or equal to 100 and less than or equal to 200, a and b are equal, c and d may be positive integers greater than or equal to 1 and less than or equal to 100, and c and d are equal. Specifically, the number of filter channels per convolutional layer is not specifically limited in the embodiments of the present application.
After the background features in the face image are filtered through the first N convolution layers of the first network branch to obtain a face feature map, the facial sketch features in the face feature map are acquired through the last M convolution layers of the first network branch to obtain a facial structure sketch, where P = M + N. Specific values of M and N are not specifically limited in this embodiment. Taking N as 4 and M as 2 as an example, the facial sketch features in the horizontal direction and the vertical direction in the face feature map may be acquired through the fifth convolution layer and the sixth convolution layer of the first network branch, so as to obtain the facial structure sketch.
The convolution kernel size of the fifth convolution layer of the first network branch is equal to the convolution kernel size of the sixth convolution layer. The fifth convolution layer of the first network branch may be a convolution layer for obtaining a horizontal facial sketch feature in the human face feature map, and the sixth convolution layer is a convolution layer for obtaining a vertical facial sketch feature in the human face feature map; the fifth convolution layer of the first network branch may also be a convolution layer for acquiring a vertical facial sketch feature in the face feature map, and the sixth convolution layer is a convolution layer for acquiring a horizontal facial sketch feature in the face feature map, which is not specifically limited herein. That is, in the embodiment of the present application, the order of first acquiring the horizontal face sketch features in the face feature map or acquiring the vertical face sketch features in the face feature map is not particularly limited.
And after the background features in the face image are filtered through the first N convolution layers of the second network branch to obtain a face feature image, acquiring hair sketch features in the face feature image through the last M convolution layers of the second network branch to obtain a hair texture sketch image. Taking the N as 4 and the M as 2 as an example, obtaining hair sketch features in the horizontal direction and the vertical direction in the face feature map through a fifth convolution layer and a sixth convolution layer of the second network branch, so as to obtain a hair texture sketch map.
A convolution kernel size of a fifth convolution layer of the second network branch is equal to a convolution kernel size of a sixth convolution layer. The fifth convolution layer of the second network branch may be a convolution layer for acquiring hair sketch features in the horizontal direction in the face feature map, and the sixth convolution layer is a convolution layer for acquiring hair sketch features in the vertical direction in the face feature map; the fifth convolution layer of the second network branch may also be a convolution layer for acquiring a hair sketch feature in a vertical direction in the face feature map, and the sixth convolution layer is a convolution layer for acquiring a hair sketch feature in a horizontal direction in the face feature map, which is not specifically limited herein. That is, in the embodiment of the present application, the order of acquiring the hair sketch features in the horizontal direction in the face feature map or acquiring the hair sketch features in the vertical direction in the face feature map is not particularly limited.
Optionally, the convolution kernel sizes of the last M convolutional layers of the first network branch are correspondingly equal to the convolution kernel sizes of the last M convolutional layers of the second network branch. Taking N as 4 and M as 2 as an example, the convolution kernel size of the fifth convolution layer of the first network branch is equal to the convolution kernel size of the fifth convolution layer of the second network branch, and the convolution kernel size of the sixth convolution layer of the first network branch is equal to the convolution kernel size of the sixth convolution layer of the second network branch.
Optionally, the number of filter channels of the four convolutional layers, i.e., the fifth convolutional layer of the first network branch, the sixth convolutional layer of the first network branch, the fifth convolutional layer of the second network branch, and the sixth convolutional layer of the second network branch, may be all 1.
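The branch layout described so far can be summarized as a layer specification. The concrete kernel sizes and channel counts below are example assumptions consistent with the ranges stated above (a = b between 100 and 200, c = d between 1 and 100) and with the training example discussed later in this document:

```python
a = b = 128  # assumed filter channel counts for the first two layers
c = d = 64   # assumed filter channel counts for the third and fourth layers

shared = [  # first N = 4 layers, identical in (or shared by) both branches
    {"kernel": (5, 5), "channels": a, "role": "filter background (horizontal)"},
    {"kernel": (5, 5), "channels": b, "role": "filter background (vertical)"},
    {"kernel": (1, 1), "channels": c, "role": "smooth (horizontal)"},
    {"kernel": (1, 1), "channels": d, "role": "smooth (vertical)"},
]
face_branch = shared + [
    {"kernel": (3, 3), "channels": 1, "role": "facial sketch features (horizontal)"},
    {"kernel": (3, 3), "channels": 1, "role": "facial sketch features (vertical)"},
]
hair_branch = shared + [
    {"kernel": (3, 3), "channels": 1, "role": "hair sketch features (horizontal)"},
    {"kernel": (3, 3), "channels": 1, "role": "hair sketch features (vertical)"},
]
P, N, M = len(face_branch), len(shared), 2
assert P == N + M  # the document's constraint P = M + N
```

The specification also encodes the stated constraints: the last M kernels of the two branches are correspondingly equal, and their filter channel counts are all 1.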
In a possible implementation manner, when the facial structure sketch and the hair texture sketch are synthesized to obtain the sketch image of the face image, the synthesis may be performed based on a hair probability that each pixel is a hair feature point, and specifically, before the synthesis is performed to obtain the sketch image of the face image, the hair probability that each pixel is a hair feature point in the face image may be obtained. After the facial structure sketch map and the hair texture sketch map are obtained by the method, the facial structure sketch map and the hair texture sketch map are synthesized to obtain a sketch image of the face image, and the sketch image meets the following formula requirements:
S(i,j) = (1 - P_h(i,j)) × S_s(i,j) + P_h(i,j) × S_t(i,j)
where S(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the sketch image of the face image, P_h(i,j) is the hair probability of the pixel point in the i-th row and j-th column of the sketch image of the face image, S_s(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the facial structure sketch, S_t(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the hair texture sketch, and i and j are integers greater than 0.
In this implementation, the facial structure sketch and the hair texture sketch are synthesized based on the hair probability to obtain the sketch image of the face image, so that the synthesized sketch image can well retain the facial structure information and the hair texture information.
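The per-pixel blend can be rendered as a minimal pure-Python sketch over images stored as nested lists:

```python
def synthesize_sketch(face_structure, hair_texture, hair_prob):
    """Blend the facial structure sketch S_s and the hair texture sketch S_t
    using the per-pixel hair probability P_h:
        S(i,j) = (1 - P_h(i,j)) * S_s(i,j) + P_h(i,j) * S_t(i,j)
    """
    rows, cols = len(face_structure), len(face_structure[0])
    return [[(1 - hair_prob[i][j]) * face_structure[i][j]
             + hair_prob[i][j] * hair_texture[i][j]
             for j in range(cols)]
            for i in range(rows)]
```

Where the hair probability is 0 the structure sketch passes through unchanged, and where it is 1 the texture sketch dominates, which is exactly why both facial structure and hair texture survive in the synthesized image.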
Optionally, the hair probability of each pixel point in the face image is obtained through a second deep convolutional neural network model.
For example, as shown in fig. 4, the second deep convolutional neural network model may include 7 connection layers, where each of the first connection layer, the second connection layer, and the third connection layer includes a convolutional layer with ReLU as an activation function and a convolution (Conv) kernel size of 5 × 5, a pooling layer with a kernel size of 3 × 3, and a local response normalization (LRN) layer; the fourth connection layer includes a convolutional layer with ReLU as an activation function and a Conv kernel size of 3 × 3; the fifth connection layer includes a convolutional layer with ReLU as an activation function and a Conv kernel size of 3 × 3; the sixth connection layer includes a convolutional layer with ReLU as an activation function and a Conv kernel size of 1 × 1; and the seventh connection layer includes a convolutional layer with ReLU as an activation function and a Conv kernel size of 1 × 1. The second deep convolutional neural network model may be trained in advance using the sample images in the Helen dataset.
The first connecting layer, the second connecting layer and the third connecting layer are used for acquiring hair features, face features and background features of the face image; the fourth connecting layer and the fifth connecting layer are used for acquiring facial contour features, hair contour features and background contour features in the horizontal direction and the vertical direction aiming at the face image of which the hair features, the facial features and the background features are acquired; and the sixth connecting layer and the seventh connecting layer are used for performing smoothing processing on the face image which acquires the face contour feature, the hair contour feature and the background contour feature in the horizontal direction and the vertical direction.
When the hair probability of each pixel point in the face image is obtained through a second deep convolutional neural network model, aiming at each pixel point in the face image, when each pixel point is located in an area covered by a hair contour, the hair probability of each pixel point is 1, and the face probability and the background probability are both 0; when each pixel point is located in an area covered by the facial contour, the facial probability of each pixel point is 1, and the hair probability and the background probability are both 0; and when each pixel point is located in the area covered by the background contour, the background probability of each pixel point is 1, and the hair probability and the face probability are both 0. The pixel points in the area range covered by the facial contour are all facial feature points, the pixel points in the area range covered by the hair contour are all hair feature points, and the pixel points in the area range covered by the background contour are all background feature points.
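The mutually exclusive probabilities described above behave like a one-hot encoding of the region each pixel falls in. A small illustrative sketch, where the 'hair'/'face'/'background' labels are assumed inputs (e.g. derived from the predicted contours):

```python
def region_probabilities(label_map):
    """Turn a per-pixel region label map into the three mutually exclusive
    probability maps described above: the probability of the pixel's own
    region is 1, and the other two probabilities are 0."""
    def one_hot(label):
        return {"hair": (1, 0, 0), "face": (0, 1, 0), "background": (0, 0, 1)}[label]
    hair, face, background = [], [], []
    for row in label_map:
        probs = [one_hot(label) for label in row]
        hair.append([p[0] for p in probs])
        face.append([p[1] for p in probs])
        background.append([p[2] for p in probs])
    return hair, face, background
```

The hair probability map produced this way is the P_h used in the synthesis formula earlier.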
For better understanding of the embodiment of the present application, the following takes P = 4 as an example and exemplarily describes the generation process of the sketch image with reference to fig. 5:
inputting a face image into a first deep convolutional neural network model, and acquiring a face structure sketch of the face image through 4 convolutional layers of a first network branch of the first deep convolutional neural network model; and acquiring a hair texture sketch map of the face image through 4 convolutional layers of a second network branch of the first deep convolutional neural network model. Then, the face part of the face structure sketch is obtained according to the hair probability that each pixel point is a hair feature point, the hair part of the hair texture sketch is obtained according to the hair probability that each pixel point is a hair feature point, and finally the face part and the hair part are synthesized into a sketch image of the face image.
The sketch generation method provided in the embodiment of the application processes face images to obtain sketch images. Fig. 6A shows four face images to be processed, and fig. 6B shows the effect of generating sketch images for the four face images shown in fig. 6A, that is, the sketch images obtained by processing each of the four face images to be processed through the first deep convolutional neural network model.
The first deep convolutional neural network model adopted in the embodiment of the application can be obtained by training the initialized first deep convolutional neural network model in advance with the face sample images in a training sample database. The training sample database includes a plurality of face sample images and a sketch sample image corresponding to each face sample image. The initialized first deep convolutional neural network model may include weights and biases, or may include only weights, with the biases being 0.
Next, taking the first deep convolutional neural network model shown in fig. 7 as an example, the training process of the first deep convolutional neural network model is specifically described. The first deep convolutional neural network model shown in fig. 7 includes: the first four convolutional layers, shared by the first network branch and the second network branch, which are respectively a first convolutional layer with ReLU as the activation function and a convolution kernel size of 5 × 5, a second convolutional layer with ReLU as the activation function and a convolution kernel size of 5 × 5, a third convolutional layer with ReLU as the activation function and a convolution kernel size of 1 × 1, and a fourth convolutional layer with ReLU as the activation function and a convolution kernel size of 1 × 1; the last two convolutional layers of the first network branch, which are respectively a fifth convolutional layer with ReLU as the activation function and a convolution kernel size of 3 × 3 and a sixth convolutional layer with ReLU as the activation function and a convolution kernel size of 3 × 3; and the last two convolutional layers of the second network branch, which are respectively a fifth convolutional layer with ReLU as the activation function and a convolution kernel size of 3 × 3 and a sixth convolutional layer with ReLU as the activation function and a convolution kernel size of 3 × 3. The sizes of the convolution kernels are merely examples, and are not specifically limited in the embodiments of the present application. The training process of the first deep convolutional neural network model is specifically shown in fig. 8:
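The branch layout just described can be summarized as a small configuration sketch. Only the kernel sizes and the activation function come from the text; the dictionary field names are assumptions of this illustration:

```python
# Four convolutional layers shared by both network branches.
shared_layers = [
    {"name": "conv1", "kernel": 5, "activation": "ReLU"},
    {"name": "conv2", "kernel": 5, "activation": "ReLU"},
    {"name": "conv3", "kernel": 1, "activation": "ReLU"},
    {"name": "conv4", "kernel": 1, "activation": "ReLU"},
]
# Two branch-specific convolutional layers (same shape in each branch).
branch_layers = [
    {"name": "conv5", "kernel": 3, "activation": "ReLU"},
    {"name": "conv6", "kernel": 3, "activation": "ReLU"},
]

# Each branch therefore has P = N + M = 4 + 2 = 6 convolutional layers.
P = len(shared_layers) + len(branch_layers)
```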
s801, inputting a plurality of face sample images in a training sample database into an initialized first deep convolutional neural network model for training.
Optionally, the weight configuration of the initialized first deep convolutional neural network model conforms to a gaussian distribution with a mean value of 0 and a variance of 0.01, and the bias configuration is 0.
And S802, in the K-th training process, adding the pixel values of the pixel points at the same position in the face sample image and the sketch average map to obtain a face enhancement image.
The pixel value of any pixel point in the sketch average map is the average value of the pixel values of the pixel points located at the same position as that pixel point in all the sketch sample images in the training sample database.
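The sketch average map and the face enhancement image of step S802 can be sketched as follows. The toy 2 × 2 images stand in for full-size grayscale images and are assumptions of this illustration:

```python
import numpy as np

# Three toy "sketch sample images" from the training sample database.
sketches = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[20.0, 30.0], [40.0, 50.0]],
    [[30.0, 40.0], [50.0, 60.0]],
])

# Pixel (i, j) of the sketch average map is the mean, over all sketch
# sample images, of the pixel values at the same position.
sketch_average = sketches.mean(axis=0)   # [[20, 30], [40, 50]]

# Face enhancement image: add the face sample image and the sketch
# average map pixel-wise at the same positions.
face_image = np.array([[1.0, 2.0], [3.0, 4.0]])
face_enhanced = face_image + sketch_average
```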
And S803, filtering the background features of the face enhancement image in the horizontal direction and the vertical direction through the first convolutional layer and the second convolutional layer of the first deep convolutional neural network model adjusted K-1 times.
S804, smoothing, in the horizontal direction and the vertical direction, the face enhancement image with the background features filtered, through the third convolutional layer and the fourth convolutional layer of the first deep convolutional neural network model adjusted K-1 times.
S805, dividing the face sample image into a plurality of mutually overlapped image blocks, and acquiring an image block comprising face characteristic information and an image block comprising hair characteristic information from the plurality of mutually overlapped image blocks.
The number of the image blocks including the facial feature information is H, the number of the image blocks including the hair feature information is Q, and H and Q are positive integers.
Optionally, in step S805, the image block including facial feature information may be obtained from the plurality of mutually overlapped image blocks in, but not limited to, the following manners:
Implementation manner one: for each image block in the plurality of mutually overlapped image blocks, determining the face probability that each pixel point in the image block is a face feature point; and when it is determined that the number of pixel points whose face probability is not 0 is larger than a preset threshold, determining that the image block is an image block including facial feature information.
Implementation manner two: acquiring an image block including facial feature information from the plurality of mutually overlapped image blocks through a feature identification method. The feature identification method may include a feature identification method based on a local histogram, a feature identification method based on a binarized histogram, and the like, which is not specifically limited in the embodiments of the present application.
Optionally, in step S805, the image block including hair feature information may be obtained from the plurality of mutually overlapped image blocks in, but not limited to, the following manners:
Implementation manner one: for each image block in the plurality of mutually overlapped image blocks, determining the hair probability that each pixel point in the image block is a hair feature point; and when it is determined that the number of pixel points whose hair probability is not 0 is larger than a preset threshold, determining that the image block is an image block including hair feature information.
Implementation manner two: acquiring an image block including hair feature information from the plurality of mutually overlapped image blocks through a feature identification method. The feature identification method may include a feature identification method based on a local histogram, a feature identification method based on a binarized histogram, and the like, which is not specifically limited in the embodiments of the present application.
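Implementation manner one above (counting pixel points with non-zero probability against a preset threshold) applies identically to facial and hair feature points; it can be sketched as follows. The helper name and calling convention are assumptions of this illustration:

```python
import numpy as np

def blocks_with_feature(blocks, prob_blocks, threshold):
    """Keep the image blocks whose number of pixel points with non-zero
    probability exceeds the preset threshold. `blocks` and `prob_blocks`
    are parallel lists of 2-D arrays (block pixels / per-pixel feature
    probabilities)."""
    selected = []
    for block, probs in zip(blocks, prob_blocks):
        if np.count_nonzero(probs) > threshold:
            selected.append(block)
    return selected

# Two overlapping toy blocks with their per-pixel hair probabilities.
blocks = [np.zeros((2, 2)), np.ones((2, 2))]
probs = [np.array([[0, 0], [0, 1]]),   # 1 non-zero pixel point
         np.array([[1, 1], [1, 0]])]   # 3 non-zero pixel points
kept = blocks_with_feature(blocks, probs, threshold=2)
# Only the second block passes the threshold of 2.
```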
S806, for the f-th image block including the facial feature information, determining a target area corresponding to the f-th image block including the facial feature information in the face feature map of the face sample image, and adding the pixel values of the pixel points at the same position in the image block in the target area and the f-th image block including the facial feature information, as shown in fig. 9, to obtain the f-th face enhancement feature map.
Where f takes every positive integer value not greater than H.
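The target-area addition of step S806 (and the analogous step S808 for hair) can be sketched with numpy slicing. The block location (coordinates of its target area) is assumed to be known from the block division:

```python
import numpy as np

# Toy 4x4 face feature map and a 2x2 image block including facial
# feature information; top/left are the assumed target-area coordinates.
feature_map = np.zeros((4, 4))
block = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
top, left = 1, 1

# Add the pixel values at the same positions inside the target area to
# obtain the enhancement feature map.
enhanced = feature_map.copy()
enhanced[top:top + 2, left:left + 2] += block
```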
S807, obtaining face sketch features in the f-th face enhancement feature map through the last M convolutional layers of the first network branch of the first deep convolutional neural network model adjusted K-1 times, to obtain the f-th face structure sketch map of the face sample image.
And S808, for the g-th image block including the hair feature information, determining a target area corresponding to the g-th image block including the hair feature information in the face feature map of the face sample image, and adding the pixel values of the pixel points at the same position in the image block in the target area and the g-th image block including the hair feature information, to obtain the g-th hair enhancement feature map.
Where g takes every positive integer value not greater than Q.
And S809, acquiring hair sketch features in the g-th hair enhancement feature map through the last M convolutional layers of the second network branch of the first deep convolutional neural network model adjusted K-1 times, to obtain the g-th hair texture sketch map of the face sample image.
It should be noted that step S806 and step S808 are not in strict sequence, and step S806 may be executed first and then step S808 is executed, or step S808 is executed first and then step S806 is executed, or step S806 and step S808 are executed at the same time, which is not limited in this embodiment of the present application.
And S810, synthesizing the f-th face structure sketch map and the g-th hair texture sketch map of the face sample image to obtain a sketch image of the face sample image.
S811, obtaining an error value between a sketch image of the face sample image and a sketch sample image corresponding to the face sample image.
S812, adjusting the weights and the biases used in the (K+1)-th training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
Specifically, the adjustment amounts of the weights and the biases are determined according to the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image and the network learning rate, and then the weights and the biases used in the (K+1)-th training process are adjusted according to the adjustment amounts.
The network learning rate is the amplitude of each weight and bias adjustment. The network learning rate of the first deep convolutional neural network model may be k × 10^-10, where k is a positive integer not greater than 100; the embodiments of the present application are not specifically limited herein.
And S813, obtaining a loss function value of the first deep convolutional neural network model after the K-th training.
If the loss function value of the first deep convolutional neural network model is larger than a preset threshold, the (K+1)-th training is performed; if the loss function value of the first deep convolutional neural network model is smaller than or equal to the preset threshold, the training of the first deep convolutional neural network model is finished.
Specifically, the loss function of the first deep convolutional neural network model conforms to the following formula:
L_g = L_s + α · L_t
where L_g is the loss function value of the first deep convolutional neural network model; L_s is the loss function value of the first network branch; L_t is the loss function value of the second network branch; and α is a scalar parameter for maintaining a balance between the loss function value of the first network branch and the loss function value of the second network branch.
The loss function value of the first network branch may be a mean squared error (MSE) value, a sum of absolute differences (SAD) value, a mean absolute difference (MAD) value, or another error value, which is not limited in the embodiments of the present application. Taking the loss function value of the first network branch as the MSE value of the first network branch as an example, the loss function value of the first network branch may be determined by the following formula:
L_s = (1/|P_s|) Σ_{p_s ∈ P_s} ‖ŝ_s − s_s‖²
where L_s is the loss function value of the first network branch; p_s is the f-th image block including facial feature information; s_s is the image block included in the target area corresponding to the f-th image block including facial feature information in the sketch sample image corresponding to the face sample image; P_s is the set of all image blocks including facial feature information; |P_s| is the number of all image blocks including facial feature information, i.e. |P_s| is equal to H; and ŝ_s is the image block included in the target area corresponding to the f-th image block including facial feature information in the face structure sketch map of the face sample image.
w_g denotes the weights and biases of the first N convolutional layers of the first deep convolutional neural network model, and w_s denotes the weights and biases of the last M convolutional layers of the first network branch of the first deep convolutional neural network model.
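The MSE-style branch loss can be sketched as an average of per-block squared differences between the generated blocks and the corresponding blocks of the sketch sample image. The function name and block shapes are assumptions of this illustration:

```python
import numpy as np

def branch_loss_mse(generated_blocks, target_blocks):
    """Average, over all image blocks including facial feature
    information, of the mean squared difference between each generated
    block and the corresponding sketch-sample block."""
    total = 0.0
    for gen, tgt in zip(generated_blocks, target_blocks):
        total += np.mean((gen - tgt) ** 2)
    return total / len(generated_blocks)

gen = [np.array([[1.0, 2.0]]), np.array([[3.0, 5.0]])]
tgt = [np.array([[1.0, 2.0]]), np.array([[3.0, 3.0]])]
loss = branch_loss_mse(gen, tgt)   # (0 + 2) / 2 = 1.0
```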
The loss function value of the second network branch may be a weighting of an MSE value and a sorted mean squared error (SM) value of the second network branch, or may be another error value, which is not specifically limited herein. Here SM(·) = Sort{MSE(·)}, where Sort(·) is a sorting function.
Taking the loss function value of the second network branch as the weighting of the MSE value and SM value of the second network branch as an example, the loss function value of the second network branch can be determined by the following formula:
L_t = (1/|P_t|) Σ_{p_t ∈ P_t} (‖ŝ_t − s_t‖² + β · SM(ŝ_t, s_t))
where β is a scalar parameter; L_t is the loss function value of the second network branch; p_t is the g-th image block including hair feature information; s_t is the image block included in the target area corresponding to the g-th image block including hair feature information in the sketch sample image corresponding to the face sample image; P_t is the set of all image blocks including hair feature information; |P_t| is the number of all image blocks including hair feature information, i.e. |P_t| is equal to Q; and ŝ_t is the image block included in the target area corresponding to the g-th image block including hair feature information in the hair texture sketch map of the face sample image.
w_g denotes the weights and biases of the first N convolutional layers of the first deep convolutional neural network model, and w_t denotes the weights and biases of the last M convolutional layers of the second network branch of the first deep convolutional neural network model.
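The second-branch loss can be sketched by adding a sorted-comparison term to the MSE. The SM reading used here, comparing the sorted pixel values of each block before taking the squared difference, is one plausible interpretation of SM = Sort{MSE(·)} and is an assumption of this sketch:

```python
import numpy as np

def second_branch_loss(gen_blocks, tgt_blocks, beta):
    """Weighted sum of an MSE term and a sorted-MSE (SM) term, averaged
    over all image blocks including hair feature information."""
    mse = 0.0
    sm = 0.0
    for gen, tgt in zip(gen_blocks, tgt_blocks):
        mse += np.mean((gen - tgt) ** 2)
        # Assumed SM term: MSE between the sorted pixel values.
        sm += np.mean((np.sort(gen.ravel()) - np.sort(tgt.ravel())) ** 2)
    n = len(gen_blocks)
    return (mse + beta * sm) / n

gen = [np.array([[2.0, 1.0]])]
tgt = [np.array([[1.0, 2.0]])]
loss = second_branch_loss(gen, tgt, beta=0.5)
# MSE = 1.0; sorted blocks are identical, so SM = 0; loss = 1.0.
```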
Taking the loss function value of the first network branch as the MSE value of the first network branch, and the loss function value of the second network branch as the weighting of the MSE value and the SM value of the second network branch as an example, the loss function value of the first deep convolutional neural network model is determined by the following formula:
L_g = (1/|P_s|) Σ_{p_s ∈ P_s} ‖ŝ_s − s_s‖² + α · (1/|P_t|) Σ_{p_t ∈ P_t} (‖ŝ_t − s_t‖² + β · SM(ŝ_t, s_t))
in the embodiment of the application, based on a deep convolutional neural network, by designing a structure including a first network branch for generating human face features and a second network branch for generating hair features, effective feature expressions are learned from a large number of training samples, a network model capable of generating accurate and natural human face sketch images from original images is trained, and automatic generation of the human face sketch images is realized. Compared with the technology for automatically generating the face sketch image based on synthesis in the prior art, the technology for generating the face sketch image based on the deep convolutional neural network does not depend on a sample database, but generates a structural sketch image comprising face features through a first network branch in the deep convolutional neural network, generates a structural sketch image comprising hair features through a second network branch in the deep convolutional neural network, and then synthesizes the structural sketch image and the texture sketch image to obtain a final face sketch image.
Based on the same inventive concept as that of the method embodiment, the embodiment of the present invention provides an apparatus 10 for generating a sketch image, which is specifically used for implementing the methods described in the embodiments of fig. 1 to 5, 7, and 8, and the apparatus has a structure as shown in fig. 10, and includes an obtaining module 11, a deep convolutional neural network model 12, and a synthesizing module 13, where:
and the acquisition module 11 is used for acquiring a face image to be processed.
A deep convolutional neural network model 12, configured to obtain a facial structure sketch map and a hair texture sketch map in the face image obtained by the obtaining module 11; the deep convolutional neural network model 12 is pre-trained, and includes a first network branch module 121 and a second network branch module 122, and the structure of the deep convolutional neural network model 12 is as shown in fig. 11:
the first network branch module 121 is configured to obtain a face sketch feature in the face image obtained by the obtaining module 11, so as to obtain a face structure sketch; the first network branch module comprises P convolutional layers, wherein P is an integer greater than 0.
The second network branch module 122 is configured to obtain hair sketch features in the face image obtained by the obtaining module 11, so as to obtain a hair texture sketch map; the second network branching module includes P convolutional layers.
A synthesizing module 13, configured to synthesize the facial structure sketch map obtained by the first network branching module 121 and the hair texture sketch map obtained by the second network branching module 122 to obtain a sketch image of the face image.
In a possible implementation manner, the first N convolutional layers of the P convolutional layers included in the first network branch module 121 are the same as (i.e., shared with) the first N convolutional layers of the P convolutional layers included in the second network branch module 122, where N is an integer greater than 0 and smaller than P.
In a possible implementation manner, the first network branch module 121 is specifically configured to filter the background features in the face image through the first N convolutional layers of the first network branch module 121 to obtain a face feature map, and then obtain the face sketch features in the face feature map through the last M convolutional layers of the first network branch module 121. The second network branch module 122 is specifically configured to filter the background features in the face image through the first N convolutional layers of the second network branch in the deep convolutional neural network model 12 to obtain a face feature map, and then obtain the hair sketch features in the face feature map through the last M convolutional layers of the second network branch. Here, P = M + N.
Optionally, the convolution kernel sizes of the last M convolution layers of the first network branching module 121 are correspondingly equal to the convolution kernel sizes of the last M convolution layers of the second network branching module 122.
In a possible implementation manner, N is 4, and when filtering the background feature in the face image through the first N convolution layers of the first network branching module 121, the first network branching module 121 is specifically configured to filter the background feature in the horizontal direction and the vertical direction of the face image through a first convolution layer and a second convolution layer of the first N convolution layers of the first network branching module 121, and then perform smoothing processing in the horizontal direction and the vertical direction on the face image with the background feature filtered through a third convolution layer and a fourth convolution layer of the first N convolution layers of the first network branching module 121.
Optionally, the convolution kernel size of the first convolution layer is equal to the convolution kernel size of the second convolution layer, and the convolution kernel size of the third convolution layer is equal to the convolution kernel size of the fourth convolution layer.
In a possible implementation manner, the obtaining module 11 is further configured to obtain a hair probability that each pixel point in the face image is a hair feature point. The synthesizing module 13 is specifically configured to synthesize the facial structure sketch map obtained by the first network branch module 121 and the hair texture sketch map obtained by the second network branch module 122 to obtain a sketch image of the face image, and the sketch image meets the following formula requirement:
S_(i,j) = (1 − P_h(i,j)) × S_s(i,j) + P_h(i,j) × S_t(i,j)
where S_(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the sketch image of the face image, P_h(i,j) is the hair probability of the pixel point in the i-th row and j-th column of the face image, S_s(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the face structure sketch map, S_t(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the hair texture sketch map, and i and j are integers greater than 0.
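The pixel-wise synthesis formula above can be sketched directly in numpy: the hair probability map blends the hair texture sketch map into the face structure sketch map. The toy arrays are assumptions of this illustration:

```python
import numpy as np

def synthesize(face_sketch, hair_sketch, hair_prob):
    # S = (1 - P_h) * S_s + P_h * S_t, applied element-wise.
    return (1.0 - hair_prob) * face_sketch + hair_prob * hair_sketch

Ss = np.array([[100.0, 100.0]])   # face structure sketch map
St = np.array([[200.0, 200.0]])   # hair texture sketch map
Ph = np.array([[0.0, 1.0]])       # hair probability per pixel point
S = synthesize(Ss, St, Ph)        # [[100., 200.]]
```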
Optionally, the apparatus further comprises:
a training module 14, configured to train the deep convolutional neural network model 12 by:
inputting a plurality of face sample images in a training sample database into an initialized deep convolutional neural network model 12 for training; the training sample database comprises a plurality of face sample images and sketch sample images corresponding to the face sample images, and the initialized deep convolutional neural network model 12 comprises weight and bias.
In the K training process, filtering the background features in the face sample image through the first N convolution layers of the depth convolution neural network model 12 which are adjusted for K-1 times to obtain a face feature map of the face sample image, wherein K is an integer greater than 0.
And acquiring facial sketch features in the facial feature map of the facial sample image through the last M convolutional layers of the first network branch module 121 of the K-1-time adjusted deep convolutional neural network model 12 to obtain a facial structure sketch map of the facial sample image.
And acquiring hair sketch features in the face feature image of the face sample image through the last M convolution layers of the second network branch module 122 of the depth convolution neural network model 12 subjected to the K-1 times of adjustment, so as to obtain a hair texture sketch image of the face sample image.
And synthesizing the facial structure sketch of the face sample image and the hair texture sketch of the face sample image to obtain a sketch image of the face sample image.
And after the K-th training, obtaining an error value between a sketch image of the face sample image and a sketch sample image corresponding to the face sample image.
And adjusting the weight and the bias used in the K +1 training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
Optionally, in the K-th training process, when filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model 12, the training module 14 is specifically configured to:
and adding the pixel values of the pixel points at the same position in the face sample image and the sketch average image to obtain a face enhancement image. Wherein, the pixel value of any pixel point in the sketch average map is: the average value of pixel values of pixel points which are in the same position with any pixel point in all sketch sample images in the training sample database; and filtering the background features in the face enhancement image through the first N convolutional layers of the depth convolutional neural network model 12 which is adjusted for K-1 times.
In a possible implementation manner, the obtaining module 11 is further configured to divide the face sample image into a plurality of mutually overlapped image blocks, and obtain an image block including facial feature information from the plurality of mutually overlapped image blocks. The training module 14 is specifically configured to, when obtaining the face sketch features in the face feature map of the face sample image through the last M convolutional layers of the first network branch module 121 of the K-1-time adjusted deep convolutional neural network model 12:
for each image block including facial feature information acquired by the acquisition module 11, determining a target region corresponding to the image block including facial feature information in a face feature map of the face sample image, and adding pixel values of pixel points at the same position in the image block in the target region and the image block including facial feature information to obtain a face enhancement feature map; and for each face enhancement feature map, acquiring face sketch features in the face enhancement feature map through the last M convolutional layers of the first network branch module 121 of the K-1-time adjusted deep convolutional neural network model 12.
In a possible implementation manner, the obtaining module 11 is further configured to divide the face sample image into a plurality of mutually overlapped image blocks, and obtain an image block including hair feature information from the plurality of mutually overlapped image blocks. The training module 14, when obtaining hair sketch features in the face feature map of the face sample image through the last M convolutional layers of the second network branch module 122 of the K-1-time adjusted deep convolutional neural network model 12, is specifically configured to:
adding the pixel values of the pixel points at the same position in the face sample image and the image block including the hair characteristic information to each image block including the hair characteristic information acquired by the acquisition module 11 to obtain a hair reinforced characteristic diagram; and for each hair strengthening feature map, acquiring hair sketch features in the hair strengthening feature map through the last M convolutional layers of the second network branch module 122 of the K-1-time adjusted deep convolutional neural network model 12.
Optionally, when the image block including the facial feature information is obtained from the plurality of mutually overlapped image blocks, the obtaining module 11 is specifically configured to:
determining the face probability that each pixel point in each image block is a face feature point aiming at each image block in the plurality of mutually overlapped image blocks; and when the number of the pixel points with the face probability not being 0 is determined to be larger than a preset threshold, determining each image block as an image block comprising face feature information.
In a possible design, when the obtaining module 11 obtains an image block including hair feature information from the plurality of mutually overlapped image blocks, the obtaining module is specifically configured to:
determining the hair probability that each pixel point in each image block is a hair feature point aiming at each image block in the plurality of mutually overlapped image blocks; and when the number of the pixel points with the hair probability not being 0 is determined to be larger than a preset threshold, determining that each image block is an image block comprising hair characteristic information.
The division of the modules in the embodiments of the present application is schematic, and only one logical function division is provided, and in actual implementation, there may be another division manner, and in addition, each functional module in each embodiment of the present application may be integrated in one processor, may also exist alone physically, or may also be integrated in one module by two or more modules. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
When the integrated module may be implemented in a hardware form, as shown in fig. 12, the integrated module may include a collector 1201, a processor 1202 and a memory 1203. The physical hardware corresponding to the deep convolutional neural network model 12, the synthesis module 13 and the training module 14 may be the processor 1202. The processor 1202 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The processor 1202 obtains a face image to be processed through the collector 1201. A memory 1203 for storing programs executed by the processor 1202.
In the embodiment of the present application, a specific connection medium among the collector 1201, the processor 1202, and the memory 1203 is not limited. In the embodiment of the present application, the memory 1203, the processor 1202, and the collector 1201 are connected by a bus 1204 in fig. 12, the bus is represented by a thick line in fig. 12, and the connection manner between other components is only schematically illustrated and is not limited thereto. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.
The memory 1203 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1203 may also be a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this. The memory 1203 may be a combination of the above.
The processor 1202 is configured to execute the program code stored in the memory 1203, and is specifically configured to execute the method according to the embodiment corresponding to fig. 1 to 9, which may be specifically implemented with reference to the embodiment corresponding to fig. 1 to 9, and details of the method are not repeated herein.
The embodiments described herein are only for illustrating and explaining the present application and are not intended to limit the present application, and the embodiments and functional blocks in the embodiments in the present application may be combined with each other without conflict.
In the embodiment of the application, based on a deep convolutional neural network, by designing a structure including a first network branch for generating face features and a second network branch for generating hair features, effective feature expressions are learned from a large number of training samples, and a network model capable of generating accurate and natural face sketch images from original images is trained, realizing automatic generation of face sketch images. Compared with the prior-art technology for automatically generating a face sketch image based on synthesis, the technology for generating a face sketch image based on a deep convolutional neural network does not depend on a sample database; instead, it generates a structure sketch map including face features through the first network branch of the deep convolutional neural network, generates a texture sketch map including hair features through the second network branch of the deep convolutional neural network, and then synthesizes the structure sketch map and the texture sketch map to obtain the final face sketch image.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (28)

  1. A method for generating a sketch image, comprising:
    acquiring a face image to be processed;
    acquiring facial sketch features in the face image through P convolutional layers of a first network branch in a pre-trained deep convolutional neural network model to obtain a facial structure sketch, wherein P is an integer larger than 0;
    obtaining hair sketch features in the face image through P convolution layers of a second network branch in the deep convolution neural network model to obtain a hair texture sketch;
    and synthesizing the facial structure sketch map and the hair texture sketch map to obtain a sketch image of the face image.
  2. The method of claim 1, wherein the first N convolutional layers of the first network branch are identical to (i.e., shared with) the first N convolutional layers of the second network branch, N being an integer greater than 0 and less than P.
  3. The method of claim 2, wherein said obtaining facial sketch features in the face image through P convolutional layers of a first network branch in the deep convolutional neural network model comprises:
    filtering background features in the face image through the first N convolutional layers of a first network branch in the deep convolutional neural network model to obtain a face feature map;
    acquiring facial sketch features in the face feature map through the last M convolutional layers of the first network branch;
    the acquiring of hair sketch features in the face image through the P convolutional layers of the second network branch in the deep convolutional neural network model includes:
    filtering the background features in the face image through the first N convolutional layers of the second network branch in the deep convolutional neural network model to obtain a face feature map;
    acquiring hair sketch features in the face feature map through the last M convolutional layers of the second network branch;
    wherein P = M + N.
  4. The method of claim 3, wherein convolution kernel sizes of the last M convolutional layers of the first network branch correspond equally to convolution kernel sizes of the last M convolutional layers of the second network branch.
  5. The method of claim 3 or 4, wherein N is 4, and wherein filtering the background features in the face image through the first N convolutional layers of the first network branch in the deep convolutional neural network model comprises:
    filtering the background features of the face image in the horizontal direction and the vertical direction through a first convolution layer and a second convolution layer of the first N convolution layers of the first network branch in the deep convolutional neural network model;
    and smoothing, through a third convolution layer and a fourth convolution layer of the first N convolution layers of the first network branch in the deep convolutional neural network model, the face image from which the background features in the horizontal direction and the vertical direction have been filtered.
  6. The method of claim 5, wherein the convolution kernel size of the first convolutional layer is equal to the convolution kernel size of the second convolutional layer, and the convolution kernel size of the third convolutional layer is the same as the convolution kernel size of the fourth convolutional layer.
  7. The method of any of claims 1 to 6, further comprising:
    acquiring hair probability that each pixel point in the face image is a hair feature point;
    the sketch image of the face image is obtained by synthesizing the face structure sketch and the hair texture sketch, and meets the following formula requirements:
    S(i,j) = (1 - Ph(i,j)) × Ss(i,j) + Ph(i,j) × St(i,j)
    wherein S(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the sketch image of the face image, Ph(i,j) is the hair probability of the pixel point in the i-th row and j-th column of the sketch image of the face image, Ss(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the facial structure sketch, St(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the hair texture sketch, and i and j are integers greater than 0.
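The per-pixel blending rule of claim 7 can be sketched numerically as follows; the 2×2 inputs and variable names are illustrative only, not from the application:

```python
import numpy as np

def blend_sketch(s_struct, s_texture, p_hair):
    """Per-pixel blend: S(i,j) = (1 - Ph(i,j)) * Ss(i,j) + Ph(i,j) * St(i,j)."""
    return (1.0 - p_hair) * s_struct + p_hair * s_texture

Ss = np.array([[10.0, 20.0], [30.0, 40.0]])  # facial structure sketch
St = np.full((2, 2), 100.0)                  # hair texture sketch
Ph = np.array([[0.0, 0.5], [1.0, 0.25]])     # hair probability per pixel

S = blend_sketch(Ss, St, Ph)
# Ph = 0 keeps the structure pixel; Ph = 1 keeps the texture pixel:
# S == [[10., 60.], [100., 55.]]
```

So hair regions are drawn from the texture branch and face regions from the structure branch, with a smooth transition where the hair probability is fractional.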
  8. The method of any one of claims 2-7, wherein the deep convolutional neural network model is trained by:
    inputting a plurality of face sample images in a training sample database into an initialized deep convolutional neural network model for training; the training sample database comprises a plurality of face sample images and sketch sample images corresponding to the face sample images, and the initialized deep convolutional neural network model comprises weights and biases;
    in the K-th training process, filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model to obtain a face feature map of the face sample image, wherein K is an integer greater than 0;
    acquiring facial sketch features in a facial feature map of the facial sample image through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model to obtain a facial structure sketch of the facial sample image;
    obtaining hair sketch features in a face feature image of the face sample image through the last M convolution layers of the second network branch of the K-1-time adjusted deep convolution neural network model to obtain a hair texture sketch image of the face sample image;
    synthesizing a face structure sketch of the face sample image and a hair texture sketch of the face sample image to obtain a sketch image of the face sample image;
    after the K-th training process, obtaining an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image;
    and adjusting the weights and biases used in the (K+1)-th training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
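The error-driven adjustment of weights and biases in claim 8 is ordinary supervised training. A toy sketch of the K-th pass / (K+1)-th pass update loop, with a linear model and synthetic data standing in for the deep network and the sample database (all values here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((16, 4))                 # stand-in features of face sample images
w_true = np.array([0.5, -1.0, 2.0, 0.3])
y = x @ w_true + 0.1                    # stand-in target sketch values

w, b, lr = np.zeros(4), 0.0, 0.5        # initialized weights and bias
for k in range(2000):                   # K-th training pass
    pred = x @ w + b                    # "sketch image" generated in pass K
    err = pred - y                      # error vs. the sketch sample image
    w -= lr * (x.T @ err) / len(x)      # weights used in pass K+1
    b -= lr * err.mean()                # bias used in pass K+1

mse = float(np.mean((x @ w + b - y) ** 2))
assert mse < 1e-2                       # the model has fit the samples
```

In the actual method, the same error signal would be backpropagated through the branch-specific and shared convolutional layers instead of a single weight vector.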
  9. The method of claim 8, wherein filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model in the K-th training process comprises:
    adding the pixel values of the pixel points at the same position in the face sample image and the sketch average map to obtain a face enhancement image;
    wherein the pixel value of any pixel point in the sketch average map is the average of the pixel values of the pixel points at the same position as that pixel point in all sketch sample images in the training sample database;
    and filtering the background features in the face enhancement image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model.
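The sketch-average enhancement of claim 9 is a pixel-wise average followed by a pixel-wise addition. A tiny numeric sketch (the 2×2 values are made up):

```python
import numpy as np

# Two hypothetical sketch sample images from the training database.
sketch_samples = np.stack([
    np.array([[0.0, 2.0], [4.0, 6.0]]),
    np.array([[2.0, 2.0], [0.0, 2.0]]),
])
# Pixel-wise average over all sketch samples: the "sketch average map".
sketch_avg = sketch_samples.mean(axis=0)   # [[1., 2.], [2., 4.]]

face_img = np.full((2, 2), 10.0)           # stand-in face sample image
# Face enhancement image: add pixel values at the same positions.
face_enhanced = face_img + sketch_avg      # [[11., 12.], [12., 14.]]
```

The enhanced image, rather than the raw sample, is then fed to the shared front layers.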
  10. The method of claim 8 or 9, wherein obtaining facial sketch features in a face feature map of the face sample image through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model comprises:
    dividing the face sample image into a plurality of mutually overlapped image blocks, and acquiring an image block comprising facial feature information from the plurality of mutually overlapped image blocks;
    for each image block comprising facial feature information, determining a target area corresponding to that image block in the face feature map of the face sample image, and adding the pixel values of the pixel points at the same position in the target area and that image block to obtain a face enhancement feature map;
    and for each face enhancement feature map, acquiring face sketch features in the face enhancement feature map through the last M convolutional layers of the first network branch of the K-1-time adjusted deep convolutional neural network model.
  11. The method according to any one of claims 8 to 10, wherein obtaining hair sketch features in a face feature map of the face sample image through the last M convolutional layers of the second network branch of the K-1-time adjusted deep convolutional neural network model comprises:
    dividing the face sample image into a plurality of mutually overlapped image blocks, and acquiring an image block comprising hair feature information from the plurality of mutually overlapped image blocks;
    for each image block comprising hair feature information, adding the pixel values of the pixel points at the same position in the face sample image and that image block to obtain a hair enhancement feature map;
    and for each hair enhancement feature map, acquiring hair sketch features in the hair enhancement feature map through the last M convolutional layers of the second network branch of the K-1-time adjusted deep convolutional neural network model.
  12. The method as claimed in claim 10, wherein said acquiring an image block comprising facial feature information from said plurality of mutually overlapped image blocks comprises:
    for each image block of the plurality of mutually overlapped image blocks, determining the face probability that each pixel point in the image block is a face feature point; and when the number of pixel points whose face probability is not 0 is determined to be greater than a preset threshold, determining that the image block is an image block comprising facial feature information.
  13. The method as claimed in claim 11, wherein said acquiring an image block comprising hair feature information from said plurality of mutually overlapped image blocks comprises:
    for each image block of the plurality of mutually overlapped image blocks, determining the hair probability that each pixel point in the image block is a hair feature point; and when the number of pixel points whose hair probability is not 0 is determined to be greater than a preset threshold, determining that the image block is an image block comprising hair feature information.
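Claims 12 and 13 select, among the overlapping image blocks, those whose count of nonzero-probability pixels exceeds a preset threshold. A minimal sketch for the hair case; block size, stride, and threshold are illustrative choices, not values from the application:

```python
import numpy as np

def overlapping_blocks(prob_map, size, stride):
    """Yield (top, left, block) windows over a probability map; blocks may overlap."""
    h, w = prob_map.shape
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            yield i, j, prob_map[i:i + size, j:j + size]

def has_hair_feature(block, threshold):
    """Block qualifies when its count of nonzero hair-probability pixels exceeds threshold."""
    return int(np.count_nonzero(block)) > threshold

# Hypothetical 4x4 hair-probability map: hair only in the top-left corner.
p_hair = np.zeros((4, 4))
p_hair[:2, :2] = 0.9

selected = [(i, j) for i, j, b in overlapping_blocks(p_hair, size=2, stride=1)
            if has_hair_feature(b, threshold=1)]
# Only the three windows overlapping the hair region qualify:
# selected == [(0, 0), (0, 1), (1, 0)]
```

The face-feature case of claim 12 is identical with a face-probability map in place of the hair-probability map.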
  14. An apparatus for generating a sketch image, comprising:
    the acquisition module is used for acquiring a face image to be processed;
    the deep convolutional neural network model is used for acquiring a face structure sketch map and a hair texture sketch map in the face image acquired by the acquisition module; the deep convolutional neural network model is pre-trained and comprises a first network branch module and a second network branch module;
    the first network branch module is used for acquiring the face sketch features in the face image acquired by the acquisition module to obtain a face structure sketch; the first network branch module comprises P convolutional layers, wherein P is an integer greater than 0;
    the second network branch module is used for acquiring hair sketch features in the face image acquired by the acquisition module to obtain a hair texture sketch; the second network branch module comprises P convolutional layers;
    and the synthesis module is used for synthesizing the facial structure sketch obtained by the first network branch module and the hair texture sketch obtained by the second network branch module to obtain a sketch image of the face image.
  15. The apparatus of claim 14, wherein the first N of the P convolutional layers included in the first network branch module are identical to (i.e., shared with) the first N of the P convolutional layers included in the second network branch module, N being an integer greater than 0 and less than P.
  16. The apparatus of claim 15, wherein the first network branch module is specifically configured to:
    filtering background features in the face image through the first N convolutional layers of the first network branch module to obtain a face feature map;
    acquiring facial sketch features in the face feature map through the last M convolution layers of the first network branch module;
    the second network branch module is specifically configured to:
    filtering the background features in the face image through the first N convolutional layers of the second network branch module to obtain a face feature map;
    acquiring hair sketch features in the face feature map through the last M convolutional layers of the second network branch module;
    wherein P = M + N.
  17. The apparatus of claim 16, wherein the convolution kernel sizes of the last M convolutional layers of the first network branch module are correspondingly equal to the convolution kernel sizes of the last M convolutional layers of the second network branch module.
  18. The apparatus according to claim 16 or 17, wherein N is 4, and wherein, when filtering the background features in the face image through the first N convolutional layers of the first network branch module, the first network branch module is specifically configured to:
    filtering the background features of the face image in the horizontal direction and the vertical direction through a first convolution layer and a second convolution layer of the first N convolution layers of the first network branch module;
    and smoothing the face image with the filtered background features in the horizontal direction and the vertical direction through a third convolution layer and a fourth convolution layer in the first N convolution layers of the first network branch module.
  19. The apparatus of claim 18, wherein a convolution kernel size of the first convolutional layer is equal to a convolution kernel size of the second convolutional layer, and a convolution kernel size of the third convolutional layer is the same as a convolution kernel size of the fourth convolutional layer.
  20. The apparatus according to any one of claims 14 to 19, wherein the obtaining module is further configured to obtain a hair probability that each pixel point in the face image is a hair feature point;
    the synthesis module is specifically configured to:
    synthesizing the facial structure sketch obtained by the first network branch module and the hair texture sketch obtained by the second network branch module to obtain a sketch image of the face image, wherein the sketch image meets the requirements of the following formula:
    S(i,j) = (1 - Ph(i,j)) × Ss(i,j) + Ph(i,j) × St(i,j)
    wherein S(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the sketch image of the face image, Ph(i,j) is the hair probability of the pixel point in the i-th row and j-th column of the sketch image of the face image, Ss(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the facial structure sketch, St(i,j) is the pixel value of the pixel point in the i-th row and j-th column of the hair texture sketch, and i and j are integers greater than 0.
  21. The apparatus of any one of claims 14-20, further comprising:
    the training module is used for obtaining the deep convolutional neural network model through training in the following mode:
    inputting a plurality of face sample images in a training sample database into an initialized deep convolutional neural network model for training; the training sample database comprises a plurality of face sample images and sketch sample images corresponding to the face sample images, and the initialized deep convolutional neural network model comprises weights and biases;
    in the K-th training process, filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model to obtain a face feature map of the face sample image, wherein K is an integer greater than 0;
    acquiring facial sketch features in a facial feature map of the facial sample image through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model to obtain a facial structure sketch map of the facial sample image;
    acquiring hair sketch features in a face feature map of the face sample image through the last M convolutional layers of the second network branch module of the K-1-time adjusted deep convolutional neural network model to obtain a hair texture sketch map of the face sample image;
    synthesizing a face structure sketch of the face sample image and a hair texture sketch of the face sample image to obtain a sketch image of the face sample image;
    after the K-th training process, obtaining an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image;
    and adjusting the weights and biases used in the (K+1)-th training process based on the error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
  22. The apparatus of claim 21, wherein, when filtering the background features in the face sample image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model in the K-th training process, the training module is specifically configured to:
    adding the pixel values of the pixel points at the same position in the face sample image and the sketch average map to obtain a face enhancement image;
    wherein the pixel value of any pixel point in the sketch average map is the average of the pixel values of the pixel points at the same position as that pixel point in all sketch sample images in the training sample database;
    and filtering the background features in the face enhancement image through the first N convolutional layers of the K-1-time adjusted deep convolutional neural network model.
  23. The apparatus according to claim 21 or 22, wherein the obtaining module is further configured to divide the face sample image into a plurality of mutually overlapping image blocks, and obtain an image block including facial feature information from the plurality of mutually overlapping image blocks;
    the training module is specifically configured to, when obtaining the face sketch features in the face feature map of the face sample image through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model:
    for each image block comprising facial feature information acquired by the acquisition module, determining a target area corresponding to that image block in the face feature map of the face sample image, and adding the pixel values of the pixel points at the same position in the target area and that image block to obtain a face enhancement feature map;
    and for each face enhancement feature map, acquiring face sketch features in the face enhancement feature map through the last M convolutional layers of the first network branch module of the K-1-time adjusted deep convolutional neural network model.
  24. The apparatus according to any one of claims 21 to 23, wherein the obtaining module is further configured to divide the face sample image into a plurality of mutually overlapping image blocks, and obtain an image block including hair feature information from the plurality of mutually overlapping image blocks;
    the training module is specifically configured to, when obtaining hair sketch features in the face feature map of the face sample image through the last M convolutional layers of the second network branch module of the K-1-time adjusted deep convolutional neural network model:
    for each image block comprising hair feature information acquired by the acquisition module, adding the pixel values of the pixel points at the same position in the face sample image and that image block to obtain a hair enhancement feature map;
    and for each hair enhancement feature map, acquiring hair sketch features in the hair enhancement feature map through the last M convolutional layers of the second network branch module of the K-1-time adjusted deep convolutional neural network model.
  25. The apparatus as claimed in claim 23, wherein the obtaining module, when obtaining the image block including the facial feature information from the plurality of mutually overlapped image blocks, is specifically configured to:
    for each image block of the plurality of mutually overlapped image blocks, determining the face probability that each pixel point in the image block is a face feature point; and when the number of pixel points whose face probability is not 0 is determined to be greater than a preset threshold, determining that the image block is an image block comprising facial feature information.
  26. The apparatus according to claim 24, wherein the obtaining module, when obtaining the image block including hair feature information from the plurality of mutually overlapped image blocks, is specifically configured to:
    for each image block of the plurality of mutually overlapped image blocks, determining the hair probability that each pixel point in the image block is a hair feature point; and when the number of pixel points whose hair probability is not 0 is determined to be greater than a preset threshold, determining that the image block is an image block comprising hair feature information.
  27. A sketch image generating device is characterized by comprising a collector, a memory and a processor;
    the collector is used for obtaining a face image to be processed;
    a memory for storing a program executed by the processor;
    a processor, configured to execute the program stored in the memory based on the face image acquired by the collector to perform the method according to any one of claims 1 to 13.
  28. A computer storage medium having computer-executable instructions stored thereon for causing a computer to perform the method of any one of claims 1 to 13.
CN201780073000.6A 2017-03-29 2017-03-29 Sketch image generation method and device Active CN110023989B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/078637 WO2018176281A1 (en) 2017-03-29 2017-03-29 Sketch image generation method and device

Publications (2)

Publication Number Publication Date
CN110023989A (en) 2019-07-16
CN110023989B (en) 2021-06-01

Family

ID=63675092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780073000.6A Active CN110023989B (en) 2017-03-29 2017-03-29 Sketch image generation method and device

Country Status (2)

Country Link
CN (1) CN110023989B (en)
WO (1) WO2018176281A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069992B (en) * 2019-03-18 2021-02-09 西安电子科技大学 Face image synthesis method and device, electronic equipment and storage medium
CN110163824B (en) * 2019-05-22 2022-06-10 西安电子科技大学 Face portrait synthesis method based on bionics
CN110188651A (en) * 2019-05-24 2019-08-30 西安电子科技大学 Human face portrait synthetic method based on depth probability graph model

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040228504A1 (en) * 2003-05-13 2004-11-18 Viswis, Inc. Method and apparatus for processing image
US20060034495A1 (en) * 2004-04-21 2006-02-16 Miller Matthew L Synergistic face detection and pose estimation with energy-based models
US20080246762A1 (en) * 2005-12-28 2008-10-09 Toyota Jidosha Kabushiki Kaisha Method for Generating Three-Dimensional Shape Data, Apparatus for Generating Three-Dimensional Shape Data, and Three-Dimensional Shape Data Generating Program
CN101694720A (en) * 2009-10-13 2010-04-14 西安电子科技大学 Multidate SAR image change detection method based on space associated conditional probability fusion
CN101777180A (en) * 2009-12-23 2010-07-14 中国科学院自动化研究所 Complex background real-time alternating method based on background modeling and energy minimization
CN101990081A (en) * 2010-11-11 2011-03-23 宁波大学 Method for protecting copyright of virtual viewpoint image
CN102436637A (en) * 2010-09-29 2012-05-02 中国科学院计算技术研究所 Method and system for automatically segmenting hairs in head images
CN103080979A (en) * 2010-09-03 2013-05-01 王晓刚 System and method for synthesizing portrait sketch from photo
CN103279936A (en) * 2013-06-21 2013-09-04 重庆大学 Human face fake photo automatic combining and modifying method based on portrayal
CN103456010A (en) * 2013-09-02 2013-12-18 电子科技大学 Human face cartoon generation method based on feature point localization
CN103493524A (en) * 2011-06-29 2014-01-01 华为技术有限公司 Method and apparatus for triggering user equipment
CN104537630A (en) * 2015-01-22 2015-04-22 厦门美图之家科技有限公司 Method and device for image beautifying based on age estimation
WO2015127394A1 (en) * 2014-02-23 2015-08-27 Northeastern University System for beauty, cosmetic, and fashion analysis
CN105678232A (en) * 2015-12-30 2016-06-15 中通服公众信息产业股份有限公司 Face image feature extraction and comparison method based on deep learning
CN105869159A (en) * 2016-03-28 2016-08-17 联想(北京)有限公司 Image segmentation method and apparatus
CN109359541A (en) * 2018-09-17 2019-02-19 南京邮电大学 A kind of sketch face identification method based on depth migration study

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580726A (en) * 2019-08-21 2019-12-17 中山大学 Dynamic convolution network-based face sketch generation model and method in natural scene
CN110580726B (en) * 2019-08-21 2022-10-04 中山大学 Dynamic convolution network-based face sketch generation model and method in natural scene
CN113129410A (en) * 2019-12-31 2021-07-16 深圳云天励飞技术有限公司 Sketch image conversion method and related product
CN111223164A (en) * 2020-01-08 2020-06-02 浙江省北大信息技术高等研究院 Face sketch generating method and device
CN111223164B (en) * 2020-01-08 2023-10-24 杭州未名信科科技有限公司 Face simple drawing generation method and device
CN113139566A (en) * 2020-01-20 2021-07-20 北京达佳互联信息技术有限公司 Training method and device of image generation model, and image processing method and device
CN113139566B (en) * 2020-01-20 2024-03-12 北京达佳互联信息技术有限公司 Training method and device for image generation model, and image processing method and device
CN112581358A (en) * 2020-12-17 2021-03-30 北京达佳互联信息技术有限公司 Training method of image processing model, image processing method and device
CN112581358B (en) * 2020-12-17 2023-09-26 北京达佳互联信息技术有限公司 Training method of image processing model, image processing method and device

Also Published As

Publication number Publication date
CN110023989B (en) 2021-06-01
WO2018176281A1 (en) 2018-10-04

Similar Documents

Publication Publication Date Title
CN110023989B (en) Sketch image generation method and device
CN110574077B (en) Image analysis device and method using virtual three-dimensional deep neural network
Wang et al. Adaptive fusion for RGB-D salient object detection
Choy et al. Fully convolutional geometric features
CN109416727B (en) Method and device for removing glasses from a face image
Gadot et al. PatchBatch: A batch augmented loss for optical flow
Yang et al. A constant-space belief propagation algorithm for stereo matching
Xiao et al. Deep salient object detection with dense connections and distraction diagnosis
CN109446889B (en) Object tracking method and device based on twin matching network
CN109492627B (en) Scene text erasing method based on depth model of full convolution network
JP2020009402A (en) Method and system for automatic chromosome classification
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN105488468A (en) Method and device for positioning target area
CN113505768A (en) Model training method, face recognition method, electronic device and storage medium
WO2020077940A1 (en) Method and device for automatic identification of labels of image
Kandaswamy et al. Multi-source deep transfer learning for cross-sensor biometrics
CN112712546A (en) Target tracking method based on twin neural network
CN113112518B (en) Feature extractor generation method and device based on spliced image and computer equipment
Lu et al. Rethinking prior-guided face super-resolution: A new paradigm with facial component prior
Wan et al. Generative adversarial multi-task learning for face sketch synthesis and recognition
EP3872761A2 (en) Analysing objects in a set of frames
Liu et al. Semi-supervised keypoint detector and descriptor for retinal image matching
Xu et al. Extended non-local feature for visual saliency detection in low contrast images
Zhang et al. A new image filtering method: Nonlocal image guided averaging
Lu et al. Kernel estimation for motion blur removal using deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant