CN111832498B - Cartoon face recognition method based on convolutional neural network - Google Patents

Cartoon face recognition method based on convolutional neural network

Info

Publication number
CN111832498B
CN111832498B (Application CN202010692679.0A)
Authority
CN
China
Prior art keywords
picture
convolutional neural
neural network
pictures
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010692679.0A
Other languages
Chinese (zh)
Other versions
CN111832498A (en
Inventor
王笛
田玉敏
黄珍
刘瑗
万波
杨鹏飞
赵辉
罗楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202010692679.0A priority Critical patent/CN111832498B/en
Publication of CN111832498A publication Critical patent/CN111832498A/en
Application granted granted Critical
Publication of CN111832498B publication Critical patent/CN111832498B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a cartoon face recognition method based on a convolutional neural network, comprising the following steps: (1) generating a training set; (2) generating a C-F Loss function; (3) training an Xception convolutional neural network; (4) generating an identification picture set; and (5) recognizing the cartoon face pictures. The invention uses an Xception convolutional neural network to extract features, so more complete cartoon face features can be extracted and a higher recognition rate obtained. At the same time, a Focal Loss term is added to the cross-entropy loss function to form the C-F Loss function, which solves the problems of an unbalanced number of pictures across classes and an unbalanced difficulty of outputting the correct class name after training.

Description

Cartoon face recognition method based on convolutional neural network
Technical Field
The invention belongs to the technical field of image processing, and further relates to a cartoon face recognition method based on a convolutional neural network in the technical field of image recognition. The invention can be applied to identifying the identity information corresponding to a cartoon face from a facial image.
Background
Caricature is an art form that depicts a face with simple and exaggerated techniques. The same person may have facial caricatures in different styles, each highlighting different facial features. A person can easily tell which person such a cartoon face portrait belongs to, but this remains challenging for a machine. Studying cartoon face recognition can help us better understand human perception of faces: beyond plain face recognition, the study of caricatures reveals the intrinsic nature of human face perception. Insights that computer scientists gain from such psychological studies may advance machine learning methods and further improve the performance of caricature and face recognition.
Pushkar Shukla et al., in the paper "CartoonNet: Caricature Recognition of Public Figures" (Proceedings of the 3rd International Conference on Computer Vision and Image Processing, pp. 1-10, 2019), propose a cartoon face recognition method based on a deep convolutional neural network. The training set in the paper is built from the public IIIT-CFW dataset (Mishra et al., European Conference on Computer Vision, 2016), which contains cartoon face images of public figures. In the paper's experiments, only classes with more than 35 pictures in the dataset are added to the training process. The method applies well to practical cartoon face recognition, but it still has a drawback: because only classes with more than 35 pictures of the same type are selected and classes with fewer images are ignored, the number of people that can be recognized during cartoon face recognition is reduced, which affects how many face identities can be identified.
Hangzhou Dianzi University discloses a cartoon face recognition method using gated fusion of discriminative features in its patent application (application number: 201911157921.8, publication number: CN 111079549 A). The method first preprocesses the data, then extracts and fuses features, fusing 17 local features with global features. The global features are extracted by scaling each picture to 112×96 and feeding it into a lightweight network; finally, the cosine distance is computed between the fused caricature features and the face photo features. The method achieves good recognition results, but it has a drawback: because the picture used for global feature extraction is scaled down, reducing the pixel size of each picture, and a lightweight network is used for feature extraction, the extracted features of each picture are incomplete, which affects the recognition rate of the identity information corresponding to a face.
Disclosure of Invention
The invention aims to provide a cartoon face recognition method based on a convolutional neural network that addresses the above defects of the prior art. It solves two problems: the reduced number of people recognized during cartoon face recognition caused by ignoring classes with few images, and the lower recognition rate of the identity information corresponding to a face caused by incomplete picture feature extraction.
The C-F Loss function allows classes with few images to be added to training, so the number of people recognized during cartoon face recognition is no longer reduced. During training, an Xception convolutional neural network is used to extract picture features, solving the problem that incomplete feature extraction lowers the recognition rate of identity information.
The method comprises the following specific steps:
(1) Generating a training set:
(1a) Collect cartoon face pictures and face photos of each person to be identified, at least 15 pictures per person;
(1b) Mark the corners of both eyes in each picture as key points, obtain face-aligned pictures with an eye-based face alignment method, and crop each aligned picture to 250×350 to obtain cropped pictures;
(1c) Group all cropped pictures of each person to be identified into one class, use the person's name as the class name, take all cropped pictures in each class as training pictures, and combine the training pictures of all classes into a training set;
(2) The C-F Loss function is generated as follows:
F = -[y·log(y′) + (1-y)·log(1-y′)] + [-α·(1-y′)^γ·log(y′)] × e
where y denotes the true class label of the picture input to the Xception convolutional neural network; y′ denotes the predicted output of the Xception network during training; α is a parameter addressing the imbalance in the number of pictures across classes, with value range [0, 1]; γ is a parameter addressing the imbalance in how difficult different pictures are to classify correctly after training, with value range [0, +∞); and e is a recognition-rate factor that adjusts the weight of the focal term, with value range [0, 1].
(3) Training the Xception convolutional neural network:
Input the training set into the Xception convolutional neural network and iteratively train on the training pictures with an Adam optimizer until the value of the C-F Loss function converges to its minimum, yielding a trained Xception convolutional neural network; save the weights of the trained network;
(4) Generating an identification picture set:
Collect cartoon face pictures of each person to be identified that do not repeat any picture in the training set, at least 1 picture per person, and combine all these cartoon face pictures into an identification picture set;
(5) Identifying cartoon face pictures:
Input each picture in the identification picture set into the trained Xception convolutional neural network in turn, and output each picture together with its corresponding class name.
Compared with the prior art, the invention has the following advantages:
First, the invention generates a C-F Loss function for training the Xception convolutional neural network, overcoming two problems in the prior art: the imbalance in the number of pictures across classes, and the imbalance in how difficult different pictures are to classify correctly after training. Because classes with few pictures no longer have to be excluded from training, no pictures need to be deleted before training the Xception convolutional neural network and the number of classes is not reduced, which increases the number of cartoon face classes that can be identified during recognition.
Second, the invention trains the Xception convolutional neural network with the pictures in the training set. Because features are extracted with multiple convolution kernels of different sizes, the network adapts to features of different scales in the picture to be identified and extracts the features in the picture more completely. This solves the prior-art problem that reduced pixel sizes and lightweight feature-extraction networks lower the recognition rate of the identity information corresponding to a face; by extracting more complete features from the picture to be identified, the invention improves that recognition rate.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic illustration of cartoon face samples from the WebCaricature dataset used in the simulation experiment;
FIG. 3 is a schematic diagram of the cartoon face samples after the eye-based face alignment method is applied to each picture in FIG. 2.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The specific steps of the present invention will be described in further detail with reference to fig. 1.
And step 1, generating a training set.
Firstly, collect cartoon face pictures and face photos of each person to be identified, at least 15 pictures per person;
Secondly, mark the corners of both eyes in each picture as key points, obtain face-aligned pictures with an eye-based face alignment method, and crop each aligned picture to 250×350 to obtain cropped pictures;
Thirdly, group all cropped pictures of each person to be identified into one class, use the person's name as the class name, take all cropped pictures in each class as training pictures, and combine the training pictures of all classes into a training set.
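The eye-based alignment in the second step can be sketched as a similarity transform computed from the two eye key points, so that the eyes land at fixed positions inside the 250×350 crop. The target eye coordinates below are illustrative assumptions, not values given in the patent:

```python
import numpy as np

def eye_alignment_transform(left_eye, right_eye,
                            target_left=(95.0, 125.0),
                            target_right=(155.0, 125.0)):
    """Similarity transform (rotation + scale + translation) mapping the
    detected eye centres onto fixed target positions inside a 250x350 crop.
    The target coordinates are assumptions for illustration only.
    Returns a 2x3 affine matrix usable with e.g. cv2.warpAffine."""
    src = np.asarray([left_eye, right_eye], dtype=float)
    dst = np.asarray([target_left, target_right], dtype=float)
    # rotation angle and scale from the eye-to-eye vectors
    v_src, v_dst = src[1] - src[0], dst[1] - dst[0]
    angle = np.arctan2(v_dst[1], v_dst[0]) - np.arctan2(v_src[1], v_src[0])
    scale = np.linalg.norm(v_dst) / np.linalg.norm(v_src)
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    t = dst[0] - R @ src[0]          # translation so the left eye hits its target
    return np.hstack([R, t[:, None]])
```

Applying the returned matrix with an affine warp and then taking the 250×350 window yields the cropped, face-aligned picture.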
and 2, generating a C-F Loss function as follows.
F = -[y·log(y′) + (1-y)·log(1-y′)] + [-α·(1-y′)^γ·log(y′)] × e
Where y denotes the true class label of the picture input to the Xception convolutional neural network; y′ denotes the predicted output of the Xception network during training; α is a parameter addressing the imbalance in the number of pictures across classes, with value range [0, 1]; γ is a parameter addressing the imbalance in how difficult different pictures are to classify correctly after training, with value range [0, +∞); and e is a recognition-rate factor that adjusts the weight of the focal term, with value range [0, 1].
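Under these definitions, a minimal sketch of the C-F Loss is the binary cross-entropy term plus the Focal Loss term scaled by e. The defaults α = 0.25 and γ = 2 come from the original Focal Loss paper and are only illustrative; the patent does not fix their values:

```python
import numpy as np

def cf_loss(y, y_pred, alpha=0.25, gamma=2.0, e=1.0):
    """C-F Loss: cross-entropy plus an e-scaled focal-loss term.

    y      : ground-truth label (0 or 1).
    y_pred : predicted probability, in (0, 1).
    alpha  : class-imbalance parameter, range [0, 1].
    gamma  : focusing parameter, range [0, +inf).
    e      : recognition-rate factor, range [0, 1]."""
    eps = 1e-12                                   # avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    ce = -(y * np.log(y_pred) + (1.0 - y) * np.log(1.0 - y_pred))  # cross-entropy
    focal = -alpha * (1.0 - y_pred) ** gamma * np.log(y_pred)      # focal term
    return ce + focal * e
```

Confidently correct predictions (y′ near 1 for y = 1) are down-weighted by the (1-y′)^γ factor, so training focuses on hard examples.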
Step 3, training the Xception convolutional neural network.
Input the training set into the Xception convolutional neural network and iteratively train on the training pictures with an Adam optimizer until the value of the C-F Loss function converges to its minimum, yielding a trained Xception convolutional neural network; save the weights of the trained network.
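The Adam-driven minimization of step 3 can be illustrated with the bare update rule; here a toy one-dimensional quadratic stands in for the C-F Loss over the network weights:

```python
import numpy as np

def adam_minimize(grad_fn, w, lr=0.01, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam update loop, as used to drive the loss toward its
    minimum during training (`w` stands in for the network weights)."""
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # 1st-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # 2nd-moment (variance) estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# toy loss (w - 3)^2 with gradient 2(w - 3); Adam converges toward w = 3
w_final = adam_minimize(lambda w: 2 * (w - 3.0), np.array([0.0]),
                        lr=0.05, steps=1000)
```

In practice the gradient comes from backpropagating the C-F Loss through the Xception network, but the update rule is the same.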
and 4, generating an identification picture set.
Collect cartoon face pictures of each person to be identified that do not repeat any picture in the training set, at least 1 picture per person, and combine all these cartoon face pictures into an identification picture set.
and 5, recognizing the cartoon face picture.
Input each picture in the identification picture set into the trained Xception convolutional neural network in turn, and output each picture together with its corresponding class name.
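Step 5 amounts to a forward pass followed by post-processing that maps the network's outputs to class names. The softmax/argmax post-processing shown here is a standard assumption, since the patent does not detail it:

```python
import numpy as np

def identify(logits, class_names):
    """Map network outputs for a batch of identification pictures to class
    names (person names). `logits` has shape (batch, num_classes)."""
    # numerically stable softmax over classes
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # the class with the highest probability gives the predicted name
    return [class_names[i] for i in probs.argmax(axis=1)]
```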
The effects of the present invention can be further illustrated by the following simulation experiments.
1. Simulation conditions:
The simulation experiment of the invention uses PyCharm as the simulation tool; the computer is configured with an Intel Core i7 at 3.6 GHz, 16 GB of memory, and a 64-bit Windows 7 operating system.
The data used in the simulation experiments come from the WebCaricature cartoon face dataset created by Nanjing University, which consists of 6042 caricatures and 5974 photographs covering 252 person identities. The size of each image is 250×350. Fig. 2 shows 3 cartoon face samples for each of two person identities selected from the WebCaricature dataset.
2. Simulation experiment contents:
The simulation experiment is carried out on the WebCaricature cartoon face dataset; the face caricature images are identified with the method of the invention to obtain the recognition rate. The eye-based face alignment method is applied to each picture in the WebCaricature dataset to obtain face-aligned pictures. Fig. 3 shows the cartoon face samples after face alignment of the pictures in Fig. 2. The aligned dataset is randomly divided into a training set, a validation set, and a test set in a 6:2:2 ratio. The training set is input into the Xception convolutional neural network for training, and data augmentation is used during training to improve the generalization ability of the model. After each training epoch, the model is evaluated on the validation set to obtain the recognition rate and loss value; early stopping based on these values avoids overfitting from over-training. Finally, the best-performing model is evaluated on the test set to obtain the recognition rate.
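The 6:2:2 random split described above can be sketched as follows (the fixed seed is an assumption added for reproducibility, not part of the patent):

```python
import random

def split_622(items, seed=0):
    """Randomly divide a list of samples into train/validation/test
    subsets in a 6:2:2 ratio, as in the simulation experiment."""
    items = list(items)
    random.Random(seed).shuffle(items)   # deterministic shuffle for the sketch
    n = len(items)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```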

Claims (1)

1. A cartoon face recognition method based on a convolutional neural network, characterized in that a C-F Loss function is used to train an Xception convolutional neural network; the method comprises the following steps:
(1) Generating a training set:
(1a) Collect cartoon face pictures and face photos of each person to be identified, at least 15 pictures per person;
(1b) Mark the corners of both eyes in each picture as key points, obtain face-aligned pictures with an eye-based face alignment method, and crop each aligned picture to 250×350 to obtain cropped pictures;
(1c) Group all cropped pictures of each person to be identified into one class, use the person's name as the class name, take all cropped pictures in each class as training pictures, and combine the training pictures of all classes into a training set;
(2) The C-F Loss function is generated as follows:
F = -[y·log(y′) + (1-y)·log(1-y′)] + [-α·(1-y′)^γ·log(y′)] × e
where y denotes the true class label of the picture input to the Xception convolutional neural network; y′ denotes the predicted output of the Xception network during training; α is a parameter addressing the imbalance in the number of pictures across classes, with value range [0, 1]; γ is a parameter addressing the imbalance in how difficult different pictures are to classify correctly after training, with value range [0, +∞); and e is a recognition-rate factor that adjusts the weight of the focal term, with value range [0, 1];
(3) Training the Xception convolutional neural network:
Input the training set into the Xception convolutional neural network and iteratively train on the training pictures with an Adam optimizer until the value of the C-F Loss function converges to its minimum, yielding a trained Xception convolutional neural network; save the weights of the trained network;
(4) Generating an identification picture set:
Collect cartoon face pictures of each person to be identified that do not repeat any picture in the training set, at least 1 picture per person, and combine all these cartoon face pictures into an identification picture set;
(5) Identifying cartoon face pictures:
Input each picture in the identification picture set into the trained Xception convolutional neural network in turn, and output each picture together with its corresponding class name.
CN202010692679.0A 2020-07-17 2020-07-17 Cartoon face recognition method based on convolutional neural network Active CN111832498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010692679.0A CN111832498B (en) 2020-07-17 2020-07-17 Cartoon face recognition method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010692679.0A CN111832498B (en) 2020-07-17 2020-07-17 Cartoon face recognition method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN111832498A CN111832498A (en) 2020-10-27
CN111832498B true CN111832498B (en) 2023-07-28

Family

ID=72923534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010692679.0A Active CN111832498B (en) 2020-07-17 2020-07-17 Cartoon face recognition method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN111832498B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112015B (en) * 2021-04-06 2023-10-20 咪咕动漫有限公司 Model training method, device, electronic equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214308A (en) * 2018-08-15 2019-01-15 武汉唯理科技有限公司 A kind of traffic abnormity image identification method based on focal loss function
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
GB201910720D0 (en) * 2019-07-26 2019-09-11 Tomtom Global Content Bv Generative adversarial Networks for image segmentation
CN110516576A (en) * 2019-08-20 2019-11-29 西安电子科技大学 Near-infrared living body faces recognition methods based on deep neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CNN-based image recognition model for crop diseases and pests; Shi Bingying; Li Jiaqi; Zhang Lei; Li Jian; Computer Systems & Applications (Issue 06); full text *
Semantic-segmentation-based straw detection in complex scenes; Liu Yuanyuan; Zhang Shuo; Yu Haiye; Wang Yueyong; Wang Jiamu; Optics and Precision Engineering (Issue 01); full text *

Also Published As

Publication number Publication date
CN111832498A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
JP6484333B2 (en) Intelligent scoring method and system for descriptive problems
CN107967475A (en) A kind of method for recognizing verification code based on window sliding and convolutional neural networks
Karatzas et al. ICDAR 2011 robust reading competition-challenge 1: reading text in born-digital images (web and email)
CN107808358B (en) Automatic detection method for image watermark
CN109410184B (en) Live broadcast pornographic image detection method based on dense confrontation network semi-supervised learning
CN108446700A (en) A kind of car plate attack generation method based on to attack resistance
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
CN109359550B (en) Manchu document seal extraction and removal method based on deep learning technology
CN112001282A (en) Image recognition method
CN105608454A (en) Text structure part detection neural network based text detection method and system
CN109446345A (en) Nuclear power file verification processing method and system
CN111046760A (en) Handwriting identification method based on domain confrontation network
CN110163567A (en) Classroom roll calling system based on multitask concatenated convolutional neural network
CN111832498B (en) Cartoon face recognition method based on convolutional neural network
Saudagar et al. Augmented reality mobile application for arabic text extraction, recognition and translation
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning
CN112801923A (en) Word processing method, system, readable storage medium and computer equipment
CN113553947B (en) Method and device for generating and describing multi-mode pedestrian re-recognition and electronic equipment
CN111813996B (en) Video searching method based on sampling parallelism of single frame and continuous multi-frame
CN111832622A (en) Method and system for identifying ugly pictures of specific figures
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
Lima et al. Using convolutional neural networks for fingerspelling sign recognition in brazilian sign language
Jain et al. Dynamic Visualization of an Image for Interactive Actions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant