CN114943999A - Method for training age detection model, age detection method and related device - Google Patents

Method for training age detection model, age detection method and related device

Info

Publication number
CN114943999A
CN114943999A
Authority
CN
China
Prior art keywords
age
information
neural network
training
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210623051.4A
Other languages
Chinese (zh)
Inventor
陈仿雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202210623051.4A priority Critical patent/CN114943999A/en
Publication of CN114943999A publication Critical patent/CN114943999A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application relates to the technical field of facial image attribute prediction, and discloses a method for training an age detection model, an age detection method and a related device. In the method, a training set comprising a plurality of face images each labeled with a real age is acquired, together with age interference information corresponding to each face image, such as shooting information and/or expression information. The age interference information is then encoded to obtain an information feature code. Finally, a neural network is iteratively trained using the training set and the information feature codes corresponding to the face images in the training set to obtain an age detection model. In this way, the neural network learns both the features of the face images and the age interference features during training, that is, it clearly distinguishes the age interference features from the effective features that determine age-prediction accuracy; the interference features can therefore be filtered out during age prediction, their influence on network learning is reduced, and the trained age detection model can detect age accurately and stably.

Description

Method for training age detection model, age detection method and related device
Technical Field
The embodiment of the application relates to the technical field of human face image attribute prediction, in particular to a method for training an age detection model, an age detection method and a related device.
Background
A face image contains a variety of facial feature information, such as face shape, skin state, expression, facial features and age, among which age is relatively important feature information that is widely used in the field of face image detection. For example, some clients running on mobile devices provide an age detection function: the client obtains a face image and, based on it, outputs the detected age as feedback to the user.
However, variations in shooting angle, facial expression and the like can cause the age recognized for the same person to fluctuate greatly, so such detection lacks stability to a certain extent.
Disclosure of Invention
The embodiment of the application mainly solves the technical problem of providing a method for training an age detection model, an age detection method and a related device, wherein the age detection model obtained by training by the method can accurately and stably detect the age.
In order to solve the above technical problem, in a first aspect, an embodiment of the present application provides a method for training an age detection model, including:
acquiring a training set, wherein the training set comprises a plurality of face images, and each face image is marked with a real age;
acquiring age interference information corresponding to each face image in a training set, wherein the age interference information comprises shooting information and/or expression information, and the age interference information is an interference factor for interfering a neural network to accurately detect age;
coding the age interference information to obtain an information characteristic code;
and performing iterative training on a neural network by adopting the training set and the information characteristic codes corresponding to the face images in the training set to obtain an age detection model.
In some embodiments, said encoding said age interference information to obtain an information characteristic code comprises:
coding the text data in the age interference information to obtain a text code;
and splicing the text code and the numerical data in the age interference information to obtain the information characteristic code.
In some embodiments, said encoding text data in the age interference information to obtain a text code includes:
and coding the text data in the age interference information by adopting a bag-of-words model to obtain the text code.
In some embodiments, before the performing iterative training on a neural network by using the training set and the information feature codes corresponding to the face images in the training set to obtain an age detection model, the method further includes:
extracting the characteristics of the information characteristic codes by adopting a multilayer perceptron module to obtain target information vectors;
adopting the training set and the information feature codes corresponding to the face images in the training set to carry out iterative training on a neural network to obtain an age detection model, comprising the following steps:
and performing iterative training on the neural network and the multilayer perceptron module by adopting the training set and target information vectors corresponding to the training set to obtain the age detection model.
In some embodiments, the neural network comprises, in cascade, a convolution module, a fully connected layer, a fusion layer and a classification layer, wherein the convolution module comprises a plurality of convolutional layers;
adopting the training set and the target information vector corresponding to the training set to carry out iterative training on the neural network and the multilayer perceptron module to obtain the age detection model, comprising the following steps:
inputting the training set into the convolution module of the neural network, wherein the last convolutional layer of the convolution module outputs an age feature map;
inputting the age feature map into the fully connected layer of the neural network to obtain an age feature vector;
inputting the age feature vector and the target information vector into the fusion layer for fusion, and inputting the fused vector obtained by fusion into the classification layer for age classification to obtain a predicted age;
and calculating the loss between each real age and the corresponding predicted age by adopting a loss function, and adjusting model parameters of the neural network and the multilayer perceptron module according to the loss until convergence to obtain the age detection model.
In some embodiments, said inputting said age feature vector and said target information vector into said fusion layer for fusion comprises:
and multiplying and fusing the age characteristic vector and the target information vector to obtain the fused vector.
In some embodiments, the loss function comprises:
L = -∑_{i=1}^{n} Y_Ti · P_i

P_i = log(Y_ci)

where Y_ci (i ∈ {1, 2, …, n}) is the probability assigned to the i-th age in the predicted age distribution, Y_Ti is the probability of the i-th age in the real-age label, and n is the maximum age.
In order to solve the above technical problem, in a second aspect, an age detection method is provided in an embodiment of the present application, and includes:
acquiring a human face image to be detected;
acquiring age interference information corresponding to the face image to be detected, wherein the age interference information comprises shooting information and/or expression information, and the age interference information is an interference factor for interfering a neural network to accurately detect age;
encoding age interference information corresponding to the face image to be detected to obtain information characteristic codes corresponding to the face image to be detected;
and inputting the facial image to be detected and the information characteristic code corresponding to the facial image to be detected into an age detection model for age detection to obtain the age corresponding to the facial image to be detected.
In order to solve the above technical problem, in a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
In order to solve the above technical problem, in a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer device to perform the method of the first aspect.
The beneficial effects of the embodiment of the application are as follows: different from the prior art, in the method for training an age detection model provided in the embodiment of the present application, a training set comprising a plurality of face images labeled with real ages is used, and age interference information corresponding to each face image in the training set is first acquired; for example, the age interference information includes shooting information and/or expression information. The age interference information is then encoded to obtain an information feature code. Finally, the neural network is iteratively trained using the training set and the information feature codes corresponding to the face images in the training set to obtain an age detection model. In this embodiment, the neural network learns both the features of the face images and the age interference features during training, that is, it clearly distinguishes the age interference features from the effective features affecting age-prediction accuracy, making the effective features more discriminable; the interference features can thus be filtered out during age prediction, their influence on network learning is reduced, and the trained age detection model can detect age accurately and stably.
Drawings
One or more embodiments are illustrated by way of example in the figures of the accompanying drawings, in which like reference numerals denote similar elements; the figures are not to scale unless otherwise specified.
FIG. 1 is a schematic diagram of an application scenario of an age detection system in some embodiments of the present application;
FIG. 2 is a schematic diagram of an electronic device according to some embodiments of the present application;
FIG. 3 is a schematic flow chart of a method of training an age detection model according to some embodiments of the present application;
FIG. 4 is a schematic flow chart illustrating a sub-step S30 of the method shown in FIG. 3;
FIG. 5 is a schematic flow chart illustrating a sub-process of step S40 in the method of FIG. 3;
FIG. 6 is a schematic diagram of a neural network used for training in some embodiments of the present application;
fig. 7 is a schematic flow chart of an age detection method according to some embodiments of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will aid those skilled in the art in further understanding the present application, but are not intended to limit it in any way. It should be noted that various changes and modifications can be made by one skilled in the art without departing from the spirit of the application; all such changes and modifications fall within the scope of protection of the present application.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that, where they do not conflict, the various features of the embodiments of the present application may be combined with each other within the scope of protection of the present application. Additionally, although the device schematics divide functionality into modules and the flowcharts show logical sequences, in some cases the steps shown or described may be performed in an order different from the module division or the flowchart order. Further, the terms "first," "second," "third," and the like used herein do not limit data or execution order, but merely distinguish identical or similar items having substantially the same function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features mentioned in the embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.
To facilitate understanding of the method provided in the embodiments of the present application, first, terms referred to in the embodiments of the present application will be described:
(1) neural network
The neural network may be composed of neural units and may be understood as a network having an input layer, hidden layers and an output layer; generally, the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The work of each layer in the neural network can be described by the mathematical expression y = a(W·x + b). At the physical level, the work of each layer can be understood as completing the transformation from input space to output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are performed by W·x, operation 4 by +b, and operation 5 by a(). The word "space" is used here because the classified object is not a single thing but a class of things, and space refers to the set of all individuals of that class. W is the weight matrix of each layer of the neural network, and each value in the matrix represents the weight of one neuron of that layer. The matrix W determines the spatial transformation from input space to output space described above, i.e., W at each layer controls how the space is transformed. The purpose of training the neural network is ultimately to obtain the weight matrices of all layers of the trained network. Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, or more specifically, learning the weight matrices.
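As a minimal illustration of the per-layer operation y = a(W·x + b) described above, the following sketch (assuming PyTorch; all names and sizes are illustrative) builds one layer as a linear map followed by a nonlinear activation:

```python
# One neural-network layer: y = a(W·x + b).
import torch
import torch.nn as nn

layer = nn.Linear(in_features=8, out_features=4)  # holds the weight matrix W and offset b
activation = nn.ReLU()                            # the nonlinearity a(), providing the "bending"

x = torch.randn(1, 8)        # one input vector
y = activation(layer(x))     # y = a(W·x + b)
print(y.shape)               # torch.Size([1, 4])
```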
It should be noted that, in the embodiment of the present application, based on the model adopted by the machine learning task, the model is essentially a neural network. The common components in the neural network include a convolutional layer, a pooling layer, a normalization layer, a reverse convolutional layer and the like, the common components in the neural network are assembled to design a model, and when model parameters (weight matrixes of each layer) are determined so that model errors meet preset conditions or the number of the adjusted model parameters reaches a preset threshold, the model converges.
The convolution layer is configured with a plurality of convolution kernels, and each convolution kernel is provided with a corresponding step length so as to carry out convolution operation on the image. The convolution operation aims to extract different features of an input image, a first layer of convolution layer can only extract some low-level features such as edges, lines, angles and other levels, and a deeper convolution layer can iteratively extract more complex features from the low-level features.
The deconvolution (transposed convolution) layer is used to map a low-dimensional space to a higher-dimensional space while maintaining the connection relationship/pattern between them (the connection relationship here refers to the connection relationship during convolution). The deconvolution layer is configured with a plurality of convolution kernels, each with a corresponding step size, to perform the deconvolution operation on the image. Generally, framework libraries for designing neural networks (for example, the PyTorch library) provide a built-in upsampling function (for example, Upsample), and the low-dimensional to high-dimensional spatial mapping can be realized by calling it.
Pooling is a process that mimics the human visual system, in which data can be reduced in size or an image can be represented by higher-level features. Common pooling operations include max pooling, mean pooling, random pooling, median pooling, combined pooling and the like. Generally, pooling layers are periodically inserted between the convolutional layers of a neural network to achieve dimensionality reduction.
The normalization layer is used to perform normalization operations on all neurons in the middle layer to prevent gradient explosion and gradient disappearance.
(2) Loss function
In the process of training a neural network, because the output of the network is expected to be as close as possible to the value actually desired, the weight matrix of each layer can be updated according to the difference between the current predicted value and the truly desired target value (an initialization process is usually performed before the first update, i.e., parameters are pre-configured for each layer of the network). For example, if the network's predicted value is too high, the weight matrices are adjusted so that it predicts lower, and the adjustment continues until the network can predict the truly desired target value. It is therefore necessary to define in advance how the difference between the predicted value and the target value is measured; this is the role of the loss function (or objective function), an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the neural network becomes a process of reducing the loss as much as possible.
Before describing the embodiments of the present application, a brief description is first given of the age detection method known to the inventor of the present application, so that it is convenient to understand the embodiments of the present application in the following.
In some age detection methods, training samples are obtained, and each sample is a face image labeled with a corresponding age label; determining the age label distribution corresponding to the age label of each sample; and training the age estimation model based on the sample and the age label distribution until the total loss function of the age estimation model converges.
This is a classification method based on statistics. In practical applications, differences in facial expression, shooting angle and the like cause the age predicted for the same person to fluctuate greatly across expressions and shooting angles, so the method lacks stability and accuracy.
In view of the foregoing problems, embodiments of the present application provide a method for training an age detection model, an age detection method, an electronic device and a storage medium. The training method trains a neural network using a plurality of face images and the corresponding age interference information. The neural network can therefore learn both the features of the face images and the age interference features during training, that is, clearly distinguish the age interference features that disturb age prediction from the effective features that determine prediction accuracy, making the effective features more discriminable; the interference features can thus be filtered out during age prediction, their influence on network learning is reduced, and the trained age detection model can detect age accurately and stably.
An exemplary application of the electronic device for training the age detection model or for age detection provided in the embodiment of the present application is described below, and it is understood that the electronic device may train the age detection model, and may also perform age detection on a face image by using the age detection model.
The electronic device provided by the embodiment of the application may be a server, for example a server deployed in the cloud. When used for training the age detection model, the server iteratively trains a neural network provided by other devices or by those skilled in the art using the training set and determines the final model parameters; the neural network configured with these final model parameters is the age detection model. When used for age detection, the server invokes the built-in age detection model and performs the corresponding computation on the face image to be detected provided by another device or a user to obtain the corresponding age.
The electronic device provided by some embodiments of the present application may be various types of terminals such as a notebook computer, a desktop computer, or a mobile device. When the terminal is used for training the age detection model, a person skilled in the art inputs the prepared training set into the terminal, designs a neural network on the terminal, and iteratively trains the neural network by using the training set by the terminal to determine final model parameters, so that the neural network configures the final model parameters, and the age detection model can be obtained. When the terminal is used for age detection, a built-in age detection model is called, corresponding calculation processing is carried out on a face image to be detected input by a user, and the age corresponding to the face image to be detected is obtained.
By way of example, referring to fig. 1, fig. 1 is a schematic view of an application scenario of the age detection system provided in the embodiment of the present application, and the terminal 10 is connected to the server 20 through a network, where the network may be a wide area network or a local area network, or a combination of the two.
The terminal 10 may be used to obtain training sets and build neural networks, for example, one skilled in the art downloads the prepared training sets on the terminal and builds a network structure for the neural network. It is understood that the terminal 10 may also be used to obtain a face image, for example, a user inputs the face image through an input interface, and the terminal automatically obtains the face image after the input is completed; for example, the terminal 10 is provided with a camera through which a face image is captured.
In some embodiments, the terminal 10 locally executes the method for training the age detection model provided in this embodiment to complete training the designed neural network by using the training set, and determine the final model parameters, so that the neural network configures the final model parameters, and the age detection model can be obtained. In some embodiments, the terminal 10 may also send, to the server 20 through the network, a training set stored on the terminal by a person skilled in the art and a constructed neural network, the server 20 receives the training set and the neural network, trains the designed neural network with the training set, determines final model parameters, and then sends the final model parameters to the terminal 10, and the terminal 10 stores the final model parameters, so that the neural network configuration can obtain the final model parameters, that is, the age detection model can be obtained.
In some embodiments, the terminal 10 locally executes the age detection method provided in this embodiment to provide an age detection service for the user, invokes a built-in age detection model, and performs corresponding calculation processing on the facial image to be detected and the age interference information to obtain the age corresponding to the facial image to be detected. In some embodiments, the terminal 10 may also send, to the server 20 through the network, the facial image to be detected and the age interference information input by the user on the terminal, and the server 20 receives the facial image to be detected and the age interference information, invokes a built-in age detection model to perform corresponding calculation processing on the facial image to be detected and the age interference information, to obtain the age corresponding to the facial image to be detected, and then sends the age to the terminal 10. The terminal 10 displays the age on its own display interface to inform the user of the age.
The structure of the electronic device in the embodiment of the present application is described below, and fig. 2 is a schematic structural diagram of the electronic device 500 in the embodiment of the present application, where the electronic device 500 includes at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 may be capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB), among others;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
As can be understood from the foregoing, the method for training an age detection model and the age detection method provided in the embodiments of the present application may be implemented by various types of electronic devices with computing processing capabilities, such as an intelligent terminal and a server.
The method for training the age detection model provided by the embodiment of the present application is described below with reference to an exemplary application and implementation of the server provided by the embodiment of the present application. Referring to fig. 3, fig. 3 is a schematic flowchart of a method for training an age detection model according to an embodiment of the present application.
Referring to fig. 3 again, the method S100 may specifically include the following steps:
s10: and acquiring a training set, wherein the training set comprises a plurality of face images, and the real age is marked on each face image.
The training set comprises a number of face images, each of which contains a human face. It is understood that in the training set each face image is labeled with a real age, and the real ages range from 1 to 100 years.
The real age of each face image may be labeled by one-hot encoding; for example, if the age range to be detected is 1 to 100, a 100-dimensional vector is used as the real-age label of a face image. It will be appreciated that one-hot label encoding is common practice for those skilled in the art and is not described in detail here.
In some embodiments, the terminal or the server may normalize the face images in the training set, which helps improve the convergence speed and accuracy of subsequent model training. Specifically, in some embodiments, the size of each face image may be set to 256 × 256, and the pixel values scaled from the range 0-255 to the range 0-1.
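A sketch of this preprocessing and of the one-hot age labeling described above, assuming Python with OpenCV and NumPy (function and constant names are illustrative):

```python
# Resize to 256x256, scale pixels to [0, 1], and one-hot encode the real age.
import numpy as np
import cv2

MAX_AGE = 100  # ages 1..100, per the description above

def preprocess(image_path: str, true_age: int):
    img = cv2.imread(image_path)
    img = cv2.resize(img, (256, 256))
    img = img.astype(np.float32) / 255.0          # pixel range 0-255 -> 0-1
    label = np.zeros(MAX_AGE, dtype=np.float32)   # 100-dimensional one-hot vector
    label[true_age - 1] = 1.0                     # age i -> index i-1
    return img, label
```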
In some embodiments, the number of face images in the training set is on the order of tens of thousands, for example 20,000, which helps train an accurate, general model. The number of face images can be determined by those skilled in the art according to the actual situation.
S20: and acquiring age interference information corresponding to each face image in the training set.
It is understood that the facial skin state, the look of the eyes and the like in a face image are effective information for judging age. Besides such effective information, the face image also contains age interference information. Age interference information is an interference factor that hinders the neural network from detecting age accurately. It can be understood that the face of the same person presents different states at different shooting angles, which can affect the learning and judgment of the neural network; likewise, the face of the same person presents different states under different expressions, which can also affect the learning and judgment of the neural network. In some embodiments, the age interference information includes shooting information and/or expression information.
In some embodiments, the age interference information may also include hairstyle information. It can be understood that different hairstyles occlude the face to different degrees and change its apparent temperament or perceived gender. Thus, for the same person with different hairstyles, the neural network may give different age predictions.
Considering the influence of age interference information on the neural network, the age interference information corresponding to each face image in the training set is acquired here. In some embodiments, the age interference information includes shooting information and expression information. The shooting information includes a pitch angle, a yaw angle, a roll angle and the like; the expression information includes laughing, smiling, sadness and the like. For example, the age interference information of a face image of a 32-year-old may be: pitch angle 45°, yaw angle 10°, roll angle 10°, laughing (expression).
S30: and coding the age interference information to obtain an information characteristic code.
It is understood that the age interference information includes text data and numerical data. In order to convert the age interference information into numerical data which can be learned by a neural network, the age interference information is coded into an information characteristic code before training. It will be appreciated that the information characteristic code is numerical data, which may be a vector, for example. Thus, the neural network can learn age-related information reflected by the information feature code.
In some embodiments, referring to fig. 4, the step S30 specifically includes:
s31: and coding the text data in the age interference information to obtain a text code.
S32: and splicing the text code and numerical data in the age interference information to obtain an information characteristic code.
It will be appreciated that the text data may include expressions, angle names and the like, such as the text "pitch angle, yaw angle, roll angle, laughing" mentioned above. In some embodiments, the text data may be encoded by one-hot encoding or a similar scheme to obtain the corresponding text code.
In some embodiments, the step S31 specifically includes: and coding the text data in the age interference information by adopting a bag-of-words model to obtain a text code.
The bag-of-words model selects the words in the text data and puts them into a word bag, counts how many times each word in the bag appears in the text data, and expresses the counts in vector form. In this embodiment, the text data in the age interference information of all face images in the training set is first aggregated to construct a dictionary. It will be appreciated that the dictionary contains the words appearing in the text data of all face images in the training set, together with their positions; for example, the constructed dictionary may be { 1: pitch angle, 2: yaw angle, 3: roll angle, 4: laughing, 5: sad, 6: smiling }, in which the number preceding each word is its position information.
In this embodiment, for example, for the text data "pitch angle, yaw angle, roll angle, laughing", the text code obtained with the dictionary of the above example may be "111100". It can be seen from the text code "111100" that "pitch angle", "yaw angle", "roll angle" and "laughing" each occur once in the text data, while "sad" and "smiling" occur zero times.
And after the text code corresponding to the text data in the age interference information is acquired, splicing the text code with the numerical data in the age interference information to obtain the information characteristic code. The splicing mode of the text code and the numerical data can be horizontal splicing.
It is to be understood that the numerical data in the age interference information may include the various angle values, for example a pitch angle of 45°, a yaw angle of 10° and a roll angle of 10°. In some embodiments, 45 is encoded as the 9-bit binary string 000101101 and 10 as 000001010, so that after the text code "111100" is spliced with the numerical data "000101101, 000001010, 000001010", the resulting information feature code may be "111100000101101000001010000001010".
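The two steps S31-S32 can be sketched as follows, using the example dictionary and the 9-bit binary width from above (both are illustrative choices, not requirements):

```python
# Bag-of-words text code plus binary-encoded numeric data, spliced horizontally.
VOCAB = ["pitch angle", "yaw angle", "roll angle", "laughing", "sad", "smiling"]

def encode_text(words):
    # Count of each dictionary word in the text data, as a digit string.
    return "".join(str(words.count(w)) for w in VOCAB)

def encode_number(value, bits=9):
    return format(value, f"0{bits}b")  # e.g. 45 -> "000101101"

text_code = encode_text(["pitch angle", "yaw angle", "roll angle", "laughing"])
feature_code = text_code + "".join(encode_number(v) for v in (45, 10, 10))
print(feature_code)  # 111100 + 000101101 + 000001010 + 000001010
```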
In this embodiment, the text data is encoded and the resulting text code is spliced with the numerical data, thereby digitizing the age interference information and facilitating subsequent learning by the neural network.
S40: and performing iterative training on the neural network by adopting the training set and the information characteristic codes corresponding to the face images in the training set to obtain an age detection model.
It can be understood that, in the training process, the neural network learns the image features of a plurality of face images in the training set and the interference features reflected by the corresponding information feature codes, makes an age prediction, and reversely adjusts the model parameters based on the predicted age and the real age, that is, adjusts the feature extraction parameters of the face images, so that the neural network can extract and learn effective features, such as skin features and the like, which affect the accuracy of the age prediction.
In addition, the interference features reflected by the information feature code tell the neural network that interference exists in the face image, so the interference features can be ignored when predicting age, avoiding their influence on the prediction. For example, when the information feature code reflects shooting information and expression information, it informs the neural network that the face in the image carries an expression and was shot at a particular angle; when learning features from the face image, interference features such as expression and shooting angle should therefore be filtered out, so that attention is focused on the effective features that determine age-prediction accuracy.
After multiple times of iterative training and parameter adjustment, the neural network converges to obtain the age detection model. The information feature coding can help the neural network focus attention on effective features influencing the accuracy of age prediction, so that the converged age detection model can accurately and stably detect the age.
It is understood that convergence may refer to the fact that under a certain model parameter, the sum of the differences between the actual ages and the predicted ages in the training set is smaller than a preset threshold or fluctuates within a certain range.
In some embodiments, the Adam algorithm is used to optimize the model parameters. For example, the number of iterations is set to 100,000, the initial learning rate to 0.001, and the weight decay to 0.0005, with the learning rate decayed to 1/10 of its previous value every 1,000 iterations. The learning rate and the differences between each real age in the training set and the corresponding predicted age are input into the Adam algorithm to obtain adjusted model parameters, which are used for the next round of training, until the model parameters of the converged neural network are output when training completes. The converged neural network then serves as the age detection model.
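A hedged sketch of this optimization schedule, assuming PyTorch (the model, data loader and loss function are placeholders supplied by the caller; the loss is defined later in this document):

```python
# Adam with lr 0.001, weight decay 0.0005, lr divided by 10 every 1000 steps,
# run for 100,000 iterations.
import itertools
import torch

def train(model, data_loader, loss_fn, max_steps=100_000):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.1)
    data = itertools.islice(itertools.cycle(data_loader), max_steps)
    for images, info_codes, true_ages in data:
        optimizer.zero_grad()
        pred = model(images, info_codes)   # predicted age distribution
        loss = loss_fn(pred, true_ages)
        loss.backward()
        optimizer.step()
        scheduler.step()                   # lr -> lr/10 every 1000 iterations
    return model                           # the converged age detection model
```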
In this embodiment, the neural network learns both the features of the face images and the age interference features during training, that is, it clearly distinguishes the age interference features from the effective features affecting age-prediction accuracy, making the effective features more discriminable; the interference features can thus be filtered out during age prediction, their influence on network learning is reduced, and the trained age detection model can detect age accurately and stably.
In some embodiments, before the foregoing step S40, the method further comprises:
s50: and performing feature extraction on the information feature codes by adopting a multilayer perceptron module to obtain target information vectors.
It is understood that the multi-layered perceptron module includes an input layer, a multi-layered hidden layer, and an output layer, wherein the input layer includes N neurons, the hidden layer includes Q neurons, and the output layer includes K neurons. The operation of each layer may be described by a functional expression, it being understood that the functional expression differs for each layer.
It will be appreciated that if the input information feature code is denoted by x, the input layer passes x to the hidden layer, and the output of the hidden layer may be f(w_1·x + b_1), where w_1 is a weight, b_1 is an offset, and the function f may be a commonly used sigmoid or tanh function. The hidden layer to the output layer is equivalent to a multi-class logistic regression, i.e., softmax regression, so the output of the output layer is softmax(w_2·x_1 + b_2), where x_1 = f(w_1·x + b_1) is the output of the hidden layer.
Therefore, the multi-layer perceptron module can be represented by the following formula:
MLP(x) = G(W_{h+1} · S(W_h · … S(W_1 · x + b_1) … + b_h) + b_{h+1})

where G represents the softmax activation function, h represents the number of hidden layers, W_i and b_i represent the weights and offsets of the i-th hidden layer, x represents the input information feature code, W_1 and b_1 represent the weights and offsets of the input layer, S represents the activation function, and MLP(x) represents the target information vector.
In some embodiments, K may be 1024, so that the output layer outputs a one-dimensional vector with length of 1024, i.e. a target information vector with length of 1024.
Each layer of the multilayer perceptron module uses an activation function, which introduces nonlinearity into the neurons so that the module can approximate arbitrary nonlinear functions and thus exploit more nonlinear models. The multilayer perceptron module has good feature extraction capability for discrete information, so the extracted target information vector sufficiently reflects the features of the information feature code, i.e., the features of the age interference information.
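A minimal sketch of such a multilayer perceptron module, assuming PyTorch; the input width, hidden width and hidden depth are illustrative, while the output width K = 1024 follows the description above:

```python
# Hidden layers use S (sigmoid/tanh); the output layer uses G (softmax).
import torch.nn as nn

class InfoMLP(nn.Module):
    def __init__(self, n_in, q_hidden=256, h_layers=2, k_out=1024):
        super().__init__()
        layers, width = [], n_in
        for _ in range(h_layers):
            layers += [nn.Linear(width, q_hidden), nn.Sigmoid()]  # S
            width = q_hidden
        layers += [nn.Linear(width, k_out), nn.Softmax(dim=-1)]   # G
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: the information feature code; returns a length-1024 target information vector
        return self.net(x)
```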
In this embodiment, the step S40 specifically includes:
s41: and performing iterative training on the neural network and the multilayer perceptron module by adopting a training set and target information vectors corresponding to the training set to obtain an age detection model.
In this embodiment, the target information vector extracted by the multilayer perceptron module has the advantages of low dimensionality, fine feature granularity and the like, which facilitates learning by the neural network and improves the accuracy and training speed of the model.
In some embodiments, referring to fig. 6, the neural network includes, in cascade, a convolution module, a fully connected layer, a fusion layer and a classification layer. The convolution module comprises a plurality of convolutional layers and performs down-sampling feature extraction on the input face image. The fully connected layer integrates and classifies the input feature map and outputs a one-dimensional vector. The fusion layer fuses at least two features. The classification layer converts the input vector into a probability vector with values between 0 and 1, thereby realizing classification. It is worth noting that convolutional layers, fully connected layers and classification layers are common components in neural networks, are well known to those skilled in the art, and are not described in detail here.
In this embodiment, referring to fig. 5, the step S41 specifically includes:
s411: and inputting the training set into a convolution module of the neural network, and outputting an age characteristic diagram by the last convolution layer of the convolution module.
In some embodiments, referring to fig. 6, the convolution module includes a plurality of convolutional layers, each followed by a pooling layer for dimensionality reduction. The convolutional layers are configured with 3 × 3 convolution kernels with a step size of 2, the numbers of convolution kernels in the successive convolutional layers are 32, 64, 128, 256 and 512, respectively, and the sizes of the corresponding output feature maps are 256 × 256 × 32, 128 × 128 × 64, 64 × 64 × 128, 32 × 32 × 256 and 16 × 16 × 512, respectively. It can be understood that the feature map output by the last convolutional layer in the convolution module is the age feature map.
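A sketch of such a backbone, assuming PyTorch. The listed feature-map sizes correspond to one spatial halving per stage, so this sketch uses stride-1 convolutions with 2 × 2 max pooling between stages to reproduce those sizes; the exact layer composition is an assumption:

```python
import torch.nn as nn

def stage(c_in, c_out):
    # 3x3 convolution, then pooling that halves the spatial size once per stage.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

conv_module = nn.Sequential(
    stage(3, 32),                                   # conv out 256x256x32
    stage(32, 64),                                  # conv out 128x128x64
    stage(64, 128),                                 # conv out 64x64x128
    stage(128, 256),                                # conv out 32x32x256
    nn.Conv2d(256, 512, kernel_size=3, padding=1),  # 16x16x512: the age feature map
    nn.ReLU(),
)
```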
S412: and inputting the age characteristic diagram into a full connection layer of the neural network to obtain an age characteristic vector.
The fully connected layer integrates and classifies the input features and outputs a one-dimensional vector; thus, after the age feature map is processed by the fully connected layer, a one-dimensional vector, namely the age feature vector, is obtained.
S413: and inputting the age characteristic vector and the target information vector into a fusion layer for fusion, and inputting the fusion vector obtained by fusion into a classification layer for age classification to obtain the predicted age.
The fusion layer fuses at least two features, so after the age feature vector and the target information vector are input into the fusion layer, the resulting fused vector carries both the image features reflected by the age feature vector and the interference features reflected by the target information vector.
Referring again to FIG. 6, the fused vector is processed by the classification layer and converted into a probability vector with values between 0 and 1. It can be understood that the probability vector is a vector representation of the predicted age, in which each element is the probability that the face in the face image is of the corresponding age; the age with the highest probability is the predicted age.
Because the predicted age is obtained by classifying the fused vector, the neural network can learn the features of the face image together with the age interference features, that is, clearly distinguish the age interference features from the effective features affecting age-prediction accuracy, so that the effective features are more discriminable; the interference features can thus be filtered out during age prediction, and their influence on network learning is reduced.
In some embodiments, the aforementioned "inputting the age feature vector and the target information vector into the fusion layer for fusion" includes: multiplying and fusing the age feature vector and the target information vector to obtain the fused vector.
It can be understood that multiplication fusion multiplies the age feature vector and the target information vector element-wise at corresponding positions, which increases the weight of the target information within the age feature vector; that is, the target information carries substantial weight in the resulting fused vector and is easy to discriminate, which helps the neural network clearly distinguish the age interference features from the effective features affecting age-prediction accuracy.
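A sketch of this fusion and classification step, assuming PyTorch and assuming the age feature vector and target information vector share the same length (for example 1024) so that element-wise multiplication is defined; names and dimensions are illustrative:

```python
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, dim=1024, n_ages=100):
        super().__init__()
        # classification layer: probabilities over the n age classes
        self.classifier = nn.Sequential(nn.Linear(dim, n_ages), nn.Softmax(dim=-1))

    def forward(self, age_vec, info_vec):
        fused = age_vec * info_vec      # element-wise multiplication fusion
        return self.classifier(fused)   # probability per age, values in (0, 1)

# predicted age = index of the highest probability, offset to the range 1..100:
# pred_age = probs.argmax(dim=-1) + 1
```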
S414: and calculating the loss between each real age and the corresponding predicted age by adopting a loss function, and adjusting model parameters of the neural network and the multilayer perceptron module according to the loss until convergence to obtain an age detection model.
Here, the loss function may be configured in the terminal by a person skilled in the art, the configured loss function is sent to the server along with the neural network and the multi-layer perceptron module, after the server processes the predicted age corresponding to each face image in the training set, the server calculates the loss between each real age and the predicted age in the training set by using the loss function, and iteratively trains the neural network and the multi-layer perceptron module based on the loss until convergence to obtain the age detection model.
In this embodiment, a loss function is used to calculate the difference between the real age and the predicted age corresponding to each face image in the training set. The loss function has been described in detail in the term description "(2) Loss function" above and is not repeated here. It is understood that the structure of the loss function can be set according to the actual situation, based on the network structure and training mode.
In some embodiments, the loss function comprises:
L = -∑_{i=1}^{n} Y_Ti · P_i

P_i = log(Y_ci)

where Y_ci (i ∈ {1, 2, …, n}) is the probability assigned to the i-th age in the predicted age distribution, Y_Ti is the probability of the i-th age in the real-age label, and n is the maximum age.
This loss function has a strong ability to distinguish between classes, and thus gradient vanishing can be effectively avoided.
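Under the reconstruction above (standard cross-entropy between the one-hot real-age vector and the predicted probabilities), the loss can be sketched as follows, assuming PyTorch:

```python
# L = -sum_i Y_Ti * P_i with P_i = log(Y_ci), averaged over the batch.
import torch

def age_loss(pred_probs: torch.Tensor, true_onehot: torch.Tensor) -> torch.Tensor:
    # pred_probs, true_onehot: shape (batch, n), with n the maximum age
    p = torch.log(pred_probs.clamp_min(1e-12))   # P_i = log(Y_ci), clamped for stability
    return -(true_onehot * p).sum(dim=1).mean()
```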
In summary, in the method for training an age detection model provided in the embodiment of the present application, the training set comprises a plurality of face images labeled with real ages, and the age interference information corresponding to each face image in the training set is first acquired; for example, the age interference information includes shooting information and/or expression information. The age interference information is then encoded to obtain an information feature code. Finally, the neural network is iteratively trained using the training set and the information feature codes corresponding to the face images in the training set to obtain the age detection model. In this embodiment, the neural network learns both the features of the face images and the age interference features during training, that is, it clearly distinguishes the age interference features from the effective features affecting age-prediction accuracy, making the effective features more discriminable; the interference features can thus be filtered out during age prediction, their influence on network learning is reduced, and the trained age detection model can detect age accurately and stably.
After the age detection model has been trained by the method provided in the embodiments of the present application, it can be applied to age detection. The age detection method provided in the embodiments of the present application can be implemented by various electronic devices with computing capability, such as smart terminals and servers.
The age detection method provided by the embodiment of the present application is described below with reference to an exemplary application and implementation of the terminal provided by the embodiment of the present application. Referring to fig. 7, fig. 7 is a schematic flowchart of an age detection method according to an embodiment of the present disclosure. The method S30 may include the steps of:
S31: and acquiring a human face image to be detected.
Here, the face image to be detected refers to a face image whose age is to be detected. It is understood that the face image to be detected contains a human face.
In a specific implementation, application software built into a terminal (for example, a smart phone) acquires the face image to be detected. The face image to be detected may be captured by the terminal or input into the terminal by the user.
S32: and acquiring age interference information corresponding to the face image to be detected.
It can be understood that the skin state, the look of the eyes, and other such attributes of the face in the face image to be detected are effective information for judging age. Besides this effective information, the face image to be detected also contains age interference information, that is, image information that interferes with the effective information on which the age detection model relies for identification.
S33: and coding age interference information corresponding to the face image to be detected to obtain information characteristic codes corresponding to the face image to be detected.
It is understood that the age interference information includes text data and numerical data. In order to convert the age interference information into numerical data that the age detection model can compute on, the age interference information is encoded into an information characteristic code before detection. The specific implementation of step S33 may refer to the encoding manner described in the training embodiments above and is not repeated here.
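As a sketch of one plausible encoding consistent with the description above (the expression vocabulary and the choice of numerical fields are illustrative assumptions), the text data can be bag-of-words encoded and then spliced with the numerical data:

```python
import numpy as np

# Illustrative vocabulary of text-valued interference attributes such as
# expression labels; a real vocabulary would be built from the training set.
VOCAB = ["smile", "laugh", "neutral", "frown"]

def encode_interference(expression, shooting_values):
    """Bag-of-words code for the text data, spliced with the numerical data."""
    text_code = np.array([1.0 if word == expression else 0.0 for word in VOCAB])
    # Splice (concatenate) the text code with the numerical shooting data,
    # e.g. shooting angle and shooting distance, into one feature code.
    return np.concatenate([text_code, np.asarray(shooting_values, dtype=float)])

info_code = encode_interference("smile", [30.0, 1.5])  # e.g. angle 30 deg, distance 1.5 m
```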
S34: and inputting the face image to be detected and the information characteristic code corresponding to the face image to be detected into the age detection model for age detection, so as to obtain the age corresponding to the face image to be detected.
Here, the age detection model refers to the age detection model obtained by training according to the method embodiments of fig. 3 to fig. 6; it has the same structure and function as the age detection model in the embodiments described above, which is not repeated here.
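Putting steps S31 to S34 together, an end-to-end sketch might look as follows; the interfaces of the trained model and perceptron module are assumptions for illustration only.

```python
import torch

def detect_age(model, mlp_module, face_image, info_code):
    """S31-S34 in one call; `model` and `mlp_module` are the trained age
    detection model and multilayer perceptron module (interfaces assumed).

    face_image: preprocessed image tensor, shape (3, H, W)
    info_code:  information characteristic code from step S33, shape (code_len,)
    """
    with torch.no_grad():
        target_info = mlp_module(info_code.unsqueeze(0))       # target information vector
        probs = model(face_image.unsqueeze(0), target_info)    # age class probabilities
    return int(probs.argmax(dim=1).item())                     # age with highest probability
```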
Embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for causing an electronic device to perform a method for training an age detection model provided in embodiments of the present application, for example, a method for training an age detection model as shown in fig. 3 to 6, or an age detection method provided in embodiments of the present application.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; or it may be any of various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example, in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device (a device including a smart terminal and a server), or on multiple computing devices located at one site, or distributed across multiple sites interconnected by a communication network.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes a method for training an age detection model or an age detection method as in the foregoing embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Within the idea of the present application, the technical features in the above embodiments or in different embodiments may also be combined, the steps may be implemented in any order, and many other variations of the different aspects of the present application exist as described above, which are not provided in detail for the sake of brevity. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of training an age detection model, comprising:
acquiring a training set, wherein the training set comprises a plurality of face images, and each face image is marked with a real age;
acquiring age interference information corresponding to each face image in the training set, wherein the age interference information comprises shooting information and/or expression information, and the age interference information is an interference factor that interferes with accurate age detection by a neural network;
coding the age interference information to obtain an information characteristic code;
and performing iterative training on a neural network by adopting the training set and the information characteristic codes corresponding to the face images in the training set to obtain an age detection model.
2. The method of claim 1, wherein said encoding the age interference information to obtain an information characteristic code comprises:
coding the text data in the age interference information to obtain a text code;
and splicing the text code and the numerical data in the age interference information to obtain the information characteristic code.
3. The method of claim 2, wherein encoding the text data in the age interference information to obtain a text code comprises:
and coding the text data in the age interference information by adopting a bag-of-words model to obtain the text code.
4. The method according to any one of claims 1 to 3, wherein before performing iterative training on a neural network by using the training set and the information feature codes corresponding to the face images in the training set to obtain an age detection model, the method further comprises:
extracting the characteristics of the information characteristic codes by adopting a multilayer perceptron module to obtain target information vectors;
adopting the training set and the information characteristic codes corresponding to the face images in the training set to carry out iterative training on a neural network to obtain an age detection model, comprising the following steps:
and performing iterative training on the neural network and the multilayer perceptron module by adopting the training set and target information vectors corresponding to the training set to obtain the age detection model.
5. The method of claim 4, wherein the neural network comprises a cascade of convolution modules, fully-connected layers, fused layers, and classified layers, wherein the convolution modules comprise a plurality of convolution layers;
adopting the training set and the target information vector corresponding to the training set to carry out iterative training on the neural network and the multilayer perceptron module to obtain the age detection model, comprising the following steps:
inputting the training set into a convolution module of the neural network, wherein the last convolution layer of the convolution module outputs an age characteristic diagram;
inputting the age characteristic diagram into a full connection layer of the neural network to obtain an age characteristic vector;
inputting the age characteristic vector and the target information vector into the fusion layer for fusion, and inputting the fusion vector obtained by fusion into the classification layer for age classification to obtain a predicted age;
and calculating the loss between each real age and the corresponding predicted age by adopting a loss function, and adjusting model parameters of the neural network and the multilayer perceptron module according to the loss until convergence to obtain the age detection model.
6. The method of claim 5, wherein said inputting said age feature vector and said target information vector into said fusion layer for fusion comprises:
and multiplying and fusing the age characteristic vector and the target information vector to obtain the fusion vector.
7. The method of claim 5, wherein the loss function comprises:
$$\mathrm{Loss} = -\sum_{i=1}^{n} Y_{Ti} \cdot P_i$$

$$P_i = \log(Y_{ci})$$

wherein $Y_{ci}$, $i \in \{1, 2, \ldots, n\}$, is the probability corresponding to the i-th age among the predicted ages, $Y_{Ti}$ is the probability corresponding to the i-th age among the real ages, and $n$ is the maximum age.
8. An age detection method, comprising:
acquiring a human face image to be detected;
acquiring age interference information corresponding to the face image to be detected, wherein the age interference information comprises shooting information and/or expression information, and the age interference information is an interference factor that interferes with accurate age detection by a neural network;
encoding age interference information corresponding to the face image to be detected to obtain information characteristic codes corresponding to the face image to be detected;
and inputting the facial image to be detected and the information characteristic code corresponding to the facial image to be detected into an age detection model for age detection to obtain the age corresponding to the facial image to be detected.
9. An electronic device, comprising:
at least one processor, and
a memory communicatively coupled to the at least one processor, wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
10. A computer-readable storage medium having computer-executable instructions stored thereon for causing a computer device to perform the method of any one of claims 1-8.
CN202210623051.4A 2022-06-01 2022-06-01 Method for training age detection model, age detection method and related device Pending CN114943999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210623051.4A CN114943999A (en) 2022-06-01 2022-06-01 Method for training age detection model, age detection method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210623051.4A CN114943999A (en) 2022-06-01 2022-06-01 Method for training age detection model, age detection method and related device

Publications (1)

Publication Number Publication Date
CN114943999A true CN114943999A (en) 2022-08-26

Family

ID=82908459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210623051.4A Pending CN114943999A (en) 2022-06-01 2022-06-01 Method for training age detection model, age detection method and related device

Country Status (1)

Country Link
CN (1) CN114943999A (en)

Similar Documents

Publication Publication Date Title
US11657286B2 (en) Structure learning in convolutional neural networks
US11967151B2 (en) Video classification method and apparatus, model training method and apparatus, device, and storage medium
CN110533097B (en) Image definition recognition method and device, electronic equipment and storage medium
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN109948149B (en) Text classification method and device
JP7403909B2 (en) Operating method of sequence mining model training device, operation method of sequence data processing device, sequence mining model training device, sequence data processing device, computer equipment, and computer program
KR20160034814A (en) Client device with neural network and system including the same
CN109086653B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN110378438A (en) Training method, device and the relevant device of Image Segmentation Model under label is fault-tolerant
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN114021524B (en) Emotion recognition method, device, equipment and readable storage medium
CN114398961A (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
CN111475622A (en) Text classification method, device, terminal and storage medium
CN109726291B (en) Loss function optimization method and device of classification model and sample classification method
CN114266897A (en) Method and device for predicting pox types, electronic equipment and storage medium
CN115050064A (en) Face living body detection method, device, equipment and medium
CN108985442B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
KR20200143450A (en) Image processing method, device, electronic device and storage medium
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN113221695A (en) Method for training skin color recognition model, method for recognizing skin color and related device
CN114821244A (en) Method for training clothes classification model, clothes classification method and related device
CN113449840A (en) Neural network training method and device and image classification method and device
CN115439179A (en) Method for training fitting model, virtual fitting method and related device
CN114943999A (en) Method for training age detection model, age detection method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination