CN112183283A - Age estimation method, device, equipment and storage medium based on image

Age estimation method, device, equipment and storage medium based on image

Info

Publication number
CN112183283A
Authority
CN
China
Prior art keywords
age
neural network
sample image
image
target
Prior art date
Legal status
Pending
Application number
CN202011001040.XA
Other languages
Chinese (zh)
Inventor
王森
Current Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202011001040.XA
Publication of CN112183283A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image-based age estimation method, apparatus, device, and storage medium. The age of a target object in a target image is estimated by a trained neural network model. The neural network model is trained on first sample images carrying exact age labels and on second sample images corresponding to each of a plurality of preset age groups; that is, training the neural network model requires not only sample images labeled with exact ages but also sample images labeled only with preset age groups. Because sample images labeled with a preset age group are far cheaper to collect than sample images labeled with an exact age, the training cost of the neural network model is reduced. In addition, adding a large number of age-group-labeled sample images for auxiliary training increases the total number of sample images used to train the neural network model, which improves the accuracy of the trained neural network model and therefore the accuracy with which the neural network model predicts age.

Description

Age estimation method, device, equipment and storage medium based on image
Technical Field
The present disclosure relates to the field of information technology, and in particular, to a method, an apparatus, a device, and a storage medium for age estimation based on an image.
Background
In the field of computer vision, age estimation based on face images is an important branch. Specifically, the exact age of the person in an image is estimated from the face image, which has important application value in fields such as human-computer interaction, user profiling, video surveillance, and information recommendation.
Current face-image-based age estimation methods need a large amount of labeled face image data to train a neural network model, where the label may be the exact age of the person in the image. However, such labeled face images are scarce, and their collection cost is very high, so they are not easy to acquire. As a result, the cost of training the neural network model is too high. In addition, because the number of face images available for training the neural network model is small, the accuracy of the trained neural network model is low, which in turn reduces the accuracy of age estimation.
Disclosure of Invention
In order to solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides an image-based age estimation method, apparatus, device, and storage medium to reduce training cost of a neural network model and improve accuracy of a neural network model in predicting an age.
In a first aspect, an embodiment of the present disclosure provides an age estimation method based on an image, including:
acquiring a target image, wherein the target image comprises a target object;
inputting the target image into a trained neural network model, wherein the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each preset age group in a plurality of preset age groups, the neural network model comprises a main convolutional neural network layer, an age group classification layer and an exact age classification layer, the main convolutional neural network layer is used for carrying out feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is used for determining age group probability information of the target object according to the target feature information, and the exact age classification layer is used for determining age probability information of the target object according to the target feature information;
and determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
In a second aspect, an embodiment of the present disclosure provides a neural network model training method, where the neural network model includes: a main convolutional neural network layer, an age group classification layer and an exact age classification layer; the method comprises the following steps:
acquiring a first sample image and an age of a first subject in the first sample image;
acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
performing model training on a main convolutional neural network layer, an age group classification layer and an exact age classification layer in the neural network model according to the first sample image and the age of a first object in the first sample image;
and performing model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
In a third aspect, an embodiment of the present disclosure provides an image-based age estimation apparatus, including:
an acquisition module, wherein the acquisition module is used for acquiring a target image, and the target image comprises a target object;
the input module is used for inputting the target image into a trained neural network model, the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each preset age group in a plurality of preset age groups, the neural network model comprises a main convolutional neural network layer, an age group classification layer and an exact age classification layer, the main convolutional neural network layer is used for carrying out feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is used for determining age group probability information of the target object according to the target feature information, and the exact age classification layer is used for determining age probability information of the target object according to the target feature information;
and the determining module is used for determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
In a fourth aspect, an embodiment of the present disclosure provides a neural network model training apparatus, where the neural network model includes: a main convolutional neural network layer, an age group classification layer and an exact age classification layer; the neural network model training device comprises:
a first acquisition module, wherein the first acquisition module is used for acquiring a first sample image and the age of a first object in the first sample image;
the second acquisition module is used for acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
a first training module, configured to perform model training on a main convolutional neural network layer, an age group classification layer, and an exact age classification layer in the neural network model according to the first sample image and an age of a first object in the first sample image;
and the second training module is used for carrying out model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
In a fifth aspect, an embodiment of the present disclosure provides an image-based age estimation apparatus, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a neural network model training device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the second aspect.
In a seventh aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect or the second aspect.
According to the image-based age estimation method, apparatus, device, and storage medium provided by the embodiments of the present disclosure, the age of a target object in a target image is estimated by a trained neural network model. The neural network model is trained on a first sample image with an age label and on second sample images corresponding to each of a plurality of preset age groups; that is, training the neural network model requires not only sample images with exact ages but also sample images with preset age groups. Because the collection cost of sample images labeled with a preset age group is far lower than that of sample images labeled with an exact age, and such images are easier to obtain, the training cost of the neural network model is reduced. In addition, when only a small number of exact-age sample images are available, adding a large number of preset-age-group sample images for auxiliary training increases the number of sample images used to train the neural network model, thereby improving the accuracy of the trained neural network model. Furthermore, the neural network model comprises a main convolutional neural network layer, an age group classification layer, and an exact age classification layer, wherein the main convolutional neural network layer performs feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer predicts the age group of the target object, and the exact age classification layer predicts the exact age of the target object. Therefore, determining the age of the target object according to both the age group probability information and the age probability information of the target object yields higher accuracy; that is, the accuracy of the neural network model in predicting age is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a neural network model training method provided in an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a neural network model training method provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a neural network model provided in an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a neural network model training method provided in an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a probability density function provided by an embodiment of the present disclosure;
fig. 7 is a flowchart of an image-based age estimation method provided by an embodiment of the present disclosure;
fig. 8 is a flowchart of an image-based age estimation method provided by an embodiment of the present disclosure;
fig. 9 is a flowchart of an image-based age estimation method provided by an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an image-based age estimation apparatus provided in an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a neural network model training apparatus provided in an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an image-based age estimation device provided in an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a neural network model training device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Current face-image-based age estimation methods need a large amount of labeled face image data to train a neural network model, where the label may be the exact age of the person in the image. However, such labeled face images are scarce, and their collection cost is very high, so they are not easy to acquire. As a result, the cost of training the neural network model is too high. In addition, because fewer labeled face images are used to train the neural network model, the accuracy of the predicted age of the trained neural network model is low. To address this problem, embodiments of the present disclosure provide an image-based age estimation method, which is described below with reference to specific embodiments.
Specifically, the image-based age estimation method may be performed by a terminal or a server. Specifically, the terminal or the server may estimate the age of the target object in the target image through the neural network model. The execution subject of the training method of the neural network model and the execution subject of the age estimation method may be the same or different.
For example, in one application scenario, as shown in FIG. 1, the server 12 trains a neural network model. The terminal 11 acquires the trained neural network model from the server 12, and the terminal 11 estimates the age of the target object in the target image by using the trained neural network model. The target image may be captured by the terminal 11. Alternatively, the target image is acquired by the terminal 11 from another device. Still alternatively, the target image is an image obtained by image processing of a preset image by the terminal 11, where the preset image may be obtained by shooting by the terminal 11, or the preset image may be obtained by the terminal 11 from another device. Here, the other devices are not particularly limited.
In another application scenario, the server 12 trains a neural network model. Further, the server 12 estimates the age of the target object in the target image through the trained neural network model. The manner in which the server 12 acquires the target image may be similar to the manner in which the terminal 11 acquires the target image as described above, and will not be described herein again.
In yet another application scenario, the terminal 11 trains a neural network model. Further, the terminal 11 estimates the age of the target object in the target image through the trained neural network model.
It can be understood that the neural network model training method and the image-based age estimation method provided by the embodiment of the disclosure are not limited to the several possible scenarios described above. Since the trained neural network model is applicable to the image-based age estimation method, the neural network model training method may be described before the image-based age estimation method is described.
Taking the training of the neural network model by the server 12 as an example, a training method of the neural network model, that is, a training process of the neural network model, is introduced below. It is understood that the neural network model training method is also applicable to the scenario in which the terminal 11 trains the neural network model.
It can be understood that the image-based age estimation method provided by the embodiment of the present disclosure can be used not only to estimate the age of a person but also to estimate the age of an animal. If an age estimation method is used to estimate the age of a person, the sample images used to train the neural network model may be images including the person, for example, face images. If an age estimation method is used to estimate the age of a certain animal, the sample images used to train the neural network model may be images that include the certain animal. The embodiment of the present disclosure is schematically illustrated by taking an example of estimating the age of a person. The method of estimating the age of an animal is similar to the method of estimating the age of a human.
Fig. 2 shows a neural network model training method provided in an embodiment of the present disclosure. The neural network model includes: a main convolutional neural network layer, an age group classification layer and an exact age classification layer. As shown in fig. 2, the method comprises the following steps:
s201, acquiring a first sample image and the age of a first object in the first sample image.
In this embodiment, the first sample image may specifically be an image including a person or a face image. Accordingly, the first object in the first sample image may specifically be a person or a face. The age of the person or face in the first sample image is accurate and known. The age of the human face can be understood as the age of the person. The age of a human face or the age of a person may refer to a specific age value.
If the first sample image is an image including a person, a face image in the first sample image can be extracted by adopting an image processing method. And further, training a neural network model according to the human face image. Or, further, after the face image is subjected to image processing to adjust parameters such as the definition and/or position of the face in the face image, training the neural network model according to the adjusted face image.
If the first sample image is a face image, the neural network model may be trained from the face image. Or, further, after the face image is subjected to image processing to adjust parameters such as the definition and/or position of the face in the face image, training the neural network model according to the adjusted face image.
Optionally, the first sample image is a face image, and the first object in the first sample image is a face.
The present embodiment is schematically described by taking an example in which the first sample image is a face image. The number of the first sample images may be one or more. The manner in which the server 12 obtains the first sample image includes the following possible implementations:
in one possible implementation, the server 12 may store a plurality of first sample images in advance. Alternatively, the server 12 may acquire a plurality of first sample images from the terminal 11 or other servers. It is to be understood that the number of the terminals 11 may be one or more.
In another possible implementation manner, the server 12 may store a plurality of preset images in advance. Alternatively, the server 12 acquires a plurality of preset images from the terminal 11 or other servers. Each preset image includes one or more characters. Further, the server 12 may use a face recognition technology to recognize the face of the person from each preset image, and use an image processing method to obtain the face image in each preset image. For example, a face image of a target person in each preset image may be acquired, the target person may be a person selected by the user, or may be a person satisfying a preset condition. Alternatively, a face image with the definition greater than or equal to a preset threshold in each preset image may be acquired. Or acquiring a face image of each person in the preset image, further performing image processing on the face image of each person to adjust the definition and/or position of the face, and keeping the face image with the definition being greater than or equal to a preset threshold value and/or keeping the face image with the face position at the preset position. The method for acquiring the face image from the preset image includes but is not limited to: intercepting, copying, cutting and the like. Further, the server 12 takes a face image obtained from a preset image thereof as a first sample image.
Specifically, the plurality of first sample images and the age of the human face in each first sample image may constitute a first data set, which may be denoted as D1, and D1 may be expressed as the following formula (1):

D1 = {(x_i^1, y_i^1) | 0 ≤ i < n1}    (1)

where n1 denotes the number of first sample images in the first data set, x_i^1 denotes the i-th first sample image in the first data set, and y_i^1 denotes the age of the face in the i-th first sample image. The age may be the exact age, which may also be noted as the annotation age.
S202, according to the multiple preset age groups, obtaining a second sample image corresponding to each preset age group.
In the embodiment of the present disclosure, the second sample image may specifically be an image including a person or a face image. Accordingly, the second object in the second sample image may be a person or a face. However, unlike the first sample image described above, the exact age of the person or face in the second sample image is not known; only the age group of the person or face in the second sample image is known.
Similarly, if the second sample image is an image including a person, the face image in the second sample image may be extracted by using an image processing method. And further, training a neural network model according to the human face image. Or, further, after the face image is subjected to image processing to adjust parameters such as the definition and/or position of the face in the face image, training the neural network model according to the adjusted face image.
If the second sample image is a face image, the neural network model may be trained from the face image. Or, further, after the face image is subjected to image processing to adjust parameters such as the definition and/or position of the face in the face image, training the neural network model according to the adjusted face image.
Optionally, the second sample image is a face image, and the second object in the second sample image is a human face. Optionally, one face image includes one face.
The present embodiment is schematically described by taking an example in which the second sample image is a face image. The number of the second sample images may be one or more. The manner in which the server 12 obtains the second sample image includes several possible implementations as follows:
in a possible implementation manner, the server 12 or a database corresponding to the server 12 may store multiple types of face images in advance, for example, face images of infants, children, teenagers, youths, middle-aged persons, and elderly persons. Further, the server 12 obtains the face images corresponding to each type locally or from the database corresponding to the server 12. The number of face images of each type acquired by the server 12 may be the same or different. Each type of face image may be used as a second sample image.
Specifically, "infant, young, middle-aged and old" can be used as labels for a plurality of predetermined age groups. The correspondence between the label and the preset age group is shown in the following table 1:
TABLE 1
Figure BDA0002694321720000101
In another possible implementation, the server 12 uses "infant", "child", "teenager", "youth", "middle-aged" and "elderly" as keywords to crawl, from a network such as the Internet, the second sample images corresponding to each of these labels. Specifically, for the six types, the server 12 crawls the corresponding second sample images at a ratio of 1:1:1:1:1:1; that is, the number of second sample images that the server 12 crawls from the Internet for each preset age group is the same. For example, the server 12 crawls 100 second sample images of each preset age group from the Internet.
Specifically, the plurality of second sample images and the preset age group corresponding to each second sample image may form a second data set, which may be denoted as D2, and D2 may be expressed as the following formula (2):

D2 = {(x_i^2, g_i^2) | 0 ≤ i < n2}, with g_i^2 ∈ {0, 1, ..., m1-1}    (2)

where n2 denotes the number of second sample images in the second data set, m1 denotes the number of preset age groups, x_i^2 denotes the i-th second sample image in the second data set, and g_i^2 denotes the preset age group corresponding to the i-th second sample image.
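For concreteness, the two data sets can be represented as follows in Python; the directory layout, class names and the single example entries are illustrative assumptions rather than anything specified by the patent:

```python
# A minimal sketch of the two training sets, assuming ages 1..100 (M = 100)
# and the six preset age groups of Table 1 (N = m1 = 6). Paths are illustrative.
from dataclasses import dataclass
from typing import List

AGE_GROUPS = ["infant", "child", "teenager", "youth", "middle-aged", "elderly"]

@dataclass
class ExactAgeSample:      # element of D1
    image_path: str
    age: int               # exact (annotation) age, 1..100

@dataclass
class AgeGroupSample:      # element of D2
    image_path: str
    group: int             # index 0..5 into AGE_GROUPS

d1: List[ExactAgeSample] = [ExactAgeSample("faces/exact/000001.jpg", 18)]
d2: List[AgeGroupSample] = [AgeGroupSample("faces/crawled/teenager/000001.jpg", 2)]
```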
Further, a sample image, which may be the first sample image or the second sample image, is randomly selected from the first data set D1 and the second data set D2. Further, the sample image is input into a neural network model, and the neural network model is trained through the sample image. It can be understood that one iterative training can be performed on the neural network model through one sample image, multiple iterative training can be performed on the neural network model through a plurality of sample images, and in the multiple iterative training process, parameters of the neural network model can be continuously updated, so that the accuracy of the neural network model is higher and higher. The parameters of the neural network model may specifically include: parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer.
Since the process of training the neural network model by each sample image is similar, the process of training the neural network model by the sample image is described with reference to one sample image.
S203, according to the first sample image and the age of the first object in the first sample image, performing model training on a main convolutional neural network layer, an age group classification layer and an exact age classification layer in the neural network model.
When the sample image is the first sample image, i.e., the sample image comes from the first data set D1, the first sample image has an exact age, so model training can be performed on the main convolutional neural network layer, the age group classification layer and the exact age classification layer in the neural network model according to the first sample image and the age of the first object in the first sample image.
And S204, performing model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
When the sample image is the second sample image, i.e., the sample image comes from the second data set D2, the second sample image has an age group label, so model training can be performed on the main convolutional neural network layer and the age group classification layer in the neural network model according to the second sample image and the preset age group corresponding to the second sample image.
The embodiment of the disclosure trains the neural network model by a first sample image corresponding to a definite age and a second sample image not corresponding to a definite age but corresponding to a preset age bracket. The neural network model comprises a main body convolution neural network layer, an age classification layer and an exact age classification layer. The first sample image and the corresponding exact age of the first sample image are used for model training of the main convolutional neural network layer, the age classification layer and the exact age classification layer. The second sample image and the preset age group corresponding to the second sample image are used for carrying out model training on the main convolutional neural network layer and the age group classification layer. Therefore, the training process of the neural network model can depend on not only the first sample image of the exact age but also the second sample image of the preset age group. The cost of collecting the second sample image is far lower than that of the first sample image, and the second sample image is easier to obtain, so that the training cost of the neural network model is reduced. In addition, when the first sample image at the definite age and the second sample image at the preset age are adopted to train the neural network model, the second sample image is easier to obtain, so that a large number of second sample images at the preset age can be added to assist in training in the process of training the neural network model by adopting a small number of first sample images at the definite age, and the accuracy of the neural network model is improved.
On the basis of the above embodiment, optionally, performing model training on the main convolutional neural network layer, the age group classification layer, and the exact age classification layer in the neural network model according to the first sample image and the age of the first object in the first sample image includes the following steps as shown in fig. 3:
s301, inputting the first sample image into a main body convolution neural network layer in the neural network model to obtain first characteristic information corresponding to the first sample image.
Fig. 4 is a schematic structural diagram of a neural network model provided in the embodiment of the present disclosure. The neural network model includes: a main convolutional neural network layer, an age group classification layer and an exact age classification layer.
When the sample image randomly selected from the first data set D1 and the second data set D2 is the first sample image, the first sample image is input into the main convolutional neural network (CNN) layer shown in fig. 4. The main convolutional neural network layer performs feature extraction on the first sample image to obtain first feature information corresponding to the first sample image.
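The structure of fig. 4 can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's implementation: the small stand-in backbone is an assumption (the patent only requires a main convolutional neural network layer for feature extraction), while the two heads with N = 6 age groups and M = 100 ages follow the dimensions used later in this description.

```python
# A minimal PyTorch sketch of the model in fig. 4: a shared backbone CNN
# followed by an age-group head (N = 6) and an exact-age head (M = 100).
import torch
import torch.nn as nn

class AgeEstimationNet(nn.Module):
    def __init__(self, num_ages=100, num_groups=6):
        super().__init__()
        self.backbone = nn.Sequential(               # main convolutional neural network layer
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.group_head = nn.Linear(64, num_groups)  # age group classification layer
        self.age_head = nn.Linear(64, num_ages)      # exact age classification layer

    def forward(self, x):
        feat = self.backbone(x)                                  # feature information
        p_group = torch.softmax(self.group_head(feat), dim=1)    # P2 / P3
        p_age = torch.softmax(self.age_head(feat), dim=1)        # P1
        return p_group, p_age
```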
S302, inputting the first characteristic information into the exact age classification layer in the neural network model to obtain a first prediction vector.
S303, inputting the first characteristic information into the age group classification layer in the neural network model to obtain a second prediction vector.
Since the first sample image comes from the first data set D1, the face in the first sample image has an exact age, from which the exact age group can be determined. Therefore, the main convolutional neural network layer can input the first feature information corresponding to the first sample image to the age group classification layer and the exact age classification layer, respectively.
The exact age classification layer can output an exact age prediction vector according to the first feature information corresponding to the first sample image, and the exact age prediction vector is recorded as a first prediction vector in a training stage. The first prediction vector is an M-dimensional vector, i.e., the first prediction vector includes M element values. The present embodiment does not limit the specific value of M, and for example, M is 100. Each element value in the first prediction vector corresponds to a predetermined age. Each element value in the first prediction vector represents a probability that the age of the face in the first sample image predicted by the exact age classification layer is a preset age. The first prediction vector is denoted as P1, and P1 can be expressed as:
P1 = [p1_1, p1_2, ..., p1_100]

where p1_1 is the 1st element value in the first prediction vector, p1_2 is the 2nd element value in the first prediction vector, and so on, and p1_100 is the 100th element value in the first prediction vector. p1_1 corresponds to the age of 1 year, p1_2 corresponds to the age of 2 years, and so on, and p1_100 corresponds to the age of 100 years. p1_1 represents the probability, predicted by the exact age classification layer, that the age of the face in the first sample image is 1 year old, p1_2 represents the probability that the age is 2 years old, and so on, and p1_100 represents the probability that the age is 100 years old. Optionally, the sum of all the element values in the first prediction vector is 1.
The age group classification layer may specifically output an age group prediction vector according to the first feature information corresponding to the first sample image, and in the training phase, the age group prediction vector is recorded as a second prediction vector. The second prediction vector is an N-dimensional vector. The present embodiment does not limit the specific value of N, for example, N may be the same as the number of preset age groups as described above, for example, N is 6. Each element value in the second prediction vector corresponds to a predetermined age group. Each element value in the second prediction vector represents the probability that the age group of the face in the first sample image predicted by the age group classification layer is a preset age group. The second prediction vector can be denoted as P2, and P2 can be expressed as:
P2 = [p2_1, p2_2, ..., p2_6]

where p2_1 is the 1st element value in the second prediction vector, p2_2 is the 2nd element value in the second prediction vector, and so on, and p2_6 is the 6th element value in the second prediction vector. The correspondence between each element value in the second prediction vector and the preset age groups is shown in Table 2 below:

TABLE 2

Element value    Preset age group
p2_1             0 (newborn) - 6 years old
p2_2             7 - 12 years old
p2_3             13 - 17 years old
p2_4             18 - 45 years old
p2_5             46 - 69 years old
p2_6             Greater than 69 years old

In addition, p2_1 represents the probability, predicted by the age group classification layer, that the age group of the face in the first sample image is 0 (newborn) - 6 years old, p2_2 represents the probability that the age group is 7 - 12 years old, and so on, and p2_6 represents the probability that the age group is greater than 69 years old. Optionally, the sum of all the element values in the second prediction vector is 1.
S304, determining a first loss function according to the first prediction vector, the second prediction vector and the age of the first object in the first sample image.
Optionally, determining a first loss function according to the first prediction vector, the second prediction vector and the age of the first object in the first sample image includes: determining a first sample vector according to the age of a first object in the first sample image, wherein the first sample vector is used for representing the age of the first object; determining a second sample vector from the first sample vector, the second sample vector being used to represent the age group of the first subject; determining a first loss function based on the first prediction vector, the second prediction vector, the first sample vector, and the second sample vector.
For example, y_i^1 in formula (1) indicates the age of the face in the i-th first sample image in the first data set D1. According to y_i^1, a vector may be determined, which is denoted as the first sample vector. The first sample vector may also be referred to as the true value vector of the age and is used to represent the exact age of the face in the first sample image. Each element value in the first sample vector is denoted as a first element value. The first sample vector may be an M-dimensional vector, i.e., it includes M element values, and each of the M element values corresponds to a preset age. The first sample vector is denoted as W1, and W1 can be expressed as:

W1 = [w1_1, w1_2, ..., w1_100]

where w1_1 is the 1st element value in the first sample vector, w1_2 is the 2nd element value in the first sample vector, and so on, and w1_100 is the 100th element value in the first sample vector. w1_1 corresponds to the age of 1 year, w1_2 corresponds to the age of 2 years, and so on, and w1_100 corresponds to the age of 100 years.

For example, if the age of the face in a first sample image is 18 years, the first sample vector corresponding to that first sample image is a 100-dimensional vector. Since the age of the face in the first sample image is exact, all element values in the 100-dimensional vector are 0 except the 18th element value w1_18, which is 1. That is, a 100-dimensional first sample vector whose 18th element value w1_18 is 1 and whose other element values are all 0 represents that the age of the face in the first sample image is 18 years old.
Further, a second sample vector may be determined based on the first sample vector; the second sample vector may also be referred to as the true value vector of the age group. The second sample vector is used to represent the age group of the face in the first sample image. Since the age of the face in the first sample image is exact, the exact age group can be determined according to the exact age; the specific determination method can refer to the correspondence in Table 1. The second sample vector is denoted as W2, and W2 can be expressed as:

W2 = [w2_1, w2_2, ..., w2_6]

where w2_1 is the 1st element value in the second sample vector, w2_2 is the 2nd element value in the second sample vector, and so on, and w2_6 is the 6th element value in the second sample vector. The correspondence between each element value in the second sample vector and the preset age groups is shown in Table 3 below:

TABLE 3

Element value    Preset age group
w2_1             0 (newborn) - 6 years old
w2_2             7 - 12 years old
w2_3             13 - 17 years old
w2_4             18 - 45 years old
w2_5             46 - 69 years old
w2_6             Greater than 69 years old

For example, if the exact age of the face in the first sample image is 18 years, then according to the correspondence in Table 1, the age group of the face in the first sample image is youth (18 - 45 years old). In this case, W2 = [0, 0, 0, 1, 0, 0]; that is, all element values in the second sample vector are 0 except the 4th element value, which is 1, thereby indicating that the age group of the face in the first sample image is 18 - 45 years old.
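A minimal sketch of how the two true value vectors can be built from an exact age is given below; the helper names are illustrative, and the group boundaries follow Table 1 and Table 3:

```python
# A minimal sketch: turn an exact age into the one-hot vectors W1 (100-dim,
# ages 1..100) and W2 (6-dim, preset age groups of Table 1 / Table 3).
import torch

GROUP_UPPER_BOUNDS = [6, 12, 17, 45, 69, 200]   # inclusive upper bound of each group

def age_to_group(age: int) -> int:
    for idx, upper in enumerate(GROUP_UPPER_BOUNDS):
        if age <= upper:
            return idx
    return len(GROUP_UPPER_BOUNDS) - 1

def build_targets(age: int, num_ages=100, num_groups=6):
    w1 = torch.zeros(num_ages)
    w1[age - 1] = 1.0                # age 18 -> 18th element value w1_18 = 1
    w2 = torch.zeros(num_groups)
    w2[age_to_group(age)] = 1.0      # age 18 -> 4th element value w2_4 = 1 (youth)
    return w1, w2

w1, w2 = build_targets(18)           # reproduces the example in the description
```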
In summary, the first prediction vector P1 is the vector predicted by the exact age classification layer, and the first sample vector W1 is the true value vector relative to the first prediction vector P1. The second prediction vector P2 is the vector predicted by the age group classification layer, and the second sample vector W2 is the true value vector relative to the second prediction vector P2.
Further, a first loss function is determined from the first prediction vector P1, the second prediction vector P2, the first sample vector W1, and the second sample vector W2. The first loss function can be designated as L1_i and expressed as:

L1_i = - Σ_{m=1}^{M} w1_m · log(p1_m) - Σ_{n=1}^{N} w2_n · log(p2_n)

where i denotes the i-th first sample image in the first data set D1; that is, each first sample image corresponds to a first loss function, and L1_i represents the first loss function corresponding to the i-th first sample image. w1_m is the m-th element value in the first sample vector W1, p1_m is the m-th element value in the first prediction vector P1, and 1 ≤ m ≤ M. w2_n is the n-th element value in the second sample vector W2, p2_n is the n-th element value in the second prediction vector P2, and 1 ≤ n ≤ N. That is, in the embodiments of the present disclosure, a capital letter such as M and a lowercase letter such as m represent different meanings.
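Assuming the cross-entropy form written above (the formula is not legible in the published text, so that form is a reconstruction), the first loss can be sketched as:

```python
# A minimal sketch of the first loss L1_i, assuming it is the sum of the
# cross-entropy over the exact-age head and the cross-entropy over the
# age-group head (the published text does not show the formula explicitly).
import torch

def first_loss(p1, p2, w1, w2, eps=1e-12):
    # p1, w1: (M,) exact-age prediction and one-hot target
    # p2, w2: (N,) age-group prediction and one-hot target
    return -(w1 * torch.log(p1 + eps)).sum() - (w2 * torch.log(p2 + eps)).sum()
```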
S305, updating the parameters of the main convolutional neural network layer, the parameters of the age group classification layer and the parameters of the exact age classification layer according to the first loss function.
After the first loss function is calculated, the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer may be updated according to the first loss function.
Optionally, updating the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer according to the first loss function includes: taking derivatives of the first loss function with respect to the parameters of the main convolutional neural network layer, the age group classification layer, and the exact age classification layer to obtain a first derivative result; and updating the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer according to the first derivative result.
For example, L1_i is differentiated with respect to the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer by a back-propagation algorithm, and the resulting derivatives are recorded as the first derivative result. Further, the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer are updated by a stochastic gradient descent algorithm according to the first derivative result.
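A single update step for a first sample image, using back-propagation followed by stochastic gradient descent as described above, can be sketched as follows; the learning rate, input size and example age are illustrative, and AgeEstimationNet, build_targets and first_loss come from the earlier sketches:

```python
# A minimal sketch of one training step on a first sample image.
import torch

model = AgeEstimationNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

image = torch.randn(1, 3, 112, 112)           # stand-in for a preprocessed face image
w1, w2 = build_targets(18)                    # exact age 18

p_group, p_age = model(image)                 # forward pass through all three layers
loss = first_loss(p_age[0], p_group[0], w1, w2)
optimizer.zero_grad()
loss.backward()                               # back-propagation: first derivative result
optimizer.step()                              # stochastic gradient descent update
```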
According to the embodiment of the present disclosure, training the neural network model with the first sample image updates the parameters of the main convolutional neural network layer, the parameters of the age group classification layer, and the parameters of the exact age classification layer; over multiple training iterations, these updates gradually improve the accuracy of the neural network model.
On the basis of the above embodiment, optionally, according to the second sample image and the preset age group corresponding to the second sample image, model training is performed on a main convolutional neural network layer and an age group classification layer in the neural network model, including the following steps as shown in fig. 5:
s501, inputting the second sample image into a main body convolution neural network layer in the neural network model to obtain second characteristic information corresponding to the second sample image.
When the sample image randomly selected from the first data set D1 and the second data set D2 is the second sample image, the second sample image is input into the main convolutional neural network (CNN) layer shown in fig. 4. The main convolutional neural network layer performs feature extraction on the second sample image to obtain second feature information corresponding to the second sample image.
S502, inputting the second characteristic information into the age group classification layer in the neural network model to obtain a third prediction vector.
Since the second sample image comes from the second data set D2, the face in the second sample image does not have an exact age but only a preset age group. Therefore, the main convolutional neural network layer inputs the second feature information corresponding to the second sample image only to the age group classification layer. Further, the age group classification layer outputs an age group prediction vector according to the second feature information, and this age group prediction vector can be recorded as a third prediction vector. The third prediction vector is an N-dimensional vector. Each element value in the third prediction vector corresponds to a preset age group and represents the probability, predicted by the age group classification layer, that the age group of the face in the second sample image is that preset age group. The third prediction vector is denoted as P3, and P3 can be expressed as:

P3 = [p3_1, p3_2, ..., p3_6]

where p3_1 represents the probability, predicted by the age group classification layer, that the age group of the face in the second sample image is 0 (newborn) - 6 years old, p3_2 represents the probability that the age group is 7 - 12 years old, and so on, and p3_6 represents the probability that the age group is greater than 69 years old. Optionally, the sum of all the element values in the third prediction vector is 1.
S503, determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image.
Optionally, determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image, where the determining includes: determining a third sample vector according to the preset age group corresponding to the second sample image, wherein each third element value in the third sample vector represents the probability that the age group of the second object in the second sample image is the preset age group, and each third element value corresponds to one preset age group; determining a second penalty function based on the third prediction vector and the third sample vector.
For example, g_i^2 in formula (2) indicates the preset age group corresponding to the i-th second sample image in the second data set D2. Further, according to g_i^2, a vector may be determined, which is denoted as the third sample vector; the third sample vector is used to indicate the probability that the age group of the face in the second sample image is each preset age group. Each element value in the third sample vector is denoted as a third element value, and each third element value corresponds to one preset age group. The third sample vector may be an N-dimensional vector. The third sample vector is denoted as W3, and W3 can be expressed as:

W3 = [w3_1, w3_2, ..., w3_6]

where the preset age group corresponding to w3_1 is 0 (newborn) - 6 years old, the preset age group corresponding to w3_2 is 7 - 12 years old, and so on, and the preset age group corresponding to w3_6 is greater than 69 years old. w3_1 represents the probability that the age group of the face in the second sample image is 0 (newborn) - 6 years old, w3_2 represents the probability that the age group is 7 - 12 years old, and so on, and w3_6 represents the probability that the age group is greater than 69 years old. Optionally, the sum of all third element values in the third sample vector is 1.
Optionally, the age group of the second object follows a Gaussian distribution.
When the second sample images corresponding to "infant", "child", "teenager", "youth", "middle-aged" and "elderly" are crawled from the Internet, the age group of the face in a second sample image may not be consistent with the keyword used in the search. For example, a certain second sample image may be retrieved with the keyword "infant" while the age group of the face in that image is not 0 (newborn) - 6 years old but some other preset age group. In this case, it can be considered that the probability that the age group of the face in the second sample image is 0 (newborn) - 6 years old is high, the probability that it is another preset age group is low, and the age group of the face in the second sample image follows a Gaussian distribution.
Similarly, if a certain second sample image is searched according to the keyword "teenager", the probability that the age group of the face in the second sample image is 13-17 years old is higher, and the probability that the age group of the face in the second sample image is other preset age groups is lower.
Fig. 6 shows the probability density function corresponding to a Gaussian distribution, where x represents the preset age group and f(x) represents the probability density. That is, each third element value in the third sample vector W3 and its corresponding preset age group satisfy the relationship described by f(x).
Specifically, the age groups of the faces in second sample images of the same type (e.g., corresponding to the same preset age group) follow the same Gaussian distribution, and the age groups of the faces in second sample images of different types (e.g., corresponding to different preset age groups) follow different Gaussian distributions. For example, for the second sample images corresponding to "teenager", the preset age group x corresponding to the highest point of the probability density function is 13 - 17 years old, and the values of the probability density function gradually decay on both sides of the highest point. The Gaussian distribution is denoted N(μ, σ²), where μ denotes the expected value and σ² denotes the variance; specifically, σ = 1 and μ corresponds to the highest point.
Optionally, the third sample vectors corresponding to the second sample images belonging to the same type are the same, that is, the third sample vectors corresponding to different second sample images searched by using the same keyword are the same. The third sample vectors corresponding to the second sample images belonging to different types are different, that is, the third sample vectors corresponding to the different second sample images searched by using different keywords are different.
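The description fixes σ = 1 and centers the distribution on the age group matching the search keyword, but does not state how the continuous density is discretized over the six groups; the sketch below assumes one simple discretization (evaluate the density at the integer group indices and renormalize so the element values sum to 1):

```python
# A minimal sketch of the third sample vector W3 for a crawled second sample
# image, assuming a discrete Gaussian over the 6 group indices with sigma = 1,
# centered at the group matching the search keyword, then normalized to sum 1.
# The exact discretization is an assumption; the patent only fixes sigma = 1.
import math
import torch

def build_group_target(keyword_group: int, num_groups=6, sigma=1.0):
    w3 = torch.tensor([math.exp(-((n - keyword_group) ** 2) / (2 * sigma ** 2))
                       for n in range(num_groups)])
    return w3 / w3.sum()

w3 = build_group_target(2)   # keyword "teenager" -> peak on the 13 - 17 group
```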
In summary, the third prediction vector P3 is the vector predicted by the age group classification layer, and the third sample vector W3 is the true value vector relative to the third prediction vector P3.
Further, a second loss function is determined based on the third prediction vector P3 and the third sample vector W3. The second loss function can be noted as L2_i and expressed as:

L2_i = - Σ_{n=1}^{N} w3_n · log(p3_n)

where i denotes the i-th second sample image in the second data set D2; that is, each second sample image corresponds to a second loss function, and L2_i represents the second loss function corresponding to the i-th second sample image. w3_n is the n-th element value in the third sample vector W3, p3_n is the n-th element value in the third prediction vector P3, and 1 ≤ n ≤ N.
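Under the same cross-entropy reading as for the first loss (again a reconstruction, since the formula is not legible in the published text), the second loss can be sketched as:

```python
# A minimal sketch of the second loss L2_i, assuming the cross-entropy between
# the Gaussian-smoothed target W3 and the age-group prediction P3.
import torch

def second_loss(p3, w3, eps=1e-12):
    # p3, w3: (N,) age-group prediction and soft target
    return -(w3 * torch.log(p3 + eps)).sum()
```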
S504, updating the parameters of the main convolutional neural network layer and the parameters of the age group classification layer according to the second loss function.
After the second loss function is calculated, the parameters of the main convolutional neural network layer and the parameters of the age group classification layer may be updated according to the second loss function.
Optionally, updating the parameters of the main convolutional neural network layer and the parameters of the age group classification layer according to the second loss function, including: obtaining a second derivative result by performing derivation on the parameters of the main convolutional neural network layer and the parameters of the age group classification layer in the second loss function; and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second derivative result.
For example, the second loss function L2_i is differentiated with respect to the parameters of the main convolutional neural network layer and the parameters of the age group classification layer by a back propagation algorithm, and the resulting derivative is recorded as the second derivative result. Further, the parameters of the main convolutional neural network layer and the parameters of the age group classification layer are updated by a stochastic gradient descent algorithm based on the second derivative result.
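A minimal PyTorch-style sketch of this update step is given below, assuming the second loss takes the cross-entropy form above. The stand-in backbone, the head dimensions, the dummy image size, and the soft-label values are all assumptions for illustration, not the layers actually used by the disclosure.

```python
import torch
import torch.nn as nn

# Minimal stand-ins for the real layers (assumed shapes, for illustration only)
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
age_group_head = nn.Sequential(nn.Linear(8, 6), nn.Softmax(dim=1))

# Only the parameters of the backbone and the age group classification layer
# are updated for a second sample image.
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(age_group_head.parameters()), lr=0.01
)

image = torch.randn(1, 3, 112, 112)  # dummy tensor standing in for a second sample image
w3 = torch.tensor([[0.05, 0.24, 0.40, 0.24, 0.05, 0.02]])  # Gaussian soft label (illustrative values)

optimizer.zero_grad()
p3 = age_group_head(backbone(image))        # third prediction vector P3
loss = -(w3 * torch.log(p3 + 1e-12)).sum()  # second loss L2_i
loss.backward()                             # derivation w.r.t. the parameters (back propagation)
optimizer.step()                            # stochastic gradient descent update
```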
According to this embodiment of the disclosure, training the neural network model with the second sample images updates the parameters of the main convolutional neural network layer and the parameters of the age group classification layer, so that the accuracy of the neural network model increases over multiple training iterations.
It can be understood that sample images randomly selected from the first data set D1 and the second data set D2 may be input into the main convolutional neural network layer, which performs feature extraction on each sample image to obtain its feature information. When the sample image is a first sample image, the main convolutional neural network layer inputs the feature information into both the age group classification layer and the exact age classification layer. When the sample image is a second sample image, the main convolutional neural network layer inputs the feature information only into the age group classification layer. That is, each first sample image triggers one update of the parameters of the main convolutional neural network layer, the age group classification layer, and the exact age classification layer, while each second sample image triggers one update of the parameters of the main convolutional neural network layer and the age group classification layer only. Over multiple training iterations these parameters are continuously updated, so that the accuracy of the neural network model becomes higher and higher. Specifically, the training may end when the function value of the first loss function and/or the second loss function falls below a preset threshold, or when the number of iterations reaches a preset number, thereby obtaining the trained neural network model. A sketch of this alternating training procedure is given below.
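In the sketch, `train_step_d1` and `train_step_d2` are hypothetical callables standing for the per-sample updates described above (they are not part of the disclosure), and the loss threshold and iteration limit are placeholder values.

```python
import random

def train(D1, D2, train_step_d1, train_step_d2,
          loss_threshold=0.05, max_iterations=100_000):
    """Randomly pick first or second sample images and apply the matching update.

    train_step_d1(sample): one update of backbone + age group layer + exact age layer.
    train_step_d2(sample): one update of backbone + age group layer only.
    """
    pool = [(s, train_step_d1) for s in D1] + [(s, train_step_d2) for s in D2]
    for _ in range(max_iterations):
        sample, step = random.choice(pool)
        loss = step(sample)
        # End training once the loss falls below the preset threshold
        # (or when the preset number of iterations is reached).
        if loss < loss_threshold:
            break
```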
Fig. 7 is a flowchart of an image-based age estimation method according to an embodiment of the present disclosure. For example, the image-based age estimation method may be performed by the terminal 11; similarly, it may also be performed by the server 12. Specifically, the terminal 11 may obtain the trained neural network model from the server 12, and then perform age estimation on the target object in a target image according to the trained neural network model. The method includes the following steps:
S701, acquiring a target image, wherein the target image comprises a target object.
For example, the terminal 11 may be provided with a camera through which the terminal 11 may capture an image of the target. Optionally, the target image is a face image, and the target object is a face.
In other embodiments, the target image may be a face image obtained by the terminal 11 from a person image. The generation process or source of the person image is not limited herein. Alternatively, the target image may be a face image acquired by the terminal 11 from another terminal or a server.
Specifically, the target object in the target image may be a human face of which the age is to be determined.
S702, inputting the target image into a trained neural network model, wherein the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each preset age group in a plurality of preset age groups, the neural network model comprises a main convolutional neural network layer, an age group classification layer and an exact age classification layer, the main convolutional neural network layer is used for carrying out feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is used for determining age group probability information of the target object according to the target feature information, and the exact age classification layer is used for determining the age probability information of the target object according to the target feature information.
Further, the terminal 11 may input the target image into the trained neural network model. The neural network model is obtained by training according to the first sample image with the age label and the second sample image corresponding to each preset age group in the plurality of preset age groups, and the specific training process can refer to the training process described in the above embodiment, which is not described herein again.
It is assumed that, in the present embodiment, the neural network model shown in fig. 4 is a trained neural network model. Specifically, after the target image is input into the trained neural network model, the target image is first input into the main convolutional neural network layer, and the main convolutional neural network layer can perform feature extraction on the target image to obtain target feature information corresponding to the target image. Further, the main body convolutional neural network layer inputs the target characteristic information into the age classification layer and the exact age classification layer, respectively. Specifically, the age group classification layer may output an age group prediction vector according to the target feature information, and the age group prediction vector may be recorded as age group probability information of the target object in the stage of performing age estimation using the neural network model, where the age group probability information is used to indicate a probability that the age group of the target object predicted by the age group classification layer is a preset age group. The exact age classification layer may output an exact age prediction vector according to the target feature information, and in a stage of performing age estimation using the neural network model, the exact age prediction vector may be recorded as age probability information of the target object, where the age probability information is used to indicate a probability that the age of the target object predicted by the exact age classification layer is each preset age.
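One possible way to organize such a model is sketched below in PyTorch. The backbone shown is a deliberately simplified stand-in (the disclosure does not prescribe a particular convolutional architecture), the output sizes of 6 age groups and 100 exact ages follow the vectors described in this embodiment, and all class and attribute names are assumptions for the example.

```python
import torch
import torch.nn as nn

class AgeEstimationModel(nn.Module):
    """Backbone + age group classification layer + exact age classification layer."""

    def __init__(self, feature_dim=128, num_groups=6, num_ages=100):
        super().__init__()
        # Main convolutional neural network layer (simplified stand-in backbone)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim), nn.ReLU(),
        )
        self.age_group_head = nn.Linear(feature_dim, num_groups)  # age group classification layer
        self.exact_age_head = nn.Linear(feature_dim, num_ages)    # exact age classification layer

    def forward(self, image):
        features = self.backbone(image)                            # target feature information
        k1 = torch.softmax(self.age_group_head(features), dim=1)   # age group probability information
        k2 = torch.softmax(self.exact_age_head(features), dim=1)   # age probability information
        return k1, k2

# Example forward pass on a dummy target image
model = AgeEstimationModel()
k1, k2 = model(torch.randn(1, 3, 112, 112))
```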
S703, determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
Further, the age of the target object is determined based on the age group probability information of the target object, namely the probability that the age group of the target object predicted by the age group classification layer is each preset age group, and the age probability information of the target object, namely the probability that the age of the target object predicted by the exact age classification layer is each preset age. The age of the target object may be a specific age value of the target object.
The embodiment of the disclosure estimates the age of the target object in the target image through the trained neural network model, which is obtained by training according to the first sample image with the age label and the second sample image corresponding to each preset age group in the plurality of preset age groups. That is, training the neural network model requires not only sample images with exact ages but also sample images labeled only with preset age groups. Since the collection cost of sample images of preset age groups is far lower than that of sample images with exact ages, and such images are easier to obtain, the training cost of the neural network model is reduced. In addition, when only a small number of sample images with exact ages are available, adding a large number of sample images of preset age groups for auxiliary training increases the number of sample images used for training, thereby improving the accuracy of the trained neural network model. Furthermore, the neural network model comprises a main convolutional neural network layer, an age group classification layer, and an exact age classification layer, where the main convolutional neural network layer performs feature extraction on the target image to obtain the target feature information, the age group classification layer predicts the age group of the target object, and the exact age classification layer predicts the age of the target object. Therefore, determining the age of the target object from both the age group probability information and the age probability information yields a more accurate result, that is, the accuracy of the age predicted by the neural network model is improved.
On the basis of the foregoing embodiment, optionally, the age group probability information of the target object includes a first target vector, each first element value in the first target vector represents a probability that the age group of the target object is a preset age group, and each first element value corresponds to one preset age group.
For example, the first target vector is denoted as K1, and K1 can be expressed as:
K1 = [k1_1, k1_2, ......, k1_6]

where k1_1 represents the probability, predicted by the age group classification layer, that the age group of the target object is 0 (birth)-6 years old, k1_2 represents the probability that the age group of the target object is 7-12 years old, and so on, until k1_6 represents the probability that the age group of the target object is greater than 69 years old. The correspondence between each first element value in the first target vector and a preset age group may be the correspondence of the parameters in Table 2 or Table 3 above, which is not described again here.
Optionally, the age probability information of the target object includes a second target vector, each second element value in the second target vector represents a probability that the age of the target object is a preset age, and each second element value corresponds to a preset age.
For example, the second target vector is denoted as K2, and K2 can be expressed as:
K2 = [k2_1, k2_2, ......, k2_100]

where k2_1 represents the probability, predicted by the exact age classification layer, that the age of the target object is 1, k2_2 represents the probability that the age of the target object is 2, and so on, until k2_100 represents the probability that the age of the target object is 100. It can be understood that in other embodiments the maximum age that the exact age classification layer can predict is not limited to 100 and may be greater than 100; in addition, the exact age classification layer may also predict the probability that the age of the target object is 0. The vectors described in this embodiment are only illustrative and are not specifically limited. The correspondence between each second element value in the second target vector and a preset age may be the correspondence of the parameters in Table 1 above, which is not described again here.
Optionally, determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object includes the following steps as shown in fig. 8:
S801, determining a target age group of the target object according to the first target vector, wherein the target age group is a preset age group corresponding to the maximum first element value in the first target vector.
For example, the target age group of the target object is determined according to the first target vector K1, and may be the preset age group corresponding to the largest first element value in K1. For example, if k1_2 is the largest first element value in the first target vector K1, the preset age group of 7-12 years old corresponding to k1_2 is the target age group of the target object.
S802, according to the target age group, obtaining a sub-vector corresponding to the target age group from the second target vector.
For example, the second target vector K2 includes 100 second element values. When the target age group is 7-12 years old, a sub-vector corresponding to 7-12 years old may be obtained from the second target vector K2, specifically [k2_7, k2_8, k2_9, k2_10, k2_11, k2_12].
S803, determining the age of the target object according to the sub-vector.
Further, the age of the target object is determined according to the sub-vector [k2_7, k2_8, k2_9, k2_10, k2_11, k2_12].
Optionally, determining the age of the target object according to the sub-vectors includes the following steps as shown in fig. 9:
S901, performing normalization processing on the second element values in the sub-vector.
Since the sum of all second element values in the second target vector K2 = [k2_1, k2_2, ......, k2_100] is equal to 1, the sum of the second element values in the sub-vector [k2_7, k2_8, k2_9, k2_10, k2_11, k2_12] is less than 1.
In this embodiment, the second element values in the sub-vector [k2_7, k2_8, k2_9, k2_10, k2_11, k2_12] may be normalized so that the sum of the second element values in the normalized sub-vector is equal to 1.
For example, the normalized sub-vector is [k2'_7, k2'_8, k2'_9, k2'_10, k2'_11, k2'_12].
S902, carrying out weighted average on each preset age in the target age group according to the second element value in the sub-vector after normalization processing to obtain the age of the target object.
Since the second element values in the sub-vector [k2_7, k2_8, k2_9, k2_10, k2_11, k2_12] correspond in turn to the ages of 7, 8, 9, 10, 11 and 12, a weighted average of the ages 7, 8, 9, 10, 11 and 12 may be computed using the second element values in the normalized sub-vector [k2'_7, k2'_8, k2'_9, k2'_10, k2'_11, k2'_12] as weights to obtain the age of the target object. The weighting coefficient of age 7 is k2'_7, the weighting coefficient of age 8 is k2'_8, and so on, until the weighting coefficient of age 12 is k2'_12. Specifically, after the weighted average of the ages 7, 8, 9, 10, 11 and 12 is obtained, the weighted average is rounded up, and the rounded result is used as the age of the target object, that is, the final prediction result.
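Putting steps S801, S802, S901 and S902 together, the age of the target object can be computed from K1 and K2 as in the following Python sketch. Only the 0 (birth)-6, 7-12, 13-17 and greater-than-69 preset age groups are stated explicitly in this embodiment, so the boundaries of the remaining groups, as well as the example K1 and K2 values, are assumptions made for illustration.

```python
import numpy as np

# Preset age groups as (first age, last age); the 18-45 and 46-69 splits are assumed.
# The 0-6 group is represented by ages 1-6 because K2 starts at age 1 in this embodiment.
AGE_GROUPS = [(1, 6), (7, 12), (13, 17), (18, 45), (46, 69), (70, 100)]

def estimate_age(k1: np.ndarray, k2: np.ndarray) -> int:
    """Determine the age of the target object from K1 (age group probabilities, length 6)
    and K2 (exact age probabilities, k2[i] = probability of age i + 1, length 100)."""
    # S801: target age group = group with the largest first element value
    start, end = AGE_GROUPS[int(np.argmax(k1))]
    # S802: sub-vector of K2 corresponding to the target age group (k2 index 0 is age 1)
    sub = k2[start - 1:end]
    # S901: normalize the sub-vector so that its elements sum to 1
    sub = sub / sub.sum()
    # S902: weighted average of the preset ages in the target age group, then round
    # to an integer age (the text describes rounding up; ceil would also fit that wording)
    ages = np.arange(start, end + 1)
    return int(round(float(np.sum(sub * ages))))

# Example with K1 peaked at the 7-12 group and K2 concentrated around age 10
k1 = np.array([0.05, 0.60, 0.20, 0.10, 0.03, 0.02])
k2 = np.full(100, 0.002)
k2[8:11] += 0.8 / 3   # indices 8-10 correspond to ages 9, 10, 11
k2 /= k2.sum()
print(estimate_age(k1, k2))  # prints 10 for these illustrative values
```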
It can be understood that, when the execution subject of the training method of the neural network model and the execution subject of the age estimation method are the same execution subject, before the target image is obtained in S701, the execution subject may also train the neural network model, and a specific training process may refer to the training method of the neural network model described in the above embodiment, and details are not described here.
In this embodiment, the target age group of the target object is used to obtain the sub-vector corresponding to the target age group from the second target vector, further, the second element value in the sub-vector is normalized, and each preset age in the target age group is weighted and averaged by the second element value in the normalized sub-vector, so as to obtain the age of the target object, thereby further improving the accuracy of the age of the target object.
Fig. 10 is a schematic structural diagram of an image-based age estimation apparatus according to an embodiment of the present disclosure. The image-based age estimation apparatus provided by the embodiment of the present disclosure may execute the processing flow provided by the embodiment of the image-based age estimation method, as shown in fig. 10, the image-based age estimation apparatus 100 includes:
an obtaining module 101, configured to obtain a target image, where the target image includes a target object;
an input module 102, configured to input the target image into a trained neural network model, where the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each of a plurality of preset age groups, the neural network model includes a main convolutional neural network layer, an age group classification layer, and an exact age classification layer, the main convolutional neural network layer is configured to perform feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is configured to determine age group probability information of the target object according to the target feature information, and the exact age classification layer is configured to determine age probability information of the target object according to the target feature information;
a determining module 103, configured to determine the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
Optionally, the age group probability information of the target object includes a first target vector, each first element value in the first target vector represents a probability that the age group of the target object is a preset age group, and each first element value corresponds to one preset age group;
the age probability information of the target object comprises second target vectors, each second element value in the second target vectors represents the probability that the age of the target object is a preset age, and each second element value corresponds to one preset age.
Optionally, when the determining module 103 determines the age of the target object according to the age group probability information of the target object and the age probability information of the target object, it is specifically configured to: determining a target age group of the target object according to the first target vector, wherein the target age group is a preset age group corresponding to a maximum first element value in the first target vector; according to the target age group, acquiring a sub-vector corresponding to the target age group from the second target vector; and determining the age of the target object according to the sub-vectors.
Optionally, when determining the age of the target object according to the sub-vector, the determining module 103 is specifically configured to: normalizing the second element value in the sub-vector; and according to the second element value in the sub-vector after normalization processing, carrying out weighted average on each preset age in the target age group to obtain the age of the target object.
Optionally, the target image is a face image, and the target object is a face.
Optionally, the image-based age estimation apparatus 100 further includes: a first training module 104 and a second training module 105;
the obtaining module 101 is further configured to: acquiring a first sample image and an age of a first subject in the first sample image; acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
a first training module 104, configured to perform model training on a main convolutional neural network layer, an age classification layer, and an exact age classification layer in the neural network model according to the first sample image and an age of a first object in the first sample image;
and the second training module 105 is configured to perform model training on a main convolutional neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
Optionally, when the first training module 104 performs model training on the main convolutional neural network layer, the age classification layer, and the exact age classification layer in the neural network model according to the first sample image and the age of the first object in the first sample image, specifically, the first training module is configured to:
inputting the first sample image into a main body convolution neural network layer in the neural network model to obtain first characteristic information corresponding to the first sample image;
inputting the first characteristic information into the exact age classification layer in the neural network model to obtain a first prediction vector;
inputting the first characteristic information into an age classification layer in the neural network model to obtain a second prediction vector;
determining a first loss function based on the first prediction vector, the second prediction vector, and an age of a first subject in the first sample image;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first loss function.
Optionally, when the first training module 104 determines the first loss function according to the first prediction vector, the second prediction vector, and the age of the first object in the first sample image, it is specifically configured to:
determining a first sample vector according to the age of a first object in the first sample image, wherein the first sample vector is used for representing the age of the first object;
determining a second sample vector from the first sample vector, the second sample vector being used to represent the age group of the first subject;
determining a first loss function based on the first prediction vector, the second prediction vector, the first sample vector, and the second sample vector.
Optionally, when the first training module 104 updates the parameter of the main convolutional neural network layer, the parameter of the age group classification layer, and the parameter of the exact age group classification layer according to the first loss function, the first training module is specifically configured to:
deriving parameters of the main convolutional neural network layer, the age classification layer and the exact age classification layer in the first loss function to obtain a first derivative result;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first derivative result.
Optionally, the second training module 105 is configured to, when performing model training on the main convolutional neural network layer and the age group classification layer in the neural network model according to the second sample image and the preset age group corresponding to the second sample image, specifically:
inputting the second sample image into a main convolutional neural network layer in the neural network model to obtain second characteristic information corresponding to the second sample image;
inputting the second characteristic information into an age classification layer in the neural network model to obtain a third prediction vector;
determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second loss function.
Optionally, when the second training module 105 determines the second loss function according to the third prediction vector and the preset age group corresponding to the second sample image, the second training module is specifically configured to:
determining a third sample vector according to the preset age group corresponding to the second sample image, wherein each third element value in the third sample vector represents the probability that the age group of the second object in the second sample image is the preset age group, and each third element value corresponds to one preset age group;
determining a second penalty function based on the third prediction vector and the third sample vector.
Optionally, the age group of the second subject follows a gaussian distribution.
Optionally, when the second training module 105 updates the parameters of the main convolutional neural network layer and the parameters of the age group classification layer according to the second loss function, the second training module is specifically configured to:
obtaining a second derivative result by performing derivation on the parameters of the main convolutional neural network layer and the parameters of the age group classification layer in the second loss function;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second derivative result.
Optionally, the first sample image is a face image, and the first object is a face;
the second sample image is a face image, and a second object in the second sample image is a face.
The image-based age estimation apparatus of the embodiment shown in fig. 10 can be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects thereof are similar and will not be described herein again.
Fig. 11 is a schematic structural diagram of a neural network model training device according to an embodiment of the present disclosure. The neural network model training device provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the neural network model training method, as shown in fig. 11, the neural network model training device 110 includes:
a first obtaining module 111, configured to obtain a first sample image and an age of a first object in the first sample image;
a second obtaining module 112, configured to obtain, according to a plurality of preset age groups, a second sample image corresponding to each preset age group;
a first training module 113, configured to perform model training on a main convolutional neural network layer, an age classification layer, and an exact age classification layer in the neural network model according to the first sample image and an age of the first object in the first sample image;
and a second training module 114, configured to perform model training on a main convolutional neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
Optionally, when the first training module 113 performs model training on the main convolutional neural network layer, the age classification layer, and the exact age classification layer in the neural network model according to the first sample image and the age of the first object in the first sample image, specifically, the first training module is configured to:
inputting the first sample image into a main body convolution neural network layer in the neural network model to obtain first characteristic information corresponding to the first sample image;
inputting the first characteristic information into the exact age classification layer in the neural network model to obtain a first prediction vector;
inputting the first characteristic information into an age classification layer in the neural network model to obtain a second prediction vector;
determining a first loss function based on the first prediction vector, the second prediction vector, and an age of a first subject in the first sample image;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first loss function.
Optionally, when the first training module 113 determines the first loss function according to the first prediction vector, the second prediction vector, and the age of the first object in the first sample image, the first training module is specifically configured to:
determining a first sample vector according to the age of a first object in the first sample image, wherein the first sample vector is used for representing the age of the first object;
determining a second sample vector from the first sample vector, the second sample vector being used to represent the age group of the first subject;
determining a first loss function based on the first prediction vector, the second prediction vector, the first sample vector, and the second sample vector.
Optionally, when the first training module 113 updates the parameter of the main convolutional neural network layer, the parameter of the age group classification layer, and the parameter of the exact age group classification layer according to the first loss function, the first training module is specifically configured to:
deriving parameters of the main convolutional neural network layer, the age classification layer and the exact age classification layer in the first loss function to obtain a first derivative result;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first derivative result.
Optionally, when the second training module 114 performs model training on the main convolutional neural network layer and the age group classification layer in the neural network model according to the second sample image and the preset age group corresponding to the second sample image, specifically, the second training module is configured to:
inputting the second sample image into a main convolutional neural network layer in the neural network model to obtain second characteristic information corresponding to the second sample image;
inputting the second characteristic information into an age classification layer in the neural network model to obtain a third prediction vector;
determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second loss function.
Optionally, when the second training module 114 determines the second loss function according to the third prediction vector and the preset age group corresponding to the second sample image, the second training module is specifically configured to:
determining a third sample vector according to the preset age group corresponding to the second sample image, wherein each third element value in the third sample vector represents the probability that the age group of the second object in the second sample image is the preset age group, and each third element value corresponds to one preset age group;
determining a second penalty function based on the third prediction vector and the third sample vector.
Optionally, the age group of the second subject follows a gaussian distribution.
Optionally, when the second training module 114 updates the parameter of the main convolutional neural network layer and the parameter of the age group classification layer according to the second loss function, the second training module is specifically configured to:
obtaining a second derivative result by performing derivation on the parameters of the main convolutional neural network layer and the parameters of the age group classification layer in the second loss function;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second derivative result.
Optionally, the first sample image is a face image, and the first object is a face;
the second sample image is a face image, and a second object in the second sample image is a face.
The neural network model training device in the embodiment shown in fig. 11 can be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 12 is a schematic structural diagram of an image-based age estimation device according to an embodiment of the present disclosure. The image-based age estimation device provided by the embodiment of the present disclosure may execute the processing flow provided by the embodiment of the image-based age estimation method. As shown in fig. 12, the image-based age estimation device 120 includes: a memory 121, a processor 122, a computer program, and a communication interface 123; wherein the computer program is stored in the memory 121 and is configured to be executed by the processor 122 to implement the image-based age estimation method as described above.
Fig. 13 is a schematic structural diagram of a neural network model training device according to an embodiment of the present disclosure. The neural network model training device provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiment of the neural network model training method. As shown in fig. 13, the neural network model training device 130 includes: a memory 131, a processor 132, a computer program, and a communication interface 133; wherein the computer program is stored in the memory 131 and is configured to be executed by the processor 132 to implement the neural network model training method as described above.
In addition, the embodiment of the present disclosure also provides a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the image-based age estimation method or the neural network model training method described in the above embodiment.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

1. An image-based age estimation method, the method comprising:
acquiring a target image, wherein the target image comprises a target object;
inputting the target image into a trained neural network model, wherein the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each preset age group in a plurality of preset age groups, the neural network model comprises a main convolutional neural network layer, an age group classification layer and an exact age classification layer, the main convolutional neural network layer is used for carrying out feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is used for determining age group probability information of the target object according to the target feature information, and the exact age classification layer is used for determining age probability information of the target object according to the target feature information;
and determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
2. The method of claim 1, wherein the age group probability information of the target object comprises a first target vector, each first element value in the first target vector representing a probability that the age group of the target object is a preset age group, each first element value corresponding to one preset age group;
the age probability information of the target object comprises second target vectors, each second element value in the second target vectors represents the probability that the age of the target object is a preset age, and each second element value corresponds to one preset age.
3. The method of claim 2, wherein determining the age of the target subject based on the age group probability information of the target subject and the age probability information of the target subject comprises:
determining a target age group of the target object according to the first target vector, wherein the target age group is a preset age group corresponding to a maximum first element value in the first target vector;
according to the target age group, acquiring a sub-vector corresponding to the target age group from the second target vector;
and determining the age of the target object according to the sub-vectors.
4. The method of claim 3, wherein determining the age of the target object from the sub-vectors comprises:
normalizing the second element value in the sub-vector;
and according to the second element value in the sub-vector after normalization processing, carrying out weighted average on each preset age in the target age group to obtain the age of the target object.
5. The method of claim 1, wherein the target image is a face image and the target object is a face.
6. The method of claim 1, wherein prior to acquiring the target image, the method further comprises:
acquiring a first sample image and an age of a first subject in the first sample image;
acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
performing model training on a main body convolutional neural network layer, an age classification layer and an exact age classification layer in the neural network model according to the first sample image and the age of a first object in the first sample image;
and performing model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
7. The method of claim 6, wherein model training a subject convolutional neural network layer, an age classification layer, and an exact age classification layer in the neural network model based on the first sample image and an age of a first subject in the first sample image comprises:
inputting the first sample image into a main body convolution neural network layer in the neural network model to obtain first characteristic information corresponding to the first sample image;
inputting the first characteristic information into the exact age classification layer in the neural network model to obtain a first prediction vector;
inputting the first characteristic information into an age classification layer in the neural network model to obtain a second prediction vector;
determining a first loss function based on the first prediction vector, the second prediction vector, and an age of a first subject in the first sample image;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first loss function.
8. The method of claim 7, wherein determining a first loss function based on the first prediction vector, the second prediction vector, and an age of a first subject in the first sample image comprises:
determining a first sample vector according to the age of a first object in the first sample image, wherein the first sample vector is used for representing the age of the first object;
determining a second sample vector from the first sample vector, the second sample vector being used to represent the age group of the first subject;
determining a first loss function based on the first prediction vector, the second prediction vector, the first sample vector, and the second sample vector.
9. The method of claim 7, wherein updating the parameters of the subject convolutional neural network layer, the parameters of the age classification layer, and the parameters of the exact age classification layer according to the first loss function comprises:
deriving parameters of the main convolutional neural network layer, the age classification layer and the exact age classification layer in the first loss function to obtain a first derivative result;
updating parameters of the subject convolutional neural network layer, parameters of the age classification layer, and parameters of the exact age classification layer according to the first derivative result.
10. The method of claim 6, wherein performing model training on a body convolutional neural network layer and an age classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image comprises:
inputting the second sample image into a main convolutional neural network layer in the neural network model to obtain second characteristic information corresponding to the second sample image;
inputting the second characteristic information into an age classification layer in the neural network model to obtain a third prediction vector;
determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second loss function.
11. The method of claim 10, wherein determining a second loss function according to the third prediction vector and a preset age group corresponding to the second sample image comprises:
determining a third sample vector according to the preset age group corresponding to the second sample image, wherein each third element value in the third sample vector represents the probability that the age group of the second object in the second sample image is the preset age group, and each third element value corresponds to one preset age group;
determining a second penalty function based on the third prediction vector and the third sample vector.
12. The method of claim 10, wherein updating the parameters of the subject convolutional neural network layer and the parameters of the age classification layer according to the second loss function comprises:
obtaining a second derivative result by performing derivation on the parameters of the main convolutional neural network layer and the parameters of the age group classification layer in the second loss function;
and updating the parameters of the main convolutional neural network layer and the parameters of the age classification layer according to the second derivative result.
13. The method according to claim 6, wherein the first sample image is a face image, and the first object is a human face;
the second sample image is a face image, and a second object in the second sample image is a face.
14. A neural network model training method, wherein the neural network model comprises: a main convolutional neural network layer, an age classification layer and an exact age classification layer; the method comprises the following steps:
acquiring a first sample image and an age of a first subject in the first sample image;
acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
performing model training on a main body convolutional neural network layer, an age classification layer and an exact age classification layer in the neural network model according to the first sample image and the age of a first object in the first sample image;
and performing model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
15. An image-based age estimation device, comprising:
the device comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a target image which comprises a target object;
the input module is used for inputting the target image into a trained neural network model, the neural network model is obtained by training according to a first sample image with an age label and a second sample image corresponding to each preset age group in a plurality of preset age groups, the neural network model comprises a main convolutional neural network layer, an age group classification layer and an exact age classification layer, the main convolutional neural network layer is used for carrying out feature extraction on the target image to obtain target feature information corresponding to the target image, the age group classification layer is used for determining age group probability information of the target object according to the target feature information, and the exact age classification layer is used for determining age probability information of the target object according to the target feature information;
and the determining module is used for determining the age of the target object according to the age group probability information of the target object and the age probability information of the target object.
16. A neural network model training apparatus, wherein the neural network model includes: a main convolutional neural network layer, an age classification layer and an exact age classification layer; the neural network model training device comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first sample image and the age of a first object in the first sample image;
the second acquisition module is used for acquiring a second sample image corresponding to each preset age group according to the plurality of preset age groups;
a first training module, configured to perform model training on a main convolutional neural network layer, an age classification layer, and an exact age classification layer in the neural network model according to the first sample image and an age of a first object in the first sample image;
and the second training module is used for carrying out model training on a main body convolution neural network layer and an age group classification layer in the neural network model according to the second sample image and a preset age group corresponding to the second sample image.
17. An image-based age estimation device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-13.
18. A neural network model training apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method as claimed in claim 14.
19. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-14.
CN202011001040.XA 2020-09-22 2020-09-22 Age estimation method, device, equipment and storage medium based on image Pending CN112183283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011001040.XA CN112183283A (en) 2020-09-22 2020-09-22 Age estimation method, device, equipment and storage medium based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011001040.XA CN112183283A (en) 2020-09-22 2020-09-22 Age estimation method, device, equipment and storage medium based on image

Publications (1)

Publication Number Publication Date
CN112183283A true CN112183283A (en) 2021-01-05

Family

ID=73955737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011001040.XA Pending CN112183283A (en) 2020-09-22 2020-09-22 Age estimation method, device, equipment and storage medium based on image

Country Status (1)

Country Link
CN (1) CN112183283A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169454A (en) * 2017-05-16 2017-09-15 中国科学院深圳先进技术研究院 A kind of facial image age estimation method, device and its terminal device
CN107545245A (en) * 2017-08-14 2018-01-05 中国科学院半导体研究所 A kind of age estimation method and equipment
CN110197099A (en) * 2018-02-26 2019-09-03 腾讯科技(深圳)有限公司 The method and apparatus of across age recognition of face and its model training
CN108734146A (en) * 2018-05-28 2018-11-02 北京达佳互联信息技术有限公司 Facial image Age estimation method, apparatus, computer equipment and storage medium
CN109447894A (en) * 2018-08-20 2019-03-08 广州市久邦数码科技有限公司 A kind of image processing method and its system based on data analysis
US20200242415A1 (en) * 2019-01-30 2020-07-30 Coretronic Corporation Training method of neural network and classification method based on neural network and device thereof
CN110287942A (en) * 2019-07-03 2019-09-27 成都旷视金智科技有限公司 Training method, age estimation method and the corresponding device of age estimation model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
许梦婷; 康长青; 刘伟杰; 张万顺: "人脸图像年龄估计系统的设计与实现" (Design and Implementation of a Face Image Age Estimation System), 信息通信 (Information & Communications), no. 01, 15 January 2018 (2018-01-15) *
赵一丁; 田森平: "基于分类与回归混合模型的人脸年龄估计方法" (Face Age Estimation Method Based on a Hybrid Classification and Regression Model), 计算机应用 (Journal of Computer Applications), no. 07, 10 July 2017 (2017-07-10) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023081168A (en) * 2021-11-30 2023-06-09 アニコム ホールディングス株式会社 Insurance premium calculation system and insurance premium calculation method
JP7503041B2 (en) 2021-11-30 2024-06-19 アニコム ホールディングス株式会社 Insurance premium calculation system and insurance premium calculation method

Similar Documents

Publication Publication Date Title
US11410038B2 (en) Frame selection based on a trained neural network
CN108280477B (en) Method and apparatus for clustering images
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
CN111178458B (en) Training of classification model, object classification method and device
CN110909784B (en) Training method and device of image recognition model and electronic equipment
CN109271958B (en) Face age identification method and device
CN110717099B (en) Method and terminal for recommending film
CN108197592B (en) Information acquisition method and device
WO2019091402A1 (en) Method and device for age estimation
CN112948612B (en) Human body cover generation method and device, electronic equipment and storage medium
CN107291845A (en) A kind of film based on trailer recommends method and system
CN112686234A (en) Face image quality evaluation method, electronic device and storage medium
CN115439887A (en) Pedestrian re-identification method and system based on pseudo label optimization and storage medium
CN111160229A (en) Video target detection method and device based on SSD (solid State disk) network
CN111814846B (en) Training method and recognition method of attribute recognition model and related equipment
CN112749726A (en) Training method and device of target detection model, computer equipment and storage medium
CN111797319B (en) Recommendation method, recommendation device, recommendation equipment and storage medium
CN112232506A (en) Network model training method, image target recognition method, device and electronic equipment
CN112085000A (en) Age identification method, and training method and device of age identification model
CN109858031B (en) Neural network model training and context prediction method and device
CN112183283A (en) Age estimation method, device, equipment and storage medium based on image
CN111967383A (en) Age estimation method, and training method and device of age estimation model
CN116503670A (en) Image classification and model training method, device and equipment and storage medium
CN112183946A (en) Multimedia content evaluation method, device and training method thereof
CN112232360A (en) Image retrieval model optimization method, image retrieval device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination