CN112906525A - Age identification method and device and electronic equipment - Google Patents

Age identification method and device and electronic equipment

Info

Publication number
CN112906525A
CN112906525A
Authority
CN
China
Prior art keywords
age
target
image
video
identification result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110161744.1A
Other languages
Chinese (zh)
Other versions
CN112906525B (en)
Inventor
井雪
李益永
孙准
黄秋实
项伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd filed Critical Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN202110161744.1A
Publication of CN112906525A
Application granted
Publication of CN112906525B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/178: Estimating age from face image; using age information for improving recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an age identification method, an age identification device and an electronic device, relating to the technical field of age identification. The method comprises the following steps: acquiring an image to be identified; performing feature extraction on a target person in the image to be identified to obtain N feature maps of the target person at different scales; performing face recognition on the N feature maps to obtain a first age identification result about the face, and performing human body recognition on the N feature maps to obtain a second age identification result about the human body; and determining a first target age identification result of the target person in the image to be identified according to the first age identification result and the second age identification result, where N is a positive integer greater than 1. Because the scheme identifies the age in the image using multi-dimensional features of both the face and the human body, the obtained first target age identification result is more accurate and more stable.

Description

Age identification method and device and electronic equipment
Technical Field
The present invention relates to the field of age identification technologies, and in particular, to an age identification method and apparatus, and an electronic device.
Background
With the development of the internet, short videos and live webcasts have become major internet applications and services among minors. To prevent short videos or live webcasts from adversely affecting minors, the review of content involving minors must be strengthened and age-appropriate web content must be distributed to them; a high-precision, high-recall model is therefore needed to detect the age signal and provide a basis for subsequent content distribution or content review. Most existing methods are based on face detection, using the detection result for classification or regression to obtain a final age. These methods place high demands on annotation precision, and annotating a specific age is particularly subjective. In addition, such methods have large prediction errors when no face can be detected or facial key points are located inaccurately. Their robustness and precision are also poor: the same face yields different predictions after rotation or cropping.
Disclosure of Invention
The invention provides an age identification method, an age identification device and an electronic device, which are used to solve, to a certain extent, the problem that existing age identification methods have poor accuracy and stability.
In a first aspect of the present invention, there is provided an age identifying method, including:
acquiring an image to be identified;
extracting features of a target person in the image to be recognized to obtain N feature maps of the target person with different scales;
carrying out face recognition on the N feature maps to obtain a first age recognition result about a face, and carrying out human body recognition on the N feature maps to obtain a second age recognition result about a human body;
determining a first target age identification result of a target person in the image to be identified according to the first age identification result and the second age identification result;
wherein N is a positive integer greater than 1.
In a second aspect of the present invention, there is provided an age identifying device, the device comprising:
the first acquisition module is used for acquiring an image to be identified;
the first extraction module is used for extracting the features of a target person in the image to be identified to obtain N feature maps of the target person at different scales;
the first identification module is used for carrying out face identification on the N feature maps to obtain a first age identification result about a face, and carrying out human body identification on the N feature maps to obtain a second age identification result about a human body;
the first determining module is used for determining a first target age identification result of a target person in the image to be identified according to the first age identification result and the second age identification result;
wherein N is a positive integer greater than 1.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another via the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the age identification method as described above when executing the program stored in the memory.
In a fourth aspect of the present invention, there is also provided a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the age identification method as described above.
In a fifth aspect of embodiments of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the age identification method as described above.
Compared with the prior art, the invention has the following advantages:
In the embodiment of the invention, feature extraction is performed on the target person in the image to be identified to obtain N feature maps of the target person at different scales. Face recognition is performed on the N feature maps to obtain a first age identification result about the face, and human body recognition is performed on the N feature maps to obtain a second age identification result about the human body. The first target age identification result of the target person in the image to be identified is then determined according to the first age identification result and the second age identification result. Because the age in the image is identified using multi-dimensional features of both the face and the human body, the obtained first target age identification result is more accurate and more stable.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly described below.
Fig. 1 is a flowchart of an age identification method according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for extracting N feature maps according to an embodiment of the present invention;
FIG. 3 is a structural diagram of a context structure according to an embodiment of the present invention;
fig. 4 is a flowchart of an age identification method for a video to be identified according to an embodiment of the present invention;
fig. 5 is a flowchart of an age identification method for a target user ID according to an embodiment of the present invention;
fig. 6 is a block diagram of an age identifying apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be appreciated that data so used may be interchanged under appropriate circumstances, so that embodiments of the application may be practiced in sequences other than those illustrated or described herein. The terms "first", "second" and the like do not limit quantity; for example, a first object may be one object or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding objects are in an "or" relationship.
The embodiment of the invention provides an age identification method, which may be a method in which the age in input data (such as pictures, videos and the like) is identified by an age identification model and the identified age result is output, thereby obtaining the age identification result corresponding to the input data. Different data enhancement schemes for the input data (such as image rotation, cropping and the like) may be adopted according to the data requirements and the driving of the service.
In a specific implementation, the age identification model is a multi-dimensional-feature age identification model. The training sample pictures used during model training include pictures containing human faces, pictures containing only background, and the like, so the annotation data are fully utilized: one piece of annotation data serves the training of what are effectively two models, a face model and a human body model. This not only improves model precision; adding the human body to the annotation process also makes annotation easier, improving both the precision and the efficiency of annotation. To make the network easier to train, transfer learning from a pre-trained face detection model (RetinaFace) may be used. For better training, a cosine annealing schedule may further be adopted: as the training epoch increases, the learning rate first decreases slowly following a cosine function, then decreases quickly, and then decreases slowly again, which helps the network learning approach the global minimum. Convergence saturation of the model under large-scale training data is achieved by applying data enhancement and multi-stage adaptive learning-rate control, and non-heuristic optimization algorithms may be used to accelerate the convergence of stochastic gradient descent and optimize performance. The network has four loss functions in total. The face classification loss and the human body classification loss may be multi-class loss functions, which effectively prevents the influence of excessive background samples and helps the model focus on training hard samples; the face regression loss and the human body regression loss may use the Smooth L1 loss.
Stochastic gradient descent is an iterative optimization algorithm for deep neural networks that yields a set of reasonable weights for the network.
It should be noted that the loss in the process of training the model may be replaced by another loss function as needed.
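As an illustrative sketch (not part of the patent's disclosure), the cosine annealing schedule described above, in which the learning rate decays slowly, then quickly, then slowly again over training, can be written as follows; the learning-rate bounds and period length are assumed values:

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=0.01, lr_min=0.0001):
    """Cosine-annealed learning rate: slow decay near the start and end of
    training, fast decay in the middle."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

# The rate starts at lr_max, reaches the midpoint halfway through, and ends at lr_min.
schedule = [cosine_annealing_lr(e, 100) for e in range(101)]
```

The schedule is monotonically decreasing, so it can be dropped into any training loop that recomputes the optimizer's learning rate once per epoch.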
As shown in fig. 1, the following description will be given taking an example in which an age identifying method is applied to an age identifying model, and the age identifying method may specifically include the following steps:
step 101, acquiring an image to be identified.
As an alternative embodiment, the image to be recognized is used as input data of an age recognition model, and the result of age recognition of the image to be recognized can be output through the age recognition model. The image to be recognized may be a picture, or a frame image in a video, or the like.
Optionally, as an example, in a case that the image to be recognized is a frame image in a video, the obtaining of the image to be recognized in step 101 may specifically include the following steps:
acquiring a video to be identified;
performing video analysis on the video to be identified to obtain M frame images;
selecting a first frame image from the M frame images, wherein the first frame image is the image to be identified;
wherein M is a positive integer greater than 1.
In a specific implementation, if the image to be recognized is a frame image in a video, the input data of the age recognition model is the video to be identified. The video to be identified is subjected to video parsing and decomposed into the M frame images of which it is composed, and the first frame image selected from the M frame images is the image to be identified.
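The patent does not fix how the first frame image is selected from the M parsed frames; as a hedged sketch, one illustrative policy is uniform stride sampling (the helper name and the stride are assumptions):

```python
def select_frames(frames, stride=10):
    """From the M parsed frame images of a video, select candidate images
    to identify. Sampling every `stride`-th frame is one possible policy;
    the patent only requires selecting a frame image from the M frames."""
    if len(frames) < 2:
        raise ValueError("a video to be identified should parse into M > 1 frames")
    return frames[::stride]

# e.g. a 95-frame video sampled every 10 frames yields 10 candidate images
candidates = select_frames(list(range(95)), stride=10)
```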
Step 102, extracting features of a target person in the image to be recognized to obtain N feature maps with different scales related to the target person, wherein N is a positive integer greater than 1.
As an optional embodiment, it is first detected whether the image to be recognized contains a person. If it does not (for example, it contains only landscape or household-goods content), the image is filtered out and age identification is not performed. If the image contains a person, feature extraction is performed on the target person in the image. At least one person is detected: if one person is detected, that person is the target person; if a plurality of persons are detected, the target person may be one or more of them, i.e. the number of target persons may be 1 or more, which is not specifically limited here.
As an alternative embodiment, the feature extraction method may be: input the image to be recognized into a backbone network (such as ResNet50) to perform feature extraction on the target person and obtain a plurality of first feature maps related to the target person; extract some or all of these first feature maps from the backbone network; and input the extracted feature maps into a feature pyramid structure to obtain the N feature maps of different scales.
The above feature extraction method is explained in detail by a specific embodiment as follows:
as a specific example, the backbone network may be a residual neural network ResNet50, but is not limited to this backbone network, and may include other backbone networks such as ResNet18, ResNet34, ResNet101, ResNext101, lightweight network ShuffleNetv2, depth separable convolution-based networks MobileNetv2, MobileNetv3, convolutional neural networks EfficientNetb 0-b 7, and the like. As shown in fig. 2, if the network structure includes five layers, which are conv1, conv2_ x, conv3_ x, conv4_ x and conv5_ x, respectively, the size of the feature map output by conv1 is reduced by 0.5 times and input to conv2_ x, the size of the feature map output by conv2_ x is reduced by 0.5 times and input to conv3_ x, the size of the feature map output by conv3_ x is reduced by 0.5 times and input to conv4_ x, and the size of the feature map output by conv4_ x is reduced by 0.5 times and input to conv5_ x, it can be known that the size of the feature map obtained by each layer of conv1 to conv5_ x is half of the size of the feature map obtained by the previous layer, that is, the size of the feature map obtained by each layer of conv1 to conv5_ x is reduced in turn. The convolutional neural network represents an end-to-end implicit model commonly used for extracting image features or video features and using the features for various visual tasks, and the input of the model is usually an image or a video.
If the value of N is 3, the extracted features may be the outputs of the three layers conv3_x, conv4_x and conv5_x. These three feature maps of three scales are input into the feature pyramid structure to obtain three outputs, i.e. 3 feature maps of different scales, which have different sizes but the same number of channels, namely 256 channels. If the length × width of the input image to be recognized is 128 × 128, the length × width × channels of the three extracted feature maps may be: 32 × 32 × 512, 16 × 16 × 1024 and 8 × 8 × 2048.
Specifically, as shown in fig. 2, the input of the feature pyramid structure is the feature map produced by conv5_x, which is filtered by a 1 × 1 filter to obtain feature map M5. M5 is then enlarged to 2 × its size and added to the 1 × 1-filtered feature map of conv4_x to obtain feature map M4; the 2 × enlargement makes M5 the same size as the 1 × 1-filtered conv4_x feature map so that the two can be added. Similarly, M4 is enlarged to 2 × its size and added to the 1 × 1-filtered feature map of conv3_x to obtain feature map M3. Finally, M5, M4 and M3 are each filtered by a 3 × 3 filter to obtain feature maps P5, P4 and P3, respectively; P5, P4 and P3 are the obtained 3 feature maps of different scales related to the target person.
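A framework-free sketch of the size bookkeeping in the example above (the per-layer halving, the 128 × 128 input and the 256-channel pyramid outputs come from the text; the helper names are hypothetical):

```python
def backbone_sizes(input_size=128):
    """Per the example in the text, each layer's output is halved before
    entering the next layer, so for a 128 x 128 input the layers
    conv1..conv5_x produce spatial sizes 128, 64, 32, 16, 8."""
    sizes = {"conv1": input_size}
    prev = input_size
    for name in ("conv2_x", "conv3_x", "conv4_x", "conv5_x"):
        prev //= 2
        sizes[name] = prev
    return sizes

sizes = backbone_sizes(128)
# The feature pyramid taps conv3_x, conv4_x and conv5_x; each P-level keeps
# the spatial size of its tap, and all levels carry 256 channels.
p_levels = {f"P{i}": (sizes[f"conv{i}_x"], 256) for i in (3, 4, 5)}
```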
In the embodiment of the invention, multiple model tasks are combined: by sharing the backbone network and the feature pyramid structure, the age identification result is obtained directly from a single model, which improves model inference speed, reduces the computational overhead of the model and saves resources.
Step 103, carrying out face recognition on the N feature maps to obtain a first age recognition result about the face, and carrying out human body recognition on the N feature maps to obtain a second age recognition result about the human body.
As an alternative embodiment, the N feature maps are input into a face sub-network for face recognition to obtain the first age recognition result about the face, and the N feature maps are input into a human body sub-network for human body recognition to obtain the second age recognition result about the human body. Face recognition yields the age of the target person's face, and human body recognition yields the age of the target person's body; referring to both the face age and the body age makes the age identification more accurate and more stable.
As a specific embodiment, the face sub-network includes a context structure; each of the N feature maps is input into one context structure, which is used to expand the context information of the pre-detection area. A schematic diagram of the context structure is shown in fig. 3: a 256-channel feature map is first converted into a 128-channel feature map and then further into two 64-channel feature maps, and the two 64-channel feature maps are combined with the 128-channel feature map to obtain a 256-channel feature map (128 + 64 + 64 = 256), i.e. the number of channels after context-structure processing is unchanged. After the context structure and a 1 × 1 convolution module are superposed, a classification head and a regression head are generated; the classification head corresponds to the category information of the feature map and the regression head to the bounding box of the feature map, yielding the output of the face sub-network, i.e. the first age recognition result. The category information classifies age into 8 age categories: a first age category (0-2 years), a second (3-9 years), a third (10-13 years), a fourth (14-17 years), a fifth (18-30 years), a sixth (31-50 years), a seventh (51-100 years), and an eighth age category covering cases such as no face, background only, or a fake face. Regression refers to coordinate-offset regression of the face frame.
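The 8-way age categorization above can be expressed as a small lookup; only the age ranges come from the text, and the helper names are hypothetical:

```python
AGE_CATEGORIES = [  # (low, high) in years; index 7 is the no-face/background/fake-face class
    (0, 2), (3, 9), (10, 13), (14, 17), (18, 30), (31, 50), (51, 100), None,
]

def age_to_category(age):
    """Return the 0-based index of the age category containing `age`."""
    for idx, bounds in enumerate(AGE_CATEGORIES):
        if bounds is not None and bounds[0] <= age <= bounds[1]:
            return idx
    raise ValueError(f"age {age} outside all categories")

def is_minor_category(idx):
    # Categories 0-3 together cover ages 0-17, i.e. minors.
    return idx <= 3
```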
As another specific embodiment, the human body sub-network adds the N feature maps and then obtains the second age recognition result through a human body classification network and a human body regression network. The human body classification network consists of 4 convolution layers, as does the human body regression network, and the two networks share weights. The classification is the same as the face age classification and may likewise be divided into 8 age categories; regression refers to coordinate-offset regression of the human body frame. A convolution layer is a network layer that performs weighted summation of local pixel values followed by nonlinear activation, and convolution layers may be added or deleted.
It should be noted that the order of performing face recognition and human body recognition on the N feature maps is not limited: face recognition may be performed first, or the two may be performed simultaneously. Multi-scale feature maps are used to detect both the face and the human body; only face detection adds the context structure, to enhance the receptive field and expand the context information of the detection region. The receptive field is the image or video range covered by a feature value.
Step 104, determining a first target age identification result of a target person in the image to be identified according to the first age identification result and the second age identification result.
As an optional embodiment, combining the first age recognition result about the face with the second age recognition result about the human body yields a more accurate first target age recognition result for the target person in the image to be recognized, avoiding the misjudgments caused by considering the face alone. Integrating the face and human body networks improves the accuracy of age prediction and the efficiency of human review. If there are multiple target persons, multiple first target age recognition results are obtained for the image to be recognized, the result for each target person being determined by the same method.
In the above embodiment of the invention, feature extraction is performed on the target person in the image to be recognized to obtain N feature maps of different scales related to the target person; face recognition is performed on the N feature maps to obtain a first age recognition result about the face, and human body recognition is performed on the N feature maps to obtain a second age recognition result about the human body; the first target age recognition result of the target person in the image to be recognized is then determined according to the first and second age recognition results. Because the age in the image is recognized using multi-dimensional features of both the face and the human body, the obtained first target age recognition result is more accurate and more stable.
Optionally, the step 104 determines the first target age recognition result of the target person in the image to be recognized according to the first age recognition result and the second age recognition result, and specifically may include the following steps:
calculating a first numerical value from the first age identification result and the second age identification result;
comparing the first value to a first threshold value;
if the first numerical value is larger than the first threshold, determining a first target age identification result of a target person in the image to be identified according to a first probability value in the first age identification result and a second probability value in the second age identification result;
the first age identification result comprises first probability values corresponding to different age categories, and the second age identification result comprises second probability values corresponding to different age categories.
As an optional embodiment, Intersection over Union (IOU) is calculated between the first age recognition result and the second age recognition result to obtain the first numerical value. If the first numerical value is greater than a first threshold (e.g. 0.95), the face and the human body are determined to belong to the same target person, and the first target age recognition result of the target person in the image to be recognized is obtained from the first probability values of the different age categories in the first age recognition result and the second probability values of the different age categories in the second age recognition result. The IOU is an index for judging detection precision: the area of the intersection of two rectangles divided by the area of their union. In the post-processing of face and human body inference, it is used to judge whether the face and the human body come from the same target person, and the element-wise product of the face and body results from the same target person mitigates the influence of model misjudgment, thereby improving model precision.
It should be noted that the first threshold is a limit value set as needed and is used to judge whether the face and the human body belong to the same person. The post-processing of the face and the body is not limited to this consistency check; a more efficient consistency method may be used to determine whether the face and the body belong to the same target person.
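A minimal sketch, under assumed box and probability formats, of the IOU check and the element-wise product mentioned above (the 0.95 threshold follows the example in the text; the function names are hypothetical):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) rectangles:
    intersection area divided by union area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def fuse_if_same_person(face_box, face_probs, body_box, body_probs, threshold=0.95):
    """If the face and body results overlap enough, treat them as the same
    target person and fuse their per-category probabilities by element-wise
    product; otherwise report no match."""
    if iou(face_box, body_box) <= threshold:
        return None
    return [f * b for f, b in zip(face_probs, body_probs)]
```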
In a specific implementation, after the image to be recognized is input into the age recognition model, four outputs are obtained: the face frame, the first probability values of the different age categories for the face frame, the human body frame, and the second probability values of the different age categories for the human body frame. First, the sum of the probabilities of the first to seventh age categories of the face is calculated, and face frames whose probability sum is smaller than a preset probability value (e.g. 0.35) are filtered out; the sum of the probabilities of the first to seventh age categories of the human body is calculated likewise, and human body frames below the preset probability value are filtered out. The remaining face frames are sorted by probability sum in descending order and the largest ones are kept (e.g. 500 face frames), and the remaining human body frames are sorted and screened in the same way (e.g. 500 human body frames). Non-Maximum Suppression (NMS) is then used to eliminate redundant face frames and human body frames, finally yielding one face frame and one human body frame; the first probability values of the different age categories for the final face frame are the first age recognition result, and the second probability values of the different age categories for the final human body frame are the second age recognition result.
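The NMS step above can be sketched as follows; the 0.5 overlap threshold is an assumed value, and each frame's score is taken to be its probability sum:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy Non-Maximum Suppression: repeatedly keep the highest-scoring
    box and drop any remaining box that overlaps it too much."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Applied to the candidate face frames and body frames in turn, this leaves one frame per detected region, whose category probabilities become the first and second age recognition results.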
For example, the first probability values corresponding to the different age categories for the face frame are: 0.1 for the first age category, 0.2 for the second, 0.1 for the third, 0.1 for the fourth, 0.3 for the fifth, 0.1 for the sixth, 0.1 for the seventh, and 0 for the eighth. The sum of the probabilities of the first through seventh age categories of the face is 0.1+0.2+0.1+0.1+0.3+0.1+0.1 = 1.0; if the preset probability value is 0.35, the face frame is retained because 1.0 is greater than 0.35. The second probability values corresponding to the different age categories for the human body frame are: 0.01 for the first age category, 0.01 for the second, 0.08 for the third, 0.05 for the fourth, 0.02 for the fifth, 0.03 for the sixth, 0 for the seventh, and 0.8 for the eighth. The sum of the probabilities of the first through seventh age categories of the human body is 0.01+0.01+0.08+0.05+0.02+0.03+0 = 0.2; if the preset probability value is 0.35, the human body frame is filtered out because 0.2 is smaller than 0.35.
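The probability-sum filtering step above can be sketched as follows. This is a minimal illustration; `prob_threshold` and `top_k` mirror the example values 0.35 and 500, the data layout is an assumption, and NMS would run separately on the survivors.

```python
def filter_and_rank(boxes, probs, prob_threshold=0.35, top_k=500):
    """Keep only boxes whose summed probability over the first through
    seventh age categories reaches the threshold, ranked largest-first;
    Non-Maximum Suppression would then prune the survivors."""
    scored = []
    for box, p in zip(boxes, probs):
        s = sum(p[:7])  # sum of the first seven age-category probabilities
        if s >= prob_threshold:
            scored.append((s, box))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]
```

Applied to the worked example, the face frame (probability sum 1.0) survives and the human body frame (probability sum 0.2) is filtered out.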
Optionally, the step of determining the first target age recognition result of the target person in the image to be recognized according to the first probability value in the first age recognition result and the second probability value in the second age recognition result may specifically include:
multiplying a first probability value and a second probability value corresponding to the same age category in the first age identification result and the second age identification result to obtain a third probability value corresponding to each age category;
and comparing the sizes of the third probability values corresponding to all the age categories, and determining the age category with the maximum third probability value as the first target age identification result.
As an optional embodiment, for each of the first through eighth age categories, the first probability value corresponding to that age category in the first age identification result is multiplied by the second probability value corresponding to the same age category in the second age identification result, obtaining a third probability value corresponding to that age category. The third probability values corresponding to the eight age categories are then compared, and the age category with the largest third probability value is the first target age identification result; that is, if the third probability value corresponding to the first age category is the largest, the first age category is the first target age identification result.
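The element-wise product fusion just described can be sketched as follows (illustrative only; category probabilities are plain lists here):

```python
def fuse_age_scores(face_probs, body_probs):
    """Multiply the first and second probability values per age category
    to get the third probability values, and return the index of the
    category with the largest product (the first target age result)."""
    fused = [f * b for f, b in zip(face_probs, body_probs)]
    best_category = max(range(len(fused)), key=fused.__getitem__)
    return best_category, fused
```

With the face probabilities from the earlier example and a body prediction that also peaks at the fifth category, the fused result stays at the fifth category (index 4).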
Optionally, after the step 104 determines the first target age recognition result of the target person in the image to be recognized, the method further includes the following steps:
under the condition that the number of the target persons is 1, determining a second target age identification result of the image to be identified according to a first target age identification result of the target persons in the image to be identified;
and under the condition that the number of the target persons is multiple, determining a second target age identification result of the image to be identified according to a first target age identification result of each target person in the image to be identified.
Specifically, if the number of the target persons is 1, determining that the first target age identification result of the target persons is the second target age identification result; and if the number of the target persons is multiple, determining a second target age identification result according to the first target age identification result corresponding to each target person.
As a specific example, suppose the number of target persons is 3: a first person, a second person, and a third person. The largest of the first target age recognition results corresponding to the 3 target persons may be set as the second target age recognition result; alternatively, the smallest of the first target age recognition results corresponding to the 3 target persons may be set as the second target age recognition result. The second target age recognition result may also be determined in other manners as needed, and is not specifically limited here.
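The per-image aggregation above can be sketched as follows; the choice between the largest and smallest result is configurable, as the text notes.

```python
def image_age_result(person_age_results, strategy="max"):
    """Second target age result for an image: the single person's first
    target age result when there is one target person, otherwise the
    largest or smallest per-person result, as configured."""
    if len(person_age_results) == 1:
        return person_age_results[0]
    if strategy == "max":
        return max(person_age_results)
    return min(person_age_results)
```

For one person the result passes through unchanged; for several, the aggregate is taken.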
In a specific implementation, the age identification model is an image-level age identification model, and the parameters and structure of the model can be stored. When inference is performed, the model is loaded directly to complete age identification and does not need to be retrained. When video recognition is performed with the age identification model, the application scenarios may include video-level detection of a minor appearing alone on camera, user-level minor detection, and the like. Video-level detection of a minor appearing alone on camera is mainly used to assist in auditing low-quality violations in video content: with a high-precision age identification model, whether a video shows a minor alone on camera can be estimated directly, so manual review is unnecessary and the efficiency of combating low-quality, violating videos is improved. User-level minor detection is used to identify underage user IDs, determining whether a user is a minor by detecting all videos published under the user's account.
The following describes an example of video-level detection of a minor appearing alone on camera:
optionally, after determining the second target age recognition result of the image to be recognized, the method may further include the following steps:
determining a second target age identification result of other frame images, wherein the other frame images are frame images except the image to be identified in the M frame images;
and determining a third target age identification result of the video to be identified according to the second target age identification result of each frame image in the M frame images.
As an optional embodiment, determining a second target age recognition result corresponding to each other frame of image according to a method for determining a second target age recognition result corresponding to the image to be recognized, thereby obtaining a second target age recognition result corresponding to each frame of image in the M frame of images; and then determining a third target age identification result corresponding to the video to be identified according to the second target age identification result corresponding to each frame image in the M frame images.
Optionally, the step of determining a third target age identification result of the video to be identified according to the second target age identification result of each frame image of the M frame images may specifically include:
classifying second target age identification results corresponding to each frame image in the M frame images to obtain the age type of each frame image in the M frame images;
determining the age type of the video to be identified according to the age type of each frame image in the M frame images;
and determining a third target age identification result of the video to be identified according to the age type of the video to be identified.
As an alternative embodiment, the age types may include two types, adult and minor; multiple age types may also be set as needed, which is not specifically limited here. The following description uses the case where the age types include adult and minor. The second target age identification result corresponding to each frame image in the M frame images is classified, with age categories of 18 years or older classified as adult and those under 18 classified as minor, obtaining the age type of each frame image in the M frame images. According to the age type of each frame image in the M frame images, the age type of the video to be recognized can be determined, and according to the age type of the video to be recognized, a third target age recognition result corresponding to the video to be recognized can be determined. In this way, a video-level age recognition result is obtained through an image-level age recognition model, broadening the application.
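Classifying a frame's second target age result into an age type, as described, might look like the sketch below (assuming the age category is expressed in years, which is an assumption for illustration):

```python
def frame_age_type(age_years):
    """Adult if the predicted age category is 18 or above, minor otherwise."""
    return "adult" if age_years >= 18 else "minor"


def frame_types(second_target_results):
    """Age type of each frame image among the M frame images."""
    return [frame_age_type(age) for age in second_target_results]
```

Each frame's second target age result maps independently to one of the two age types.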
Optionally, the step of determining the age type of the video to be identified according to the age type of each frame image in the M frame images may specifically include:
acquiring a first number of frame images belonging to a first age type in the M frame images according to the age type of each frame image in the M frame images;
acquiring the size relation between the first quantity and a second threshold value;
and determining the age type of the video to be identified according to the size relation between the first quantity and the second threshold value.
As an alternative embodiment, the first number of frame images belonging to the first age type in the M frame images is obtained according to the age type of each frame image in the M frame images, where the first age type may be the adult type or the minor type. If the detection is for a minor appearing alone on camera, the first age type may be the adult type, and the first number is the number of frame images belonging to the adult type in the M frame images; the age type of the video to be identified is then determined according to the size relation between the first number and the second threshold.
For example, if the second threshold is 2, the first number is compared with the second threshold: if the first number is greater than 2, the age type of the video to be identified is determined to be the adult type, and it can be further determined that no minor appears alone in the video; if the first number is less than or equal to 2, the age type of the video to be identified is determined to be the minor type, and it can be further determined that a minor appears alone in the video.
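The threshold comparison in the example above can be sketched as follows, with the second threshold defaulting to the example value 2:

```python
def video_age_type(frame_age_types, second_threshold=2):
    """Adult type if more than `second_threshold` frames are adult-type,
    minor type otherwise, as in the worked example."""
    first_number = sum(1 for t in frame_age_types if t == "adult")
    return "adult" if first_number > second_threshold else "minor"
```

Three or more adult-type frames make the video adult-type; two or fewer make it minor-type.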
Optionally, the step of determining a third target age identification result of the video to be identified according to the age type of the video to be identified may specifically include:
according to the age type of the video to be identified, acquiring a plurality of second frame images, which are identical to the age type of the video to be identified, in the video to be identified;
and determining a third target age identification result of the video to be identified according to a second target age identification result corresponding to each frame of image in the plurality of second frames of images.
As an optional embodiment, according to the age type to which the video to be recognized belongs, a plurality of second frame images in the video to be recognized, which are the same as the age type to which the video to be recognized belongs, are obtained, that is, if the age type to which the video to be recognized belongs is a minor type, a plurality of second frame images in the video to be recognized, which belong to the minor type, are obtained; and if the age type of the video to be identified is an adult type, acquiring a plurality of second frame images belonging to the adult type in the video to be identified. Then, according to a second target age identification result corresponding to each frame image in the plurality of second frame images, a third target age identification result of the video to be identified is determined, that is, after the age type of the video to be identified is determined, according to the second target age identification result corresponding to each frame image in the plurality of second frame images, a specific age category of the video to be identified in the age type, that is, the third target age identification result, can be further determined, and therefore the age identification result corresponding to the video to be identified can be identified.
Optionally, the step of determining a third target age identification result of the video to be identified according to the second target age identification result corresponding to each frame of image in the plurality of second frames of images may specifically include:
acquiring a second number of frame images with the same second target age identification result in the plurality of second frame images according to the second target age identification result corresponding to each frame image in the plurality of second frame images;
and if there are multiple values of the second number, determining the second target age identification result corresponding to the frame images with the largest second number as the third target age identification result.
As an optional embodiment, according to the second target age recognition result corresponding to each frame image in the plurality of second frame images, the second number of frame images having the same second target age recognition result in the plurality of second frame images is obtained; if there are multiple values of the second number, the second target age identification result corresponding to the frame images with the largest second number is taken as the third target age identification result.
For example, if there are 5 second frame images, 2 of which share one second target age identification result and the other 3 of which share another, the third target age identification result is the second target age identification result shared by the latter 3 second frame images.
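The majority-vote step in the example above can be sketched as:

```python
from collections import Counter


def third_target_age_result(second_frame_results):
    """Most frequent second target age result among the second frame
    images that share the video's age type."""
    counts = Counter(second_frame_results)
    result, _ = counts.most_common(1)[0]
    return result
```

With the 2-versus-3 split from the example, the result shared by 3 frames wins.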
The procedure for detecting a minor appearing alone on camera is described in detail below with a specific embodiment:
for example: as shown in fig. 4, step 401, acquiring a video to be identified input by a user;
step 402, performing age identification on each frame of image in the video to be identified to obtain a second target age identification result corresponding to each frame of image in M frames of image of the video to be identified;
step 403, obtaining a first number of frame images belonging to a first age type (e.g. adult type) from the M frame images, comparing the first number with a second threshold (e.g. 2), and determining whether the first number is greater than 2; if yes, go to step 404; if not, step 405 is entered.
Step 404, if yes, the first number is greater than 2, the age type of the video to be recognized is determined to be the adult type, and it can be further determined that no minor appears alone in the video to be recognized.
Step 405, if not, the first number is less than or equal to 2, the age type of the video to be identified is determined to be the minor type, and it can be further determined that a minor appears alone in the video to be identified; if there are multiple values of the second number among the frame images, the second target age identification result corresponding to the frame images with the largest second number is determined to be the third target age identification result.
The following description uses user-level minor detection as an example:
optionally, after the step of determining the third target age identification result of the video to be identified according to the second target age identification result corresponding to each frame of image in the plurality of second frames of images, the method may further include the following steps:
detecting a target user account ID for issuing the video to be identified;
acquiring the age type of each video issued by the target user ID according to the third target age identification result of each video issued by the target user ID;
and determining the age type to which the target user ID belongs according to the age type of each video issued by the target user ID.
As an optional embodiment, the target user ID of the video to be identified may be detected, and the detecting step and the step of determining the third target age identification result corresponding to the video to be identified are not limited in sequence, that is, the target user ID of the video to be identified may be detected first, the third target age identification result corresponding to the video to be identified may be determined first, or both the steps may be performed simultaneously, which is not limited specifically herein. The method for determining the third target age identification result of each video issued by the target user ID can adopt the method for determining the third target age identification result of the video to be identified, so that the third target age identification result corresponding to each video issued by the target user ID can be obtained; then, determining the age type of each video issued by the target user ID according to a third target age identification result corresponding to each video issued by the target user ID; and then, according to the age type of each video issued by the target user ID, determining the age type of the target user ID, namely knowing whether the target user is an adult or a minor, so that the accurate age identification is more widely applied.
Optionally, the step of determining the age type to which the target user ID belongs according to the age type of each video issued by the target user ID may specifically include:
acquiring the number of videos belonging to a second age type according to the age type of each video issued by the target user ID;
and determining the age type of the target user ID according to the number of the videos belonging to the second age type.
As an alternative embodiment, the number of videos belonging to the second age type (e.g., minor type) is obtained according to the age type to which each video issued by the target user ID belongs, and the age type to which the target user ID belongs can be determined according to the number of videos belonging to the second age type.
The first age type and the second age type may be the same age type or different age types, and may be set as needed, which is not specifically limited herein.
Optionally, the step of determining the age type of the target user ID according to the number of videos belonging to the second age type may specifically include:
judging whether the number of the videos belonging to the second age type meets a first condition;
if not, determining the age type of the target user ID according to the number of the videos belonging to the second age type;
and if so, determining the age type of the target user ID according to a first proportion of the self-shooting video in the videos issued by the target user ID.
As an alternative embodiment, after acquiring the number of videos belonging to the second age type, it is determined whether the number of videos belonging to the second age type satisfies a first condition. If the first condition is not satisfied, the age type to which the target user ID belongs is determined according to the number of videos belonging to the second age type; if the first condition is satisfied, a first proportion of self-shot videos (that is, videos shot with the front camera and/or rear camera of the current electronic device) among all videos issued by the target user ID is acquired, and the age type to which the target user ID belongs is determined accordingly.
The steps for user-level underage detection are detailed below with a specific embodiment:
for example: as shown in fig. 5, step 501, detecting a target user account ID issuing a video to be identified, and acquiring all videos issued by the target user account ID;
step 502, acquiring the age type of each video issued by the target user ID according to the third target age identification result of each video issued by the target user ID;
step 503, acquiring the number of videos belonging to a second age type according to the age type of each video issued by the target user ID; if the second age type is the minor type, acquiring the number of videos belonging to the minor type. Suppose the first condition is that the number of minor-type videos equals 1. If the number of minor-type videos equals 1, the first condition is satisfied, and step 504 is entered. If the number of minor-type videos does not equal 1 (that is, it is 0, or it is greater than or equal to 2), the first condition is not satisfied: if the number of minor-type videos is 0, the age type to which the target user ID belongs can be determined to be the adult type; if the number of minor-type videos is greater than or equal to 2, the age type to which the target user ID belongs can be determined to be the minor type;
step 504, acquiring a first proportion of self-shot videos (that is, videos shot with the front camera and/or rear camera of the current electronic device) among all videos issued by the target user ID, and judging whether the first proportion is greater than a preset proportion (such as 10%). If yes, the first proportion is greater than the preset proportion, and the age type to which the target user ID belongs is judged to be the minor type; if not, the first proportion is less than or equal to the preset proportion, and the age type to which the target user ID belongs is judged to be the adult type.
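Steps 503 and 504 together can be sketched as follows. The first condition (exactly one minor-type video) and the preset proportion follow the example values; the `is_selfie` flags marking which published videos are self-shot are assumed inputs.

```python
def user_age_type(video_age_types, is_selfie, preset_proportion=0.10):
    """User-level decision: 0 minor-type videos -> adult type;
    2 or more -> minor type; exactly 1 (the first condition) -> fall
    back to the first proportion of self-shot videos among all videos
    issued by the target user ID."""
    minor_count = sum(1 for t in video_age_types if t == "minor")
    if minor_count == 0:
        return "adult"
    if minor_count >= 2:
        return "minor"
    # exactly one minor-type video: the first condition is satisfied
    first_proportion = sum(is_selfie) / len(is_selfie)
    return "minor" if first_proportion > preset_proportion else "adult"
```

The selfie-ratio fallback only fires in the ambiguous single-minor-video case; the clear-cut counts decide directly.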
In summary, in the above embodiments of the present invention, the age of a person in an image is identified using multidimensional features such as the human face and the human body. To improve efficiency, detection and age prediction are integrated into a single age identification model: rather than splitting the task into separate detection and classification stages, an end-to-end technique directly predicts the age of the person in the image, improving the accuracy, efficiency, and robustness of age identification. After training of the core structure of the deep-learning image-level age recognition model is completed, age recognition is performed on each frame of the video to be recognized using the parameters of the trained core structure; the age recognition result of the video to be recognized is then obtained from the age recognition result of each frame. Furthermore, the age category of a user can be obtained from the age recognition results of the videos issued by the user, improving the universality of the application.
As shown in fig. 6, an age identifying apparatus 600 according to an embodiment of the present invention includes:
a first obtaining module 601, configured to obtain an image to be identified;
a first extraction module 602, configured to perform feature extraction on a target person in the image to be identified, so as to obtain N feature maps with different scales for the target person;
a first identification module 603, configured to perform face identification on the N feature maps to obtain a first age identification result about a face, and perform human body identification on the N feature maps to obtain a second age identification result about a human body;
a first determining module 604, configured to determine a first target age recognition result of a target person in the image to be recognized according to the first age recognition result and the second age recognition result;
wherein N is a positive integer greater than 1.
In the above embodiment of the present invention, feature extraction is performed on a target person in an image to be recognized to obtain N feature maps of different scales related to the target person, face recognition is performed on the N feature maps to obtain a first age recognition result related to a face, human body recognition is performed on the N feature maps to obtain a second age recognition result related to a human body, the first target age recognition result of the target person in the image to be recognized is determined according to the first age recognition result and the second age recognition result, and the age of the image is recognized by using the face and the human body multidimensional features, so that the obtained first target age recognition result is more accurate and has stronger stability.
Optionally, the first determining module 604 includes:
the calculating unit is used for performing an Intersection over Union (IoU) calculation on the first age identification result and the second age identification result to obtain a first numerical value;
the comparison unit is used for comparing the first numerical value with a first threshold value in size;
a first determining unit, configured to determine a first target age recognition result of the target person in the image to be recognized according to a first probability value in the first age recognition result and a second probability value in the second age recognition result if the first numerical value is greater than the first threshold;
the first age identification result comprises first probability values corresponding to different age categories, and the second age identification result comprises second probability values corresponding to different age categories.
Optionally, the first determining unit includes:
the calculating subunit is used for multiplying a first probability value and a second probability value corresponding to the same age category in the first age identification result and the second age identification result to obtain a third probability value corresponding to each age category;
and the first determining subunit is used for comparing the sizes of the third probability values corresponding to all the age categories and determining the age category with the maximum third probability value as the first target age identification result.
Optionally, the first obtaining module 601 includes:
the first acquisition unit is used for acquiring a video to be identified;
the analysis unit is used for carrying out video analysis on the video to be identified to obtain M frame images;
the selecting unit is used for selecting a first frame image from the M frame images, wherein the first frame image is the image to be identified;
wherein M is a positive integer greater than 1.
Optionally, the apparatus further comprises:
the second determining module is used for determining a second target age recognition result of the image to be recognized according to a first target age recognition result of a target person in the image to be recognized under the condition that the number of the target persons is 1;
and the third determining module is used for determining a second target age recognition result of the image to be recognized according to the first target age recognition result of each target person in the image to be recognized under the condition that the number of the target persons is multiple.
Optionally, the apparatus further comprises:
a fourth determining module, configured to determine a second target age identification result of another frame image, where the another frame image is a frame image of the M frame images except the image to be identified;
and the fifth determining module is used for determining a third target age identification result of the video to be identified according to the second target age identification result of each frame image in the M frame images.
Optionally, the fifth determining module includes:
the classification unit is used for classifying a second target age identification result corresponding to each frame image in the M frame images to obtain the age type of each frame image in the M frame images;
the second determining unit is used for determining the affiliated age type of the video to be identified according to the affiliated age type of each frame image in the M frame images;
and the third determining unit is used for determining a third target age identification result of the video to be identified according to the age type of the video to be identified.
Optionally, the second determining unit includes:
the first obtaining subunit is configured to obtain, according to the age type to which each frame image in the M frame images belongs, a first number of frame images belonging to a first age type in the M frame images;
the second obtaining subunit is configured to obtain a magnitude relationship between the first quantity and a second threshold;
and the second determining subunit is used for determining the age type of the video to be identified according to the size relationship between the first quantity and the second threshold value.
Optionally, the third determining unit includes:
the third obtaining subunit is configured to obtain, according to the age type to which the video to be identified belongs, a plurality of second frame images in the video to be identified, which are the same as the age type to which the video to be identified belongs;
and the third determining subunit is configured to determine a third target age identification result of the video to be identified according to the second target age identification result corresponding to each frame of image in the plurality of second frame images.
Optionally, the third determining subunit is specifically configured to:
acquire, according to the second target age identification result corresponding to each frame image in the plurality of second frame images, a second number of frame images in the plurality of second frame images that have the same second target age identification result;
and if there are multiple distinct second target age identification results, determine the second target age identification result corresponding to the largest second number of frame images as the third target age identification result.
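The majority-style selection performed by the third determining subunit can be sketched as follows; the age-range labels in the example are illustrative assumptions, and ties are broken here by first occurrence, which the disclosure does not specify.

```python
from collections import Counter

def third_target_result(second_results):
    """Pick the video-level (third target) age result as the per-frame
    (second target) result shared by the largest number of frames.

    second_results: the second target age identification results of the
    second frame images (those matching the video's age type).
    """
    counts = Counter(second_results)       # second number per distinct result
    result, _ = counts.most_common(1)[0]   # result with the largest count
    return result
```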
Optionally, the apparatus further comprises:
the detection module is used for detecting a target user account ID issuing the video to be identified;
the second obtaining module is used for obtaining the age type of each video issued by the target user ID according to a third target age identification result of each video issued by the target user ID;
and the sixth determining module is used for determining the age type to which the target user ID belongs according to the age type of each video issued by the target user ID.
Optionally, the sixth determining module includes:
the second acquisition unit is used for acquiring the number of videos belonging to a second age type according to the age type of each video issued by the target user ID;
a fourth determining unit, configured to determine the age type to which the target user ID belongs according to the number of videos belonging to the second age type.
Optionally, the fourth determining unit includes:
a judging subunit, configured to judge whether the number of videos belonging to the second age type satisfies a first condition;
a fourth determining subunit, configured to determine, if the first condition is not satisfied, the age type of the target user ID according to the number of videos belonging to the second age type;
and a fifth determining subunit, configured to determine, if the first condition is satisfied, the age type of the target user ID according to a first proportion of self-shot videos among the videos issued by the target user ID.
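A minimal sketch of the account-level decision described for the fourth and fifth determining subunits. The concrete first condition (here a count threshold), the proportion threshold and the type labels are assumptions made for the example, since the disclosure does not fix their values.

```python
def user_age_type(video_types, selfie_ratio, second_age_type="minor",
                  count_condition=3, ratio_threshold=0.5):
    """Determine the age type of a user account from its issued videos.

    video_types: age type of each video issued by the target user ID.
    selfie_ratio: first proportion of self-shot videos among those videos.
    count_condition / ratio_threshold: illustrative stand-ins for the
    'first condition' and the proportion test named in the disclosure.
    """
    # Number of videos belonging to the second age type.
    n_second = sum(1 for t in video_types if t == second_age_type)
    if n_second < count_condition:
        # First condition not satisfied: decide from the video count alone.
        return second_age_type if n_second > 0 else "adult"
    # First condition satisfied: fall back to the self-shot proportion.
    return second_age_type if selfie_ratio >= ratio_threshold else "adult"
```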
It should be noted that the embodiment of the age identification apparatus is an apparatus corresponding to the age identification method, and all implementation manners of the embodiment of the method are applicable to the embodiment of the apparatus, and can achieve the same technical effect, which is not described herein again.
In summary, in the above embodiments of the present invention, the age of a person in an image is identified using multi-dimensional features of both the human face and the human body. To improve efficiency, detection and age prediction are integrated into a single age identification model: instead of being split into separate detection and classification stages, the model directly predicts the age of the person in the image end to end, which improves the accuracy, efficiency and robustness of age identification. After the core structure of the deep-learning, image-level age identification model is trained, the parameters of the trained core structure are used to perform age identification on each frame image of a video to be identified; the age identification result of the video to be identified is then obtained from the age identification result of each frame image; further, the age category of a user can be obtained from the age identification result of each video issued by the user, which improves the general applicability of the method.
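The face/body fusion summarized above (and detailed in claims 2 and 3) can be sketched as follows: the first and second probability values for the same age category are multiplied to obtain a third probability value, and the category with the largest product is taken as the first target age identification result. The probability vectors in the example are illustrative.

```python
def fuse_age_probs(face_probs, body_probs):
    """Fuse face-based and body-based age predictions for one person.

    face_probs / body_probs: probability values over the same ordered set
    of age categories (the first and second age identification results).
    Returns the index of the age category whose product of the two
    probabilities (the third probability value) is largest.
    """
    # Third probability value for each age category.
    joint = [f * b for f, b in zip(face_probs, body_probs)]
    # Age category with the maximum third probability value.
    return max(range(len(joint)), key=joint.__getitem__)
```

For example, with face probabilities [0.1, 0.9] and body probabilities [0.6, 0.4], the products are [0.06, 0.36], so the second age category (index 1) is selected.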
The embodiment of the invention also provides an electronic device. As shown in fig. 7, the electronic device comprises a processor 701, a communication interface 702, a memory 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704.
A memory 703 for storing a computer program.
The processor 701 is configured to implement part or all of the steps of an age identifying method provided by an embodiment of the present invention when executing the program stored in the memory 703.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one magnetic disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute the age identification method described in the above embodiment.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the age identification method described in the above embodiment.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (16)

1. An age identification method, the method comprising:
acquiring an image to be identified;
extracting features of a target person in the image to be recognized to obtain N feature maps of the target person at different scales;
carrying out face recognition on the N feature maps to obtain a first age recognition result about the human face, and carrying out human body recognition on the N feature maps to obtain a second age recognition result about the human body;
determining a first target age identification result of a target person in the image to be identified according to the first age identification result and the second age identification result;
wherein N is a positive integer greater than 1.
2. The method of claim 1, wherein determining a first target age recognition result of a target person in the image to be recognized according to the first age recognition result and the second age recognition result comprises:
calculating the first age identification result and the second age identification result to obtain a first numerical value;
comparing the first value to a first threshold value;
if the first numerical value is larger than the first threshold, determining a first target age identification result of a target person in the image to be identified according to a first probability value in the first age identification result and a second probability value in the second age identification result;
the first age identification result comprises first probability values corresponding to different age categories, and the second age identification result comprises second probability values corresponding to different age categories.
3. The method of claim 2, wherein determining a first target age identification of a target person in the image to be identified based on a first probability value in the first age identification and a second probability value in the second age identification comprises:
multiplying a first probability value and a second probability value corresponding to the same age category in the first age identification result and the second age identification result to obtain a third probability value corresponding to each age category;
and comparing the third probability values corresponding to all the age categories, and determining the age category with the largest third probability value as the first target age identification result.
4. The method of claim 1, wherein the obtaining the image to be identified comprises:
acquiring a video to be identified;
performing video analysis on the video to be identified to obtain M frame images;
selecting a first frame image from the M frame images, wherein the first frame image is the image to be identified;
wherein M is a positive integer greater than 1.
5. The method of claim 4, wherein after determining the first target age identification of the target person in the image to be identified, the method further comprises:
under the condition that the number of the target persons is 1, determining a second target age identification result of the image to be identified according to a first target age identification result of the target persons in the image to be identified;
and under the condition that the number of the target persons is multiple, determining a second target age identification result of the image to be identified according to a first target age identification result of each target person in the image to be identified.
6. The method of claim 5, wherein after determining the second target age identification for the image to be identified, the method further comprises:
determining a second target age identification result of other frame images, wherein the other frame images are frame images except the image to be identified in the M frame images;
and determining a third target age identification result of the video to be identified according to the second target age identification result of each frame image in the M frame images.
7. The method according to claim 6, wherein the determining a third target age recognition result of the video to be recognized according to the second target age recognition result of each frame image of the M frame images comprises:
classifying second target age identification results corresponding to each frame image in the M frame images to obtain the age type of each frame image in the M frame images;
determining the age type of the video to be identified according to the age type of each frame image in the M frame images;
and determining a third target age identification result of the video to be identified according to the age type of the video to be identified.
8. The method according to claim 7, wherein the determining the age type of the video to be identified according to the age type of each of the M frame images comprises:
acquiring a first number of frame images belonging to a first age type in the M frame images according to the age type of each frame image in the M frame images;
acquiring the size relation between the first quantity and a second threshold value;
and determining the age type of the video to be identified according to the size relation between the first quantity and the second threshold value.
9. The method according to claim 7, wherein the determining a third target age identification result of the video to be identified according to the age type of the video to be identified comprises:
according to the age type of the video to be identified, acquiring a plurality of second frame images, which are identical to the age type of the video to be identified, in the video to be identified;
and determining a third target age identification result of the video to be identified according to a second target age identification result corresponding to each frame of image in the plurality of second frames of images.
10. The method according to claim 9, wherein determining a third target age recognition result of the video to be recognized according to the second target age recognition result corresponding to each of the plurality of second frame images comprises:
acquiring a second number of frame images with the same second target age identification result in the plurality of second frame images according to the second target age identification result corresponding to each frame image in the plurality of second frame images;
and if there are multiple distinct second target age identification results, determining the second target age identification result corresponding to the largest second number of frame images as the third target age identification result.
11. The method according to claim 9, wherein after determining the third target age recognition result of the video to be recognized according to the second target age recognition result corresponding to each frame image in the plurality of second frame images, the method further comprises:
detecting a target user account ID for issuing the video to be identified;
acquiring the age type of each video issued by the target user ID according to the third target age identification result of each video issued by the target user ID;
and determining the age type to which the target user ID belongs according to the age type of each video issued by the target user ID.
12. The method according to claim 11, wherein said determining the age type of the target user ID according to the age type of each video issued by the target user ID comprises:
acquiring the number of videos belonging to a second age type according to the age type of each video issued by the target user ID;
and determining the age type of the target user ID according to the number of the videos belonging to the second age type.
13. The method according to claim 12, wherein said determining said age type of said target user ID based on said number of videos belonging to said second age type comprises:
judging whether the number of the videos belonging to the second age type meets a first condition;
if not, determining the age type of the target user ID according to the number of the videos belonging to the second age type;
and if so, determining the age type of the target user ID according to a first proportion of self-shot videos among the videos issued by the target user ID.
14. An age identification device, the device comprising:
the first acquisition module is used for acquiring an image to be identified;
the first extraction module is used for extracting features of a target person in the image to be identified to obtain N feature maps of the target person at different scales;
the first identification module is used for carrying out face identification on the N feature maps to obtain a first age identification result about the human face, and carrying out human body identification on the N feature maps to obtain a second age identification result about the human body;
the first determining module is used for determining a first target age identification result of a target person in the image to be identified according to the first age identification result and the second age identification result;
wherein N is a positive integer greater than 1.
15. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory complete mutual communication through a communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the age identification method as claimed in any one of claims 1 to 13 when executing a program stored on the memory.
16. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the age identification method according to any one of claims 1 to 13.
CN202110161744.1A 2021-02-05 2021-02-05 Age identification method and device and electronic equipment Active CN112906525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110161744.1A CN112906525B (en) 2021-02-05 2021-02-05 Age identification method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN112906525A true CN112906525A (en) 2021-06-04
CN112906525B CN112906525B (en) 2024-10-18

Family

ID=76122858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110161744.1A Active CN112906525B (en) 2021-02-05 2021-02-05 Age identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112906525B (en)

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781650A (en) * 1994-02-18 1998-07-14 University Of Central Florida Automatic feature detection and age classification of human faces in digital images
US20120314044A1 (en) * 2011-06-13 2012-12-13 Satoru Ogawa Imaging device
JP2013218393A (en) * 2012-04-05 2013-10-24 Intellectual Ventures Fund 83 Llc Imaging device
CN106570491A (en) * 2016-11-11 2017-04-19 华南智能机器人创新研究院 Robot intelligent interaction method and intelligent robot
CN108573209A (en) * 2018-02-28 2018-09-25 天眼智通(香港)有限公司 A kind of age-sex's recognition methods of the single model multi output based on face and system
US20180276869A1 (en) * 2017-03-21 2018-09-27 The Procter & Gamble Company Methods For Age Appearance Simulation
CN109376604A (en) * 2018-09-25 2019-02-22 北京飞搜科技有限公司 A kind of age recognition methods and device based on human body attitude
CN109492571A (en) * 2018-11-02 2019-03-19 北京地平线机器人技术研发有限公司 Identify the method, apparatus and electronic equipment at human body age
CN109993150A (en) * 2019-04-15 2019-07-09 北京字节跳动网络技术有限公司 The method and apparatus at age for identification
CN110163096A (en) * 2019-04-16 2019-08-23 北京奇艺世纪科技有限公司 Character recognition method, device, electronic equipment and computer-readable medium
CN110516622A (en) * 2019-08-29 2019-11-29 的卢技术有限公司 A kind of gender of occupant, age and emotional intelligence recognition methods and system
KR20190140801A (en) * 2018-05-23 2019-12-20 한국과학기술원 A multimodal system for simultaneous emotion, age and gender recognition
CN110751185A (en) * 2019-09-26 2020-02-04 高新兴科技集团股份有限公司 Training method and device of target detection model
CN111144344A (en) * 2019-12-30 2020-05-12 广州市百果园网络科技有限公司 Method, device and equipment for determining age of person and storage medium
CN111209878A (en) * 2020-01-10 2020-05-29 公安部户政管理研究中心 Cross-age face recognition method and device
CN111275066A (en) * 2018-12-05 2020-06-12 北京嘀嘀无限科技发展有限公司 Image feature fusion method and device and electronic equipment
CN111274994A (en) * 2020-02-13 2020-06-12 腾讯科技(深圳)有限公司 Cartoon face detection method and device, electronic equipment and computer readable medium
CN111915613A (en) * 2020-08-11 2020-11-10 华侨大学 Image instance segmentation method, device, equipment and storage medium
US20200356767A1 (en) * 2018-05-30 2020-11-12 Tencent Technology (Shenzhen) Company Limited Human body attribute recognition method, apparatus, and device and medium
CN112052886A (en) * 2020-08-21 2020-12-08 暨南大学 Human body action attitude intelligent estimation method and device based on convolutional neural network
CN112085540A (en) * 2020-09-27 2020-12-15 湖北科技学院 Intelligent advertisement pushing system and method based on artificial intelligence technology
CN112183326A (en) * 2020-09-27 2021-01-05 深圳数联天下智能科技有限公司 Face age recognition model training method and related device
CN112183649A (en) * 2020-09-30 2021-01-05 佛山市南海区广工大数控装备协同创新研究院 Algorithm for predicting pyramid feature map


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PRACHI PUNYANI 等: "A Comparison Study of Face, Gait and Speech Features for Age Estimation", LECTURE NOTES IN ELECTRICAL ENGINEERING, vol. 443, pages 325 - 331 *
TAO SONG 等: "Pedestrian age recognition method based on gait deep learning", JOURNAL OF PHYSICS: CONFERENCE SERIES, vol. 2010, pages 1 - 11 *

Also Published As

Publication number Publication date
CN112906525B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
CN108615071B (en) Model testing method and device
CN109858461B (en) Method, device, equipment and storage medium for counting dense population
CN106776842B (en) Multimedia data detection method and device
CN109816032B (en) Unbiased mapping zero sample classification method and device based on generative countermeasure network
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN111797326A (en) False news detection method and system fusing multi-scale visual information
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN110879982A (en) Crowd counting system and method
CN110738160A (en) human face quality evaluation method combining with human face detection
CN108710893A (en) A kind of digital image cameras source model sorting technique of feature based fusion
CN113870254B (en) Target object detection method and device, electronic equipment and storage medium
CN111626357B (en) Image identification method based on neural network model
CN114783021A (en) Intelligent detection method, device, equipment and medium for wearing of mask
CN115659966A (en) Rumor detection method and system based on dynamic heteromorphic graph and multi-level attention
CN112101114A (en) Video target detection method, device, equipment and storage medium
CN111079930A (en) Method and device for determining quality parameters of data set and electronic equipment
CN113283396A (en) Target object class detection method and device, computer equipment and storage medium
CN112132839B (en) Multi-scale rapid face segmentation method based on deep convolution cascade network
CN116704371A (en) Roof detection and classification method, device, equipment and medium
CN116071625B (en) Training method of deep learning model, target detection method and device
CN112906525B (en) Age identification method and device and electronic equipment
CN110428012A (en) Brain method for establishing network model, brain image classification method, device and electronic equipment
CN114387524A (en) Image identification method and system for small sample learning based on multilevel second-order representation
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN114021609A (en) Vehicle attribute recognition model training method and device, and recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant