CN115116116A - Image recognition method and device, electronic equipment and readable storage medium - Google Patents

Image recognition method and device, electronic equipment and readable storage medium

Info

Publication number
CN115116116A
CN115116116A
Authority
CN
China
Prior art keywords
image
face feature
face
feature information
age
Prior art date
Legal status
Pending
Application number
CN202210836939.6A
Other languages
Chinese (zh)
Inventor
胡亚非 (Hu Yafei)
Current Assignee
Vivo Mobile Communication Hangzhou Co Ltd
Original Assignee
Vivo Mobile Communication Hangzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Hangzhou Co Ltd
Priority to CN202210836939.6A
Publication of CN115116116A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/178 Human faces, e.g. facial parts, sketches or expressions: estimating age from face image; using age information for improving recognition

Abstract

The application discloses an image recognition method and apparatus, an electronic device, and a readable storage medium, and belongs to the field of artificial intelligence. The method includes: acquiring first face feature information of a first face image; inputting the first face feature information into a face feature analysis model and outputting N face feature maps, where the face feature analysis model includes N sub face feature analysis models, each sub face feature analysis model outputs one face feature map, and each sub face feature analysis model corresponds to one age group; mixing the feature information in the N face feature maps to obtain second face feature information with age characteristics; and determining the age of the person corresponding to the first face image based on the second face feature information. The model parameters in any one of the N sub face feature analysis models characterize the age characteristics of the age group corresponding to that sub model.

Description

Image recognition method and device, electronic equipment and readable storage medium
Technical Field
The application belongs to the field of artificial intelligence, and particularly relates to an image recognition method, an electronic device, and a readable storage medium.
Background
With the development of electronic technology, users can take photos with an electronic device and sort the captured images of people into albums. Consequently, organizing and managing these images is no less important than capturing them.
In the related art, when a user wants to manage an album by age, the user typically marks the age of the person in each person image manually; only then can the album identify the age in each person image, so that the user obtains an album sorted by the age of the person in the images.
As a result, the process of sorting an album based on the age of the person in each image is too cumbersome.
Disclosure of Invention
An object of the embodiments of the present application is to provide an image recognition method, an electronic device, and a readable storage medium, which can solve the problem that album grouping is too coarse when album management is performed through face recognition.
In a first aspect, an embodiment of the present application provides an image recognition method, including: acquiring first face feature information of a first face image; inputting the first face feature information into a face feature analysis model and outputting N face feature maps, where the face feature analysis model includes N sub face feature analysis models, each sub face feature analysis model outputs one face feature map, and each sub face feature analysis model corresponds to one age group; mixing the feature information in the N face feature maps to obtain second face feature information with age characteristics; and determining the age of the person corresponding to the first face image based on the second face feature information. The model parameters in any one of the N sub face feature analysis models characterize the age characteristics of the age group corresponding to that sub model; N is an integer greater than 1.
In a second aspect, an embodiment of the present application provides an image recognition apparatus, including an acquisition module and a processing module. The acquisition module is configured to acquire first face feature information of a first face image. The processing module is configured to input the first face feature information into a face feature analysis model and output N face feature maps, where the face feature analysis model includes N sub face feature analysis models, each sub face feature analysis model corresponds to one face feature map, and each sub face feature analysis model corresponds to one age group. The processing module is further configured to mix the feature information in the N face feature maps to obtain second face feature information with age characteristics, and to determine the age of the person corresponding to the first face image based on the second face feature information, where N is an integer greater than 1.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, first face feature information of a first face image is acquired; the first face feature information is input into a face feature analysis model, which outputs N face feature maps; the face feature analysis model includes N sub face feature analysis models, each of which outputs one face feature map and corresponds to one age group; the feature information in the N face feature maps is mixed to obtain second face feature information with age characteristics; and the age of the person corresponding to the first face image is determined based on the second face feature information, where the model parameters in any one of the N sub models characterize the age characteristics of its age group, and N is an integer greater than 1. In this way, the age of the person in the first face image is analyzed with a face feature analysis model that captures how a person's face develops and changes in each age group, so an accurate age can be obtained from the analysis result. This enables accurate age identification of images of the same person in an album, and thus fine-grained management of the person and accurate management of the album.
Drawings
Fig. 1 is a schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a second schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 3 is a third schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 4 is a fourth schematic flowchart of an image recognition method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 6 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application;
fig. 7 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that can be derived by a person of ordinary skill in the art from the embodiments given herein fall within the scope of protection of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, a first object may be one object or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The image recognition method, the electronic device, and the readable storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
The image identification method provided by the embodiment of the application can be applied to photo album management scenes.
Nowadays, with the development of electronic technology, a user can take photos with an electronic device and sort the captured images of people into albums. Consequently, organizing and managing these images is no less important than capturing them. In the related art, in the process of managing an album, after a user performs face recognition on the person images in the album, a person album of a certain person may be generated. The user can then further manage the images in the person album according to the shooting time of each image, and can create a growth album of that person. For example, suppose a user wants to manage all images of a child named "Xiaoming". When the user performs an operation of sorting the images of "Xiaoming" by age and designates the child name corresponding to the images as "Xiaoming", the images in the album that contain a frontal or near-frontal face of "Xiaoming" are imported into the same album and stored in the order of their shooting times, thereby generating a growth album of "Xiaoming".
However, when the images in a person album are ordered by shooting time, a problem arises if some image in the album was not shot by the terminal itself (for example, an image shared from another terminal): the time information carried by that image may be inaccurate or out of order, so the growth route in the finally generated growth album is disordered and cannot reflect the child's growth from a younger age to an older one.
In summary, how to solve the problem that album grouping is too coarse when album management is performed through face recognition is a technical problem to be solved urgently in the present application.
In the image recognition method, the electronic device, and the readable storage medium provided by the embodiments of the present application, first face feature information of a first face image is acquired; the acquired first face feature information is input into a face feature analysis model, which outputs N face feature maps; the face feature analysis model includes N sub face feature analysis models, each of which corresponds to one face feature map and one age group; the feature information in the N face feature maps is then mixed to obtain second face feature information with age characteristics; and the age of the person corresponding to the first face image is determined based on the second face feature information, where N is an integer greater than 1. In this way, by analyzing the age of the person in the first face image with a face feature analysis model that captures how a person's face develops and changes in each age group, the images of the same person in an album can be managed at a finer granularity based on the person's age, achieving diversified album management.
An execution subject of the image recognition method provided by the embodiment of the application may be an image recognition device, and the image recognition device may be an electronic device or a functional module in the electronic device. The technical solutions provided in the embodiments of the present application will be described below by taking an electronic device as an example.
An embodiment of the present application provides an image recognition method, and fig. 1 shows a flowchart of the image recognition method provided in the embodiment of the present application. As shown in fig. 1, the image recognition method provided in the embodiment of the present application may include steps 201 to 204 described below.
Step 201, the electronic device obtains first face feature information of a first face image.
In this embodiment of the present application, the first face image may be a face image within an image to be recognized, or may itself be the face image to be recognized. Illustratively, the image to be recognized may be any image in a person album.
In the embodiment of the present application, the first face feature information is feature information of a basic feature (BaseFeature) of the face in the first face image. Illustratively, the basic features of a face are usually facial structural features, such as the facial features (eyes, nose, mouth, etc.) and the facial contour.
In the embodiment of the application, the electronic device may obtain the first face feature information by inputting the first face image into a face basic feature model for feature extraction. Illustratively, the face basic feature model may be the backbone network of a common image classification model, such as MobileNetV3. This network structure is not age-specific; in other words, the model parameters of the face basic feature model are independent of age characteristics.
Example 1: taking an input face image A with an image size of 256×256 as an example, face image A may be input into the face basic feature model for processing to obtain the basic feature BaseFeature corresponding to face image A. For example, the spatial resolution of the resulting BaseFeature may be 32×32.
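The shape bookkeeping of Example 1 can be sketched as follows. This is a minimal, hedged illustration, not the patent's implementation: the real face basic feature model is a trained backbone such as MobileNetV3, while the stub below only reproduces the 256×256 input to 32×32 feature-map mapping with average pooling and an untrained 1×1 channel mix. The function name `base_feature_stub` and the channel count of 64 are assumptions for illustration.

```python
import numpy as np

def base_feature_stub(face_image: np.ndarray, channels: int = 64) -> np.ndarray:
    """Placeholder for the face basic feature model (e.g. a MobileNetV3 backbone).

    Maps a 256x256x3 image to a 32x32xC feature map via 8x8 average pooling
    followed by an untrained 1x1 channel-mixing "convolution"."""
    h, w, c = face_image.shape
    assert h == 256 and w == 256, "this sketch assumes a 256x256 input"
    # 8x8 average pooling: 256 / 8 = 32 spatial positions per axis
    pooled = face_image.reshape(32, 8, 32, 8, c).mean(axis=(1, 3))
    # Untrained 1x1 channel mix standing in for learned feature extraction
    rng = np.random.default_rng(0)
    w1x1 = rng.standard_normal((c, channels))
    return pooled @ w1x1  # shape (32, 32, channels)

image_a = np.zeros((256, 256, 3))       # dummy "face image A"
base_feature = base_feature_stub(image_a)
```

The stub only demonstrates the shape flow from the input image to the 32×32 BaseFeature; a trained backbone would of course produce semantically meaningful features.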
Step 202, the electronic device inputs the first face feature information into a face feature analysis model and outputs N face feature maps.
In an embodiment of the present application, the face feature analysis model (which may also be referred to as a multi-feature-branch model) includes N sub face feature analysis models, where each sub face feature analysis model outputs one face feature map and corresponds to one age group, and N is an integer greater than 1.
In this embodiment of the present application, each of the N age groups corresponding to the N sub-facial feature analysis models may only include one age (that is, each age corresponds to one sub-facial feature analysis model), or may include multiple ages, which is not limited in this embodiment of the present application.
In this embodiment of the present application, the model parameters in any one of the N sub face feature analysis models characterize the age characteristics of the age group corresponding to that sub model. That is, the age characteristics captured by the sub face feature analysis models of different age groups are different.
In the embodiment of the present application, the facial features included in each facial feature map are related to the age characteristics corresponding to the sub-facial feature analysis model corresponding to the facial feature map.
Example 2: with reference to Example 1 above, as shown in fig. 2, take a face feature analysis model with 6 branch structures as an example. The 6 branch models (i.e., sub face feature analysis models) in the face feature analysis model correspond to 6 different age groups, for example: newborn to age 1, age 2 to age 3, age 4 to age 5, age 6 to age 7, age 8 to age 14, and age 15 and over. The feature information of the BaseFeature of face image A is input into the 6 branch models (i.e., AgeModel_1 to AgeModel_6) in the face feature analysis model to obtain the 6 face feature maps corresponding to the 6 branch models respectively.
Illustratively, the input of each AgeModel_i is the feature information of the BaseFeature of face image A, and AgeModel_i is in fact a ResNet18 backbone network with replaced convolution kernels, so its processing consists only of convolution operations, i.e., AgeFeature_i = AgeModel_i(BaseFeature), where the resulting AgeFeature_i is a face feature map.
For example, AgeModel_1 through AgeModel_3 may employ 5×5 and 3×3 convolution kernels, while AgeModel_4 through AgeModel_6 may employ 5×3 convolution kernels. Thus, in AgeModel_1 through AgeModel_3, the 7×7 convolution kernel in the ResNet18 network structure is replaced with a 5×5 convolution kernel while the 3×3 kernels are kept; in AgeModel_4 through AgeModel_6, the 7×7 and 3×3 convolution kernels in the ResNet18 network structure are replaced with 5×3 convolution kernels to fit the actual shape of the input face.
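Purely to illustrate the branch layout of Example 2, here is a hedged sketch in which each AgeModel_i is replaced by an untrained 1×1 channel-mixing convolution with ReLU; the patent's actual branches are modified ResNet18 backbones with 5×5 and 5×3 kernels, which are not reproduced here. All function names and channel counts below are assumptions.

```python
import numpy as np

N_BRANCHES = 6  # one sub face feature analysis model per age group, as in Example 2

def make_branch_stub(seed: int, channels_in: int = 64, channels_out: int = 64):
    # Stand-in for AgeModel_i: an untrained 1x1 channel-mixing "convolution"
    # followed by ReLU, so the output is a same-resolution feature map.
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((channels_in, channels_out)) * 0.1
    return lambda feat: np.maximum(feat @ w, 0.0)

branches = [make_branch_stub(i) for i in range(N_BRANCHES)]
base_feature = np.random.default_rng(42).standard_normal((32, 32, 64))
# AgeFeature_i = AgeModel_i(BaseFeature): one face feature map per age group
age_features = [branch(base_feature) for branch in branches]
```

Each branch sees the same BaseFeature and emits its own AgeFeature_i, which mirrors the fan-out structure of the multi-feature-branch model.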
Step 203, the electronic device mixes the feature information in the N face feature maps to obtain second face feature information with age characteristics.
In the present application, "age characteristics" refers to facial features that can indicate a person's age. For example, a child's age characteristic may be "baby fat", while an elderly person's age characteristics may be white hair, age spots, and the like.
In this embodiment of the application, the electronic device may input the feature information in the N face feature maps into a feature mixture model (blend) for linear combination, and output second face feature information with age characteristics.
In an embodiment of the present application, the second face feature information may include feature information of a face feature with age characteristics (SelectFeature).
In the embodiment of the present application, when the electronic device mixes the feature information in the N face feature maps, it obtains the feature information of the SelectFeature mainly by linearly combining the feature information of the N face feature maps. Specifically, the linear combination here is a convex combination in the linear-algebra sense: a weighted sum whose weights sum to 1.
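The convex combination described above can be sketched as follows. This is a hedged illustration: in the patent, the weights are the correlation values produced by the relevance evaluator described later, whereas here they are supplied by hand, and the function name is an assumption.

```python
import numpy as np

def mix_age_features(age_features, weights):
    # Convex combination: normalise the weights so they sum to 1, then take
    # the weighted sum of the N face feature maps to obtain SelectFeature.
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * f for w, f in zip(weights, age_features))

# Three dummy 32x32x64 feature maps filled with the constants 1, 2 and 3:
maps = [np.full((32, 32, 64), float(v)) for v in (1, 2, 3)]
select_feature = mix_age_features(maps, [0.2, 0.3, 0.5])
```

Because the dummy maps are constant, every entry of `select_feature` equals 0.2·1 + 0.3·2 + 0.5·3 = 2.3, which makes the weighted-sum behaviour easy to verify by eye.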
Step 204, the electronic device determines the age of the person corresponding to the first face image based on the second face feature information with age characteristics.
In this embodiment, after acquiring the second face feature information, the electronic device may determine, based on the age characteristics indicated by the second face feature information, the age group to which the person corresponding to the first face image belongs from among the N age groups, and may then determine the person's age within that age group.
In the image recognition method provided by the embodiment of the application, first face feature information of a first face image is acquired; the first face feature information is input into a face feature analysis model, which outputs N face feature maps; the face feature analysis model includes N sub face feature analysis models, each of which outputs one face feature map and corresponds to one age group; the feature information in the N face feature maps is mixed to obtain second face feature information with age characteristics; and the age of the person corresponding to the first face image is determined based on the second face feature information, where the model parameters in any one of the N sub models characterize the age characteristics of its age group, and N is an integer greater than 1. In this way, the age of the person in the first face image is analyzed with a face feature analysis model that captures how a person's face develops and changes in each age group, so an accurate age can be obtained from the analysis result, enabling accurate age identification of images of the same person in an album, fine-grained management of the person, and accurate management of the album.
Optionally, in this embodiment of the present application, before the step 203, the image recognition method provided in this embodiment of the present application further includes steps 301 to 303:
step 301, the electronic device splices the N face feature images according to the color channels to obtain a target face feature image.
Step 302, the electronic device calculates a correlation value between the feature information of the target face feature map and the first face feature information.
Further optionally, in this embodiment, in combination with the step 301 and the step 302, the process of "mixing feature information in N face feature maps to obtain second face feature information with age characteristic" in the step 203 includes the following step 303:
step 303, the electronic device mixes the feature information in the N face feature maps by using the correlation value obtained by the calculation, so as to obtain second face feature information with age characteristics.
In this embodiment of the present application, the correlation value between the feature information of the target face feature map and the first face feature information characterizes the degree of correlation between the two.
In this embodiment of the application, the electronic device may perform relevance evaluation by using the feature information of the target face feature map and the first face feature information as two inputs of a relevance evaluation model, so as to obtain a relevance value (Rvalues) between the two inputs.
Illustratively, the correlation evaluation model described above may be a small convolution sub-network.
Illustratively, the small convolution sub-network may specifically include: a 3×3 convolutional layer, a 1×1 convolutional layer, a global average pooling layer, a fully connected layer, and a Sigmoid layer.
Illustratively, the parameters of the small convolution sub-network described above are trainable.
For example, the Sigmoid layer has N output nodes, and each output node outputs a correlation value, that is, the correlation value between the feature information of the target face feature map and the first face feature information may include N values.
In the embodiment of the application, the electronic device may obtain the target face feature map after stacking the image information of the N face feature maps along the color-channel dimension.
It should be noted that when images are concatenated along the color channels, the height and width of the images remain unchanged.
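As a hedged sketch of the channel-wise concatenation and the relevance evaluator: the weights below are untrained random stand-ins, the 3×3 and 1×1 convolutional layers of the patent's sub-network are omitted, and every name is an assumption, so this demonstrates only the shape flow, not trained behaviour.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relevance_evaluator(target_map, base_feature, n_branches=6, seed=0):
    # Concatenate the two inputs along the channel axis, reduce with global
    # average pooling, apply an untrained fully connected layer, and squash
    # with a Sigmoid layer that has one output node per age group.
    x = np.concatenate([target_map, base_feature], axis=-1)
    pooled = x.mean(axis=(0, 1))  # global average pooling over height/width
    rng = np.random.default_rng(seed)
    w_fc = rng.standard_normal((pooled.size, n_branches)) * 0.1
    return sigmoid(pooled @ w_fc)  # N correlation values r_i, each in (0, 1)

# Channel-wise concatenation of 6 feature maps: 6 x (32x32x16) -> 32x32x96;
# height and width stay unchanged while the channel counts add up.
age_features = [np.ones((32, 32, 16)) * i for i in range(6)]
target_map = np.concatenate(age_features, axis=-1)
base_feature = np.ones((32, 32, 16))
r = relevance_evaluator(target_map, base_feature)
```

Note how the Sigmoid keeps every r_i strictly between 0 and 1, matching the description of one correlation value per output node.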
Optionally, in this embodiment of the application, the process in step 303 by which "the electronic device mixes the feature information in the N face feature maps using the calculated correlation values to obtain the second face feature information with age characteristics" specifically includes the following steps A and B:
Step A: the electronic device multiplies the correlation value corresponding to the ith face feature map by the feature information in the ith face feature map to obtain the second face feature information corresponding to the ith face feature map, where i = 1, 2, …, N.
Step B: the electronic device accumulates the second face feature information corresponding to each of the N face feature maps to obtain the second face feature information with age characteristics.
Further optionally, in this embodiment of the application, the process in step 303 of "mixing the feature information in the N face feature maps using the calculated correlation values to obtain second face feature information with age characteristics" specifically includes the following step 303a:
Step 303a, the electronic device mixes the feature information in the N face feature maps based on a first formula, using the calculated correlation values, to obtain the second face feature information with age characteristics.
Illustratively, the first formula is:

SelectFeature = Σ_{i=1}^{N} r_i · AgeFeature_i

where SelectFeature is the second face feature information with age characteristics, AgeFeature_i is the feature information of the ith face feature map, and r_i is the correlation value corresponding to the ith face feature map.
It should be noted that the processes of step A and step B described above can be realized by the first formula.
Example 3: with reference to Example 2 above, as shown in fig. 2, after the electronic device acquires the 6 face feature maps corresponding to the 6 branch models, the 6 face feature maps may be concatenated along the color channels, i.e., the channel counts of the 6 face feature maps output by AgeModel_1 to AgeModel_6 are stacked. For example, if the face feature map corresponding to AgeModel_i is denoted AgeFeature_i and its size is 32×32×256 (height and width both 32, 256 color channels), then concatenating along the color channels means that the height and width remain unchanged and the channel counts add up, so the size of the concatenated target face feature map ConcatFeature is 32×32×1536, i.e., 32×32×(256×6).
Then, the electronic device may use the feature maps output by AgeModel_1 to AgeModel_6, concatenated by channel, as one input of a relevance evaluator (REvaluator), and use the feature information of the BaseFeature of face image A as the other input. The relevance evaluator REvaluator receives the two inputs, concatenates them by channel, and computes the correlation values through a small convolution sub-network. The small convolution sub-network consists of a 3×3 convolutional layer, a 1×1 convolutional layer, a global average pooling layer, a fully connected layer, and a Sigmoid layer. Since the Sigmoid layer has 6 output nodes, the correlation values output by the REvaluator are r_i, i = 1, …, 6.
In this manner, by computing the correlation between the target face feature map (i.e., the concatenation of the different face feature maps) and the first face feature information (i.e., the basic feature map), the second face feature information with age characteristics can be determined more accurately based on the calculated correlation values.
Optionally, in this embodiment of the application, the process of determining, by the electronic device, the age of the person corresponding to the first face image based on the second face feature information with the age characteristic in the step 204 includes the following steps 401 and 402:
step 401, the electronic device inputs the second face feature information into an age classification model, and outputs N probability values.
And step 402, the electronic device determines the age of the person in the first face image from the age bracket corresponding to the maximum of the N probability values.
Illustratively, the N probability values correspond to N age groups. The N age groups are the age groups corresponding to the sub-face feature analysis model.
Illustratively, any of the N probability values described above is used to characterize: and the probability that the age of the person corresponding to the first face image belongs to the age group corresponding to any probability value.
Illustratively, the age classification model may include: the system comprises a global average pooling layer, a full connection layer, an N-branch output layer and a Softmax layer.
For example, the classification formula corresponding to the age classification model is the following second formula:
The second formula is: i* = argmax_i Softmax_i.
Wherein Softmax_i is the output value of the ith node in the Softmax layer, i = 1, …, N; the Softmax layer has N nodes, and each node corresponds to one of the N age groups.
i* is the node corresponding to the maximum probability value among the N probability values.
For example, in the above example, if the result of a certain calculation is i* = 1, it is determined that the probability value of the first node is the maximum, that is, the input first face image most probably belongs to the first age group, so the age of the person corresponding to the face image is from newborn to 1 year.
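The second formula can be illustrated with a small sketch that applies Softmax to illustrative logits and takes the arg max (1-indexed, matching the node numbering above):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_age_group(logits):
    """Second formula: i* = argmax_i Softmax_i (1-indexed node)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__) + 1, probs

logits = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3]  # illustrative N = 6 output nodes
i_star, probs = classify_age_group(logits)  # i_star == 1: first age group wins
```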
In this way, the probability of the age group to which the age of the person in the first face image belongs can be obtained by inputting the second face feature information having the age characteristic to the age classification model, so that the age of the person in the first face image can be judged, and a unique age tag (Classifier) can be obtained.
In the embodiment of the present application, the training process of all models in the age identification flowchart shown in fig. 2 includes: since the collected training images have true age labels, the models may be trained using a warm-start training approach. Specifically, modules such as REvaluator, RValues and Blender can be omitted during initial training, and each training image is used only to train the AgeModel_i of the branch corresponding to its age label, so that each branch network learns age-specific feature information and network weights. After the branch modules (i.e., AgeModel_i) are trained to convergence, the REvaluator, RValues and Blender modules are added for joint training. This training mode avoids the problem that a complex network structure trained from scratch is difficult to converge.
Optionally, in an embodiment of the present application, the image recognition method provided in the embodiment of the present application may further include the following steps a1 and a 2:
step A1: the electronic device identifies an age of a person corresponding to each image in the first set of images.
Step A2: the electronic equipment screens out the images with the ages not meeting the preset age condition in the first image set based on the ages of the persons corresponding to the images.
For example, if an album contains both adults and children, or some images in the album are group photos of adults and children, the images containing adults may be excluded according to the age identification results, so as to avoid generating a pseudo child growth video album using photos that belong to adults.
Optionally, in an embodiment of the present application, the image recognition method provided in the embodiment of the present application may further include the following steps B1 and B2:
step B1: the electronic device identifies an age of a person corresponding to each image in the first set of images.
Step B2: the electronic equipment sorts the images in the first image set according to the age sequence based on the age of the person corresponding to each image, and generates a target image set or a target video.
For example, the children photos can be sorted from small to large according to the identification result of the ages of the children, so that a video album which accords with the growth time line of the children is generated, and the growth process of the children is really recorded.
In the related art, in the process of managing the photo album, after face recognition is performed on the person images in the album, if the recognized child face deviates greatly from the lens (>70 degrees) or only the child's back is captured, the effect of the face recognition scheme degrades greatly, resulting in misclassification or failure to automatically add the image to the photo album.
To solve this problem, the embodiment of the application can adapt to different shooting situations (for example, frontal or near-frontal shooting, the face deviating from the lens at a large angle, the face being blocked, the subject facing away from the lens, and the like), and provides different image management schemes for different shooting situations.
Optionally, in this embodiment of the present application, before the step 201, the image recognition method provided in this embodiment of the present application further includes step 501:
and step 501, acquiring an image to be identified.
The image to be recognized may be any image in a terminal album, and specifically, may be determined according to an actual use situation, and the application is not limited.
Further optionally, in this embodiment of the application, the "acquiring first facial feature information of the first facial image" in step 201 may include the following step 502:
step 502, if a face image exists in the image to be recognized and the reliability of the face image meets a first condition, the electronic device obtains feature information of the face image to obtain first face feature information of the first face image.
Illustratively, the first condition includes: the confidence of the key points in the face image in the image to be recognized is greater than or equal to a preset threshold value. Further, the confidence of the key points in the face image may be an average of the confidence of each key point in the face image.
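A minimal sketch of the first condition, assuming the confidence of the face image is the mean of its key-point confidences and an illustrative threshold of 0.7:

```python
def face_reliable(keypoint_confidences, threshold=0.7):
    """First condition sketch: the mean confidence of the face
    key points must reach a preset threshold (0.7 is illustrative)."""
    if not keypoint_confidences:
        return False
    mean_conf = sum(keypoint_confidences) / len(keypoint_confidences)
    return mean_conf >= threshold

face_reliable([0.9, 0.8, 0.95, 0.85, 0.9])  # True: frontal, visible face
face_reliable([0.3, 0.2, 0.6, 0.1, 0.4])    # False: occluded or turned away
```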
Illustratively, if a face image exists in the image to be recognized and the reliability meets the first condition, it indicates that the face in the image to be recognized is not blocked by a large area, the face does not deviate from the lens direction by a large angle (>70 degrees), the face feature is reliable, and the age recognition can be performed by depending on the face image.
Optionally, in this embodiment of the present application, before the step 201, the image recognition method provided in this embodiment of the present application further includes steps 503 and 504:
step 503, if no face image exists in the image to be recognized, or a face image exists in the image to be recognized and the reliability of the face image does not meet the first condition, the electronic device calculates a similarity value between the person body feature information in the image to be recognized and the person body feature information in the images in the target image set.
Step 504, if the similarity value corresponding to any image in the target image set is greater than the first threshold, the electronic device adds the image to be recognized to the target image set.
Illustratively, the target image set includes at least one image.
Illustratively, the second condition is satisfied between the image in the target image set and the image to be recognized. Wherein the second condition includes: the difference between the timestamp (e.g., shooting time) corresponding to the image in the target image set and the timestamp corresponding to the image to be recognized is less than a second threshold, and/or the scene features in the images in the target image set are matched with the scene features in the image to be recognized.
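The second condition can be sketched as follows (the 24-hour threshold and Unix-timestamp inputs are assumptions for illustration):

```python
def second_condition(ts_set_image, ts_candidate, scenes_match,
                     second_threshold=24 * 3600):
    """Second condition sketch: shooting-time difference below a
    threshold (here 24 h, illustrative) and/or matching scene features."""
    time_close = abs(ts_set_image - ts_candidate) < second_threshold
    return time_close or scenes_match

second_condition(1_700_000_000, 1_700_003_600, scenes_match=False)  # True
second_condition(1_700_000_000, 1_700_900_000, scenes_match=False)  # False
```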
Illustratively, if a face image does not exist in the image to be recognized, or a face image exists in the image to be recognized and the reliability of the face image does not meet the first condition, it indicates that the face in the image to be recognized is blocked by a large area, the face is deviated from the lens direction by a large angle (>70 degrees), the face features are unreliable, and the image cannot be relied on for age recognition.
Illustratively, the images in the target image set are consistent with or close to the image to be recognized in shooting time and shooting location.
Illustratively, when no face image exists in the image to be recognized, or a face image exists but its reliability does not satisfy the first condition, the body feature information of the image to be recognized is acquired, the cosine similarity between this body feature information and the person body feature information in the images of the target image set is then calculated, and if the similarity value is greater than the first threshold, the image to be recognized is added to the target image set.
Therefore, when the face image in the image to be recognized is reliable, the age of the person in the image can be recognized and used for grouping; when the face image is unreliable or absent, the cosine similarity between the body feature information in the image to be recognized and the body feature information of the images in a given image collection is calculated to group the image, so that images whose faces are unreliable or missing can also be grouped.
The image recognition method provided by the present application will be exemplarily described below in 2 embodiments.
In a first possible embodiment:
the embodiment provides a video album generating method which can adapt to different shooting conditions (such as front or near-front shooting, large-angle deviation of a face from a lens, face occlusion, back-to-lens and the like), group each person image according to the person identity and sort the images according to the person age.
Exemplarily, taking the child image as an example, as shown in fig. 3, the method includes the following steps S11 to S17:
and step S11, acquiring the album reading and writing authority of the electronic equipment.
Illustratively, in order to protect the privacy of the user and respect the user's informed consent right, a user agreement is shown to the user before the video album generation function is enabled, and the user agrees to obtain the read-write right of the electronic equipment album so as to read the image and generate the album video after processing.
And step S12, acquiring a user image list.
Illustratively, each folder of an album in the consumer electronic device is traversed to obtain a list of all images.
And step S13, AI child identification.
Illustratively, various images exist in the electronic equipment photo album of the user, and whether each image in the whole image list is a child image is judged, so that all child images are screened out, and non-child images are excluded. The child image recognition can be implemented by using a mature image classification technology based on deep learning, and details are not repeated herein.
Illustratively, the child image is defined as follows: children under 14 years of age, the legal definition, are the ideal range, with children from newborn to 6 years of age preferred. The image should take a child as its main subject, with no limit on the number of children, their clothing or the environment; in other words, an image whose main subject is a child can be identified as a child image.
And step S14, screening the child images.
For example, owing to factors such as the shooting environment, the child's state and the photographer's skill, not all child images in a user album are high-quality images; images with poor definition, poor expressions and duplicates are inevitable, and such images are screened out in this step.
And step S15, clustering the child images.
Illustratively, through the preprocessing of the steps, high-quality images of all children in the electronic equipment photo album of the user are obtained. To create a growing video album for each child, the images need to be grouped by child identity. The images do not need to be prestored as grouped reference pictures, and the images are automatically grouped according to the identity of the child in an unsupervised clustering mode.
Specifically, face recognition technology can be used to identify the child's identity, which is effective for images shot from the front or at non-extreme angles. However, actually shot images often cover various angles of the child, including faces blocked by worn decorations and back views; under these conditions the face features become unreliable, and grouping child images by face recognition alone suffers from misclassification or missed classification, so images of unrelated people may be mixed into one group, seriously affecting the user experience.
Illustratively, according to the face and body detection results, the facial and body characteristics of the children are adaptively utilized to perform image grouping on the children, so that the problem of image grouping under the condition that the face is greatly deviated from the lens or the face is invisible (such as the face is blocked, faces are away from the lens and the like) in the images can be solved.
Exemplarily, an image is input, the face and the body of a person are firstly detected, and a face frame and a body frame are matched, so as to correspond to the same person individual, wherein the face frame is an area where the face is located, and the body frame is an area where the body is located, and may be rectangular or circular, and may be determined according to actual use conditions, which is not limited in the present application.
In one example, the method for matching the face frame and the body frame includes: since the face frame is usually located within the body frame of the same person, the area of each detected face frame is denoted as A. The area of the intersection region between this face frame and each body frame in the image is calculated and denoted as Ai, where i is the index of the body frame. The area ratio Ai/A is then calculated, and the body frame corresponding to the maximum ratio is the one matched with the current face frame, that is, the face and the body belong to the same person; if several body frames tie for the maximum ratio, the body frame with the minimum area Bi (Bi being the area of body frame i) is taken as the match.
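The matching procedure above can be sketched as follows (axis-aligned boxes as (x1, y1, x2, y2); tie-breaking by smaller body-frame area reflects one reading of the Bi rule):

```python
def area(box):
    # box = (x1, y1, x2, y2), axis-aligned rectangle
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def intersection_area(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2))

def match_body(face_box, body_boxes):
    """Match a face frame to the body frame with the largest
    intersection ratio Ai/A; ties go to the smaller body frame."""
    face_area = area(face_box)
    best_i, best_key = -1, (-1.0, 0.0)
    for i, body in enumerate(body_boxes):
        ratio = intersection_area(face_box, body) / face_area
        key = (ratio, -area(body))  # larger ratio wins, then smaller body frame
        if key > best_key:
            best_i, best_key = i, key
    return best_i

bodies = [(0, 0, 10, 30), (20, 0, 30, 30)]
match_body((2, 2, 6, 8), bodies)  # 0: the face lies inside the first body frame
```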
For example, after a face image is acquired, it can be judged whether the face features in it are reliable. If so, the face feature information is used for subsequent clustering. If not, the EXIF information of the images is checked to judge whether the shooting times and locations of the two images are close (for example, the same day and the same location); if they are, the body feature information is used for subsequent clustering; otherwise, neither the face nor the body features are reliable, and the current person individual is not clustered.
Illustratively, for each individual person in the image, if the face features of the individual person are reliable, the face image block and the whole body image block are respectively sent to a face Feature extractor and a body Feature extractor, and the face Feature (Feature _ face) and the body Feature (Feature _ body) of the individual person are extracted. If the facial features are not reliable, only the image blocks of their body are fed into a body feature extractor, which extracts their body features, wherein the facial feature extractor can be obtained using a sophisticated face recognition model (e.g., the ArcFace algorithm) and the body feature extractor can be obtained using a sophisticated image classification model (e.g., MobileNetV 3).
Illustratively, clustering the images of the children employs a hierarchical clustering algorithm.
For example, taking 2-layer clustering as an example, the first-layer clustering uses the face features, and highly similar individuals are clustered into the same class. After face clustering is completed, for each body feature (Feature_body) without a corresponding face frame, if the shooting time and location of its picture are close to those of the images in a certain cluster, the cosine similarity between the body features is calculated; the cluster label with the highest similarity exceeding a preset threshold (for example, 0.8) is taken as the cluster label of the current Feature_body, that is, the current image is given the corresponding cluster label. This solves the problem of image grouping when the face in an image deviates greatly from the lens or is invisible (for example, the face is blocked or the subject faces away from the lens). When the second-layer clustering is performed, only facial features are used; in this manner the hierarchical clustering algorithm recursively merges paired clusters so as to minimally increase the intra-class link distance, until the increment of the intra-class link distance exceeds a predetermined threshold (e.g., 0.2), at which point person clustering, i.e., image grouping, is complete.
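The body-feature assignment step, giving a Feature_body the label of the most cosine-similar cluster when the similarity exceeds 0.8, might look like this sketch (cluster centers and labels are illustrative):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_body_feature(feature_body, cluster_centers, threshold=0.8):
    """Give the body feature the cluster label with the highest cosine
    similarity, provided it exceeds the preset threshold (0.8);
    otherwise leave it unassigned (return None)."""
    best_label, best_sim = None, threshold
    for label, center in cluster_centers.items():
        sim = cosine_similarity(feature_body, center)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

centers = {"child_A": [1.0, 0.0, 0.0], "child_B": [0.0, 1.0, 0.0]}
assign_body_feature([0.9, 0.1, 0.0], centers)  # "child_A": similarity > 0.8
assign_body_feature([0.5, 0.5, 0.7], centers)  # None: below 0.8 for both
```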
Step S16 identifies the age of the child in the image.
For example, the clustering step does not itself verify that every category is mainly children; for instance, in scenes where adults and children are photographed together, the clustering step may produce categories corresponding to adults. These categories can be excluded according to the age identification results, avoiding the generation of a pseudo child growth video album from images that belong to adults. In addition, based on the identified ages, the electronic device can automatically sort the child images from youngest to oldest, thereby generating a video album that follows the child's growth timeline and truly records the growth process.
Illustratively, by using the principle of sketch of the face of a child and considering the development and change process of the face of the child, the image recognition method of the embodiment of the application provides a deep network structure with multiple characteristic branches and a correlation evaluation mechanism to accurately identify the age of the child.
And step S17, generating a child growth photo album.
Illustratively, the images sorted according to the child's age are combined to generate the child growth photo album, which is more expressive than simply placing the images in one folder; moreover, more personalized child album templates can be matched on the terminal album client, yielding higher emotional value.
Therefore, by the above method, duplicate images and images with poor definition or poor expressions can be removed; the facial and body features of the children are then adaptively used to group the images, solving the grouping problem when the face deviates greatly from the lens or is invisible (for example, the face is blocked or the subject faces away from the lens), without the system having to pre-store any images of any child. Furthermore, sorting the child images and videos by age generates a child growth photo album in the true sense, which can evoke emotional resonance between parents and children.
In a second possible embodiment:
Since face recognition in the related art usually requires at least one pre-stored image of each person, the situation of a newly added person cannot be handled automatically.
For the scene, the embodiment provides a method for concisely and efficiently processing the images of the newly added characters in the album of the electronic device of the user after the character growth album is generated for the first time.
Exemplarily, taking the child image as an example, as shown in fig. 4, the method includes the following steps S21 to S27:
and step S21, reading the existing clustering information of the electronic equipment.
Illustratively, when the electronic device detects that the user album has a new image, the electronic device reads the existing grouping information to obtain a series of category labels, such as image file name-facial feature information-body feature information-age.
And step S22, acquiring a user new image list.
Illustratively, the newly added images after the last clustering are traversed in the album, and a file name list of the newly added images is obtained.
And step S23, AI child identification.
Exemplarily, the same as step S13 above, except that this step processes only the list of newly added images acquired in step S22.
And step S24, screening the child images.
Illustratively, same as step S14 described above.
And step S25, performing incremental clustering on the newly added child images.
Illustratively, for the newly added images screened by the steps, the newly added images are inserted into the existing clusters by using an incremental clustering mode, or the newly added images are judged to correspond to newly added people, and then a new category is generated.
Specifically, all the newly added images are first processed in the manner of step S15 to obtain a temporary clustering result, where "temporary" means that a newly added image's category label may belong to an existing cluster, so this intermediate result is not the final result of this clustering. Then, using the second-level clustering method of the hierarchical clustering in step S15, the temporary clustering result and the existing clustering result are processed together: the hierarchical clustering algorithm recursively merges paired clusters so as to minimally increase the intra-class link distance, until the increment of the intra-class link distance exceeds a preset threshold (for example, 0.2), at which point incremental person clustering, i.e., the image grouping update, is complete.
Step S26 identifies the age of the child in the image.
Illustratively, same as step S16 described above.
And step S27, updating or generating the child growth video album.
Illustratively, each new image with age information of the child is inserted into the existing image list of the child by using a binary search method, and the growth album is updated, or for the new child, the corresponding growth album is generated in the manner of the above step S17. In addition, the existing clustering database table is updated.
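The binary-search insertion into the age-sorted image list can be sketched with the standard library's bisect module (filenames and ages are illustrative):

```python
import bisect

def insert_by_age(image_list, new_image):
    """Insert an (age, filename) pair into an age-sorted list via
    binary search, keeping the growth-album timeline ordered."""
    bisect.insort(image_list, new_image)
    return image_list

album = [(0.5, "newborn.jpg"), (2.0, "park.jpg"), (5.0, "school.jpg")]
insert_by_age(album, (3.0, "birthday.jpg"))
# album is now ordered by age: 0.5, 2.0, 3.0, 5.0
```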
Therefore, the method provided by the second embodiment can handle the situation in which the user's electronic device album receives new images or videos after the first clustering is completed, avoiding repeated processing of the large number of existing images, and can update the existing child growth albums or generate albums for newly added children with higher time efficiency and lower energy consumption.
It should be noted that, in the image recognition method provided in the embodiment of the present application, the execution subject may be an image recognition apparatus, or an electronic device, or may also be a functional module or an entity in the electronic device. In the embodiment of the present application, an image recognition method executed by an image recognition apparatus is taken as an example, and the image recognition apparatus provided in the embodiment of the present application is described.
Fig. 5 shows a schematic diagram of a possible structure of the image recognition apparatus according to the embodiment of the present application. As shown in fig. 5, the image recognition apparatus 700 may include: an acquisition module 701 and a processing module 702; the obtaining module 701 is configured to obtain first face feature information of a first face image; the processing module 702 is configured to input the first facial feature information into a facial feature analysis model, and output N facial feature maps; the face feature analysis model comprises N sub-face feature analysis models, and one sub-face feature analysis model correspondingly outputs a face feature image; one sub-face feature analysis model corresponds to one age group; the processing module 702 is further configured to mix feature information in the N face feature maps to obtain second face feature information with age characteristics; the processing module 702 is further configured to determine, based on the second facial feature information, a person age corresponding to the first facial image; wherein, the model parameter in any sub-face feature analysis model in the N sub-face feature analysis models is used for representing: the age characteristics of the age group corresponding to any sub-human face feature analysis model, wherein N is an integer greater than 1.
Optionally, in this embodiment of the application, the processing module 702 is specifically configured to splice the N face feature maps according to a color channel to obtain a target face feature map; the processing module 702 is specifically configured to calculate a correlation value between the feature information of the target face feature map and the first face feature information; the processing module 702 is specifically configured to mix the feature information in the N face feature maps by using the correlation value, so as to obtain second face feature information with age characteristics.
Optionally, in this embodiment of the application, the processing module 702 is specifically configured to multiply a relevance value corresponding to the ith human face feature map by the feature information in the ith human face feature map, so as to obtain second human face feature information corresponding to the ith human face feature map; 1,2 … …, N; the processing module 702 is specifically configured to accumulate second face feature information corresponding to each of the N face feature maps to obtain the second face feature information with the age characteristic.
Optionally, in this embodiment of the application, the processing module 702 is specifically configured to input the second facial feature information into an age classification model, and output N probability values; the N probability values correspond to the N age groups; any one of the N probability values is used to characterize: a probability that the age of the person corresponding to the first face image belongs to an age group corresponding to any probability value; the processing module 702 is further configured to determine the age of the person in the first face image from the age group corresponding to the maximum probability value of the N probability values.
Optionally, in this embodiment of the application, the obtaining module 701 is further configured to obtain an image to be identified; the obtaining module 701 is specifically configured to, if a face image exists in the image to be recognized and the reliability of the face image meets a first condition, obtain feature information of the face image, and obtain first face feature information of the first face image.
Optionally, in this embodiment of the application, the processing module 702 is further configured to calculate a similarity value between the person body feature information in the image to be recognized and the person body feature information in the images in the target image set, if the image to be recognized does not have the face image, or the image to be recognized has the face image and the reliability of the face image does not satisfy the first condition; the processing module 702 is further configured to add the image to be identified to the target image set if the similarity value corresponding to any image in the target image set is greater than the first threshold.
In the image recognition device provided by the embodiment of the application, the device acquires first face characteristic information of a first face image; inputting the first face feature information into a face feature analysis model, and outputting N face feature graphs; the human face feature analysis model comprises N sub human face feature analysis models, and one sub human face feature analysis model correspondingly outputs a human face feature image; one sub-face feature analysis model corresponds to one age group; mixing the feature information in the N face feature images to obtain second face feature information with age characteristics; determining the age of the person corresponding to the first face image based on the second face feature information; wherein, the model parameter in any sub-face feature analysis model in the above-mentioned N sub-face feature analysis models is used for characterizing: the age characteristics of the age group corresponding to any sub-human face feature analysis model; n is an integer greater than 1. Therefore, the age of the person in the first face image is analyzed by utilizing the face characteristic analysis model capable of representing the face development and change of each age group of the person, so that the accurate age of the person can be obtained based on the analysis result of the age of the person, the accurate age identification of the person image of the same person in the album is realized, the fine management of the person is realized, and the accurate management of the album is realized.
The image recognition device in the embodiment of the present application may be an electronic device, and may also be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic Device may be, for example, a terminal, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic Device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) Device, a robot, a wearable Device, an ultra-Mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and may also be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The image recognition apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The image recognition device provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to fig. 4, and is not described here again to avoid repetition.
Optionally, as shown in fig. 6, an embodiment of the present application further provides an electronic device 800, which includes a processor 801 and a memory 802, where the memory 802 stores a program or instructions executable on the processor 801. When the program or the instructions are executed by the processor 801, the steps of the embodiment of the image recognition method are implemented with the same technical effects; details are not repeated here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 7 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 110 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, which is not described again here.
The processor 110 is configured to acquire first face feature information of a first face image; the processor 110 is configured to input the first face feature information into a face feature analysis model and output N face feature maps, where the face feature analysis model includes N sub-face feature analysis models, each sub-face feature analysis model correspondingly outputs one face feature map, and each sub-face feature analysis model corresponds to one age group; the processor 110 is further configured to mix the feature information in the N face feature maps to obtain second face feature information with an age characteristic; the processor 110 is further configured to determine the age of the person corresponding to the first face image based on the second face feature information; the model parameters in any one of the N sub-face feature analysis models are used to characterize the age characteristics of the age group corresponding to that sub-face feature analysis model, where N is an integer greater than 1.
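As a rough illustration of this per-age-group design, the following Python sketch (using NumPy; the tensor shapes, the `tanh` placeholder computation, and all parameter values are hypothetical and not taken from the patent) shows N sub-models each turning the first face feature information into one face feature map:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4            # number of age groups / sub-models (illustrative)
C, H, W = 8, 16, 16  # hypothetical feature-map dimensions

def sub_model(features, params):
    """Stand-in for one per-age-group sub-face-feature analysis model.

    In the patent, the model parameters encode the facial
    characteristics of a single age group; here a broadcasted
    multiply plus tanh is used purely as a placeholder computation.
    """
    return np.tanh(features * params)

# First face feature information extracted from the face image (placeholder).
first_features = rng.standard_normal((C, H, W))

# One parameter tensor per age group; in the patent these are learned.
sub_model_params = [rng.standard_normal((C, 1, 1)) for _ in range(N)]

# Each sub-model correspondingly outputs one face feature map.
feature_maps = [sub_model(first_features, p) for p in sub_model_params]
```

Here the same input is fed to all N sub-models, so each age group contributes its own view of the face before the maps are mixed.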
Optionally, in this embodiment of the application, the processor 110 is specifically configured to splice the N face feature maps along a color channel to obtain a target face feature map; the processor 110 is specifically configured to calculate a correlation value between the feature information of the target face feature map and the first face feature information; and the processor 110 is specifically configured to mix the feature information in the N face feature maps by using the correlation values to obtain the second face feature information with the age characteristic.
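The splicing and correlation steps could look like the following NumPy sketch. The cosine-style correlation and the softmax normalization are illustrative assumptions; the patent does not specify these formulas:

```python
import numpy as np

rng = np.random.default_rng(1)
N, C, H, W = 4, 8, 16, 16  # hypothetical dimensions
feature_maps = [rng.standard_normal((C, H, W)) for _ in range(N)]
first_features = rng.standard_normal((C, H, W))

# Step 1: splice the N face feature maps along the channel axis
# to obtain the target face feature map.
target_map = np.concatenate(feature_maps, axis=0)  # shape (N*C, H, W)

# Step 2: one correlation value per feature map; a normalized inner
# product with the first face feature information is assumed here.
def correlation(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

corr = np.array([correlation(m, first_features) for m in feature_maps])

# A softmax turns the raw correlations into mixing weights in (0, 1).
weights = np.exp(corr) / np.exp(corr).sum()
```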
Optionally, in this embodiment of the application, the processor 110 is specifically configured to multiply the correlation value corresponding to the i-th face feature map by the feature information in the i-th face feature map to obtain second face feature information corresponding to the i-th face feature map, where i = 1, 2, …, N; and the processor 110 is specifically configured to accumulate the second face feature information corresponding to each of the N face feature maps to obtain the second face feature information with the age characteristic.
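A minimal sketch of this multiply-and-accumulate mixing, assuming NumPy arrays for the feature maps and illustrative correlation values (the shapes and weights are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
N, C, H, W = 4, 8, 16, 16
feature_maps = [rng.standard_normal((C, H, W)) for _ in range(N)]
weights = np.array([0.1, 0.5, 0.3, 0.1])  # correlation values (illustrative)

# Multiply the i-th face feature map by its correlation value,
# then accumulate over i = 1, 2, ..., N to get the second face
# feature information with the age characteristic.
second_features = sum(w * m for w, m in zip(weights, feature_maps))
```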
Optionally, in this embodiment of the application, the processor 110 is further configured to input the second face feature information into an age classification model and output N probability values, where the N probability values correspond to N age groups, and any one of the N probability values is used to characterize the probability that the age of the person corresponding to the first face image belongs to the age group corresponding to that probability value; the processor 110 is further configured to determine the age of the person in the first face image from the age group corresponding to the maximum probability value among the N probability values.
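The classification step then reduces to picking the age group with the maximum probability. A hedged sketch follows; the age brackets, the classifier logits, and the softmax normalization are all invented for illustration and are not specified by the patent:

```python
import numpy as np

age_groups = ["0-12", "13-25", "26-45", "46+"]  # hypothetical brackets
logits = np.array([0.2, 2.1, 0.7, -0.5])        # age classifier output

# Softmax yields the N probability values, one per age group.
probs = np.exp(logits) / np.exp(logits).sum()

# The person's age is taken from the age group with the maximum probability.
predicted = age_groups[int(np.argmax(probs))]
```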
Optionally, in this embodiment of the application, the processor 110 is further configured to acquire an image to be recognized; the processor 110 is specifically configured to, if a face image exists in the image to be recognized and the reliability of the face image meets a first condition, acquire the feature information of the face image to obtain the first face feature information of the first face image.
Optionally, in this embodiment of the application, the processor 110 is further configured to calculate a similarity value between the human body feature information in the image to be recognized and the human body feature information in the images in a target image set if no face image exists in the image to be recognized, or if a face image exists in the image to be recognized but its reliability does not satisfy the first condition; the processor 110 is further configured to add the image to be recognized to the target image set if the similarity value corresponding to any image in the target image set is greater than a first threshold.
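This fallback could be sketched as a similarity test against the target image set. Cosine similarity, the threshold value, and the toy feature vectors are assumptions for illustration; the patent does not fix the similarity measure:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two body-feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

FIRST_THRESHOLD = 0.8  # hypothetical "first threshold"

body_feat = np.array([1.0, 0.0, 1.0])      # image to be recognized
target_set = [np.array([0.9, 0.1, 1.1]),   # images already in the set
              np.array([0.0, 1.0, 0.0])]

# Add the image to the target set if its body features are similar
# enough to those of any image already in the set.
if any(cosine(body_feat, f) > FIRST_THRESHOLD for f in target_set):
    target_set.append(body_feat)
```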
In the electronic device provided by the embodiment of the application, the electronic device acquires first face feature information of a first face image; inputs the first face feature information into a face feature analysis model and outputs N face feature maps, where the face feature analysis model includes N sub-face feature analysis models, each sub-face feature analysis model correspondingly outputs one face feature map, and each sub-face feature analysis model corresponds to one age group; mixes the feature information in the N face feature maps to obtain second face feature information with an age characteristic; and determines the age of the person corresponding to the first face image based on the second face feature information. The model parameters in any one of the N sub-face feature analysis models are used to characterize the age characteristics of the age group corresponding to that sub-face feature analysis model, and N is an integer greater than 1. In this way, the age of the person in the first face image is analyzed with a face feature analysis model that represents how a person's face develops and changes in each age group, so that an accurate age can be obtained from the analysis result. This enables accurate age recognition for images of the same person in an album, and thereby fine-grained management of persons and accurate management of the album.
It should be understood that, in the embodiment of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042; the graphics processing unit 1041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may include two parts: a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a first storage area storing a program or instructions and a second storage area storing data, where the first storage area may store an operating system and an application program or instructions required for at least one function (such as a sound playing function or an image playing function). Further, the memory 109 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchronous link DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). The memory 109 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 110 may include one or more processing units; optionally, the processor 110 integrates an application processor, which mainly handles operations related to the operating system, user interface, application programs, etc., and a modem processor, which mainly handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the embodiment of the image recognition method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the embodiment of the image recognition method, and can achieve the same technical effect, and is not described here again to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, or a system-on-chip.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing image recognition method embodiments, and can achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. Further, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may also be performed in a substantially simultaneous manner or in a reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (which may be a mobile phone, a computer, a server, or a network device) to execute the methods according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. An image recognition method, characterized in that the method comprises:
acquiring first face characteristic information of a first face image;
inputting the first face feature information into a face feature analysis model, and outputting N face feature maps; wherein the face feature analysis model comprises N sub-face feature analysis models, each sub-face feature analysis model correspondingly outputs one face feature map, and each sub-face feature analysis model corresponds to one age group;
mixing the feature information in the N face feature maps to obtain second face feature information with an age characteristic; and
determining the age of the person corresponding to the first face image based on the second face feature information;
wherein the model parameters in any one of the N sub-face feature analysis models are used to characterize the age characteristics of the age group corresponding to that sub-face feature analysis model, and N is an integer greater than 1.
2. The method according to claim 1, wherein the mixing the feature information in the N face feature maps to obtain second face feature information with an age characteristic comprises:
splicing the N face feature maps along a color channel to obtain a target face feature map;
calculating a correlation value between the feature information of the target face feature map and the first face feature information; and
mixing the feature information in the N face feature maps by using the correlation values to obtain the second face feature information with the age characteristic.
3. The method according to claim 2, wherein the mixing the feature information in the N face feature maps by using the correlation values to obtain the second face feature information with the age characteristic comprises:
multiplying the correlation value corresponding to the i-th face feature map by the feature information in the i-th face feature map to obtain second face feature information corresponding to the i-th face feature map, where i = 1, 2, …, N; and
accumulating the second face feature information corresponding to each of the N face feature maps to obtain the second face feature information with the age characteristic.
4. The method of claim 1, wherein before the obtaining first facial feature information of the first facial image, the method further comprises:
acquiring an image to be identified;
the acquiring first face feature information of a first face image comprises:
if a face image exists in the image to be recognized and the reliability of the face image meets a first condition, acquiring the feature information of the face image to obtain the first face feature information.
5. The method of claim 4, wherein after acquiring the image to be identified, the method further comprises:
if no face image exists in the image to be recognized, or a face image exists in the image to be recognized but the reliability of the face image does not meet the first condition, calculating a similarity value between the human body feature information in the image to be recognized and the human body feature information in the images in a target image set; and
if the similarity value corresponding to any image in the target image set is greater than a first threshold, adding the image to be recognized to the target image set.
6. An image recognition apparatus, characterized in that the image recognition apparatus comprises: the device comprises an acquisition module and a processing module;
the acquisition module is used for acquiring first face characteristic information of a first face image;
the processing module is configured to input the first face feature information acquired by the acquisition module into a face feature analysis model and output N face feature maps; wherein the face feature analysis model comprises N sub-face feature analysis models, each sub-face feature analysis model correspondingly outputs one face feature map, and each sub-face feature analysis model corresponds to one age group;
the processing module is further configured to mix feature information in the N face feature maps to obtain second face feature information with age characteristics;
the processing module is further configured to determine a person age corresponding to the first face image based on the second face feature information;
wherein the model parameters in any one of the N sub-face feature analysis models are used to characterize the age characteristics of the age group corresponding to that sub-face feature analysis model, and N is an integer greater than 1.
7. The apparatus of claim 6,
the processing module is specifically configured to splice the N face feature maps along a color channel to obtain a target face feature map;
the processing module is specifically configured to calculate a correlation value between the feature information of the target face feature map and the first face feature information; and
the processing module is specifically configured to mix the feature information in the N face feature maps by using the correlation values to obtain the second face feature information with the age characteristic.
8. The apparatus of claim 7,
the processing module is specifically configured to multiply the correlation value corresponding to the i-th face feature map by the feature information in the i-th face feature map to obtain second face feature information corresponding to the i-th face feature map, where i = 1, 2, …, N; and
the processing module is specifically configured to accumulate the second face feature information corresponding to each of the N face feature maps to obtain the second face feature information with the age characteristic.
9. The apparatus of claim 6,
the acquisition module is also used for acquiring an image to be identified;
the acquisition module is specifically configured to, if a face image exists in the image to be recognized and the reliability of the face image meets a first condition, acquire the feature information of the face image to obtain the first face feature information.
10. The apparatus of claim 9,
the processing module is further configured to calculate a similarity value between the human body feature information in the image to be recognized and the human body feature information in the images in a target image set if no face image exists in the image to be recognized, or if a face image exists in the image to be recognized but the reliability of the face image does not meet the first condition; and
the processing module is further configured to add the image to be recognized to the target image set if the similarity value corresponding to any image in the target image set is greater than a first threshold.
11. An electronic device comprising a processor, a memory and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the image recognition method according to any one of claims 1 to 5.
12. A readable storage medium, characterized in that it stores thereon a program or instructions which, when executed by a processor, implement the steps of the image recognition method according to any one of claims 1 to 5.
CN202210836939.6A 2022-07-15 2022-07-15 Image recognition method and device, electronic equipment and readable storage medium Pending CN115116116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210836939.6A CN115116116A (en) 2022-07-15 2022-07-15 Image recognition method and device, electronic equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN115116116A true CN115116116A (en) 2022-09-27

Family

ID=83332826




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination