CN117373093A - Image recognition method, device, equipment and storage medium based on artificial intelligence - Google Patents

Image recognition method, device, equipment and storage medium based on artificial intelligence

Info

Publication number
CN117373093A
CN117373093A
Authority
CN
China
Prior art keywords
features
feature
sample data
training sample
network
Prior art date
Legal status
Pending
Application number
CN202311436094.2A
Other languages
Chinese (zh)
Inventor
许剑清
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311436094.2A
Publication of CN117373093A


Classifications

    • G06V40/178: Human faces; estimating age from face image; using age information for improving recognition
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/09: Supervised learning
    • G06V10/454: Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space; mappings, e.g. subspace methods
    • G06V10/806: Fusion of extracted features at the feature extraction level
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V40/168: Feature extraction; face representation
    • G06V40/172: Classification, e.g. identification

Abstract

The application provides an image recognition method, device, equipment and storage medium based on artificial intelligence, which can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic and assisted driving, and is used for improving the accuracy of cross-age face image recognition. The method comprises the following steps: acquiring a face image to be recognized; invoking a recognition network to perform feature extraction on the face image to be recognized to obtain a first feature; invoking N complementary networks corresponding to N age labels to respectively perform feature extraction on the face image to be recognized to obtain N second features; invoking a feature modulation network to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, the N output probability values indicating the probability that the face image is classified into each of the N age labels; performing feature fusion according to the N second features, the N output probability values and the first feature to obtain a fusion feature of the face image to be recognized; and classifying the fusion feature to obtain a recognition result of the face image to be recognized.

Description

Image recognition method, device, equipment and storage medium based on artificial intelligence
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to an image recognition method, apparatus, device and storage medium based on artificial intelligence.
Background
With the popularization of face recognition application scenarios, face recognition models are required to be robust in each specific scenario. In scenarios involving cross-age picture comparison (such as cross-age person search, identity comparison and the like), pictures of the same identity at different ages need to be compared.
At present, most training pictures used for face recognition models are picture data from the same or adjacent age groups. Because face pictures of different age groups follow different distributions, the features extracted from pictures of different age groups cannot be aligned in the high-dimensional feature space, so the trained face recognition model cannot meet the recognition requirements of cross-age faces.
Therefore, a scheme for cross-age face image recognition is needed.
Disclosure of Invention
The embodiments of the application provide an image recognition method, device, equipment and storage medium based on artificial intelligence, which are used for improving the accuracy of cross-age face image recognition.
In view of this, the present application provides, in one aspect, an image recognition method based on artificial intelligence, including: acquiring a face image to be recognized; invoking a recognition network of an image recognition model to perform feature extraction on the face image to be recognized to obtain a first feature; invoking N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, where the N complementary networks correspond to N age labels and N is an integer greater than 1; invoking a feature modulation network in the image recognition model to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, where the N output probability values indicate the probability that the face image to be recognized is classified into each of the N age labels; performing feature fusion according to the N second features, the N output probability values and the first feature to obtain a fusion feature of the face image to be recognized; and calling a classification network in the image recognition model to classify the fusion feature to obtain a recognition result of the face image to be recognized.
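For illustration only, the following PyTorch-style sketch traces the claimed inference pipeline; the module names, shapes and fusion step are our reading of the claim, not the patent's reference implementation:

    import torch
    import torch.nn as nn

    def recognize(face: torch.Tensor,
                  recognition_net: nn.Module,
                  complementary_nets: list,      # N networks, one per age label
                  modulation_net: nn.Module,
                  classification_net: nn.Module) -> torch.Tensor:
        first = recognition_net(face)            # first feature, shape (B, d)
        seconds = [net(face) for net in complementary_nets]   # N second features, (B, d) each
        probs = modulation_net(first)            # (B, N): probability per age label
        # Feature fusion: probability-weighted sum of age-specific features plus the base feature.
        fused = first + sum(p.unsqueeze(-1) * f
                            for p, f in zip(probs.unbind(dim=1), seconds))
        return classification_net(fused)         # recognition result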
Another aspect of the present application provides an image recognition apparatus, including:
the acquisition module is used for acquiring the face image to be recognized;
the processing module is used for calling the recognition network of the image recognition model to perform feature extraction on the face image to be recognized to obtain a first feature; invoking N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, where the N complementary networks correspond to N age labels and N is an integer greater than 1; invoking a feature modulation network in the image recognition model to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, where the N output probability values indicate the probability that the face image to be recognized is classified into each of the N age labels; performing feature fusion according to the N second features, the N output probability values and the first feature to obtain a fusion feature of the face image to be recognized; and calling a classification network in the image recognition model to classify the fusion feature to obtain a recognition result of the face image to be recognized.
In one possible design, in another implementation of another aspect of the embodiments of the present application, an obtaining module is configured to obtain first training sample data and an initial recognition network, where the first training sample data includes face images of multiple ages;
The processing module is used for extracting the characteristics of the first training sample data to obtain first characteristics of the first training sample data; performing classification prediction based on the first feature to obtain a first prediction classification label of the first training sample data; performing loss calculation based on the first prediction classification label and the sample label of the first training sample data to obtain a first loss value;
and the training module is used for training the initial identification network based on the first loss value to obtain the identification network.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is configured to perform feature extraction on the first training sample data to obtain a three-dimensional feature of the first training sample data;
and performing feature mapping on the three-dimensional features to obtain first features of the first training sample data, wherein the first features are one-dimensional features.
In another implementation of another aspect of the embodiments of the present application, the processing module is configured to perform matrix calculation based on the first feature and a first class center matrix to obtain a first prediction classification label of the first training sample data, where the first class center matrix is used to indicate a class center of each identity class in the first training sample data.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, an obtaining module is configured to obtain N second training sample data and N initial complementary networks, where the N second training sample data correspond one-to-one to the N initial complementary networks, and samples in the N second training sample data have different age labels;
the processing module is used for calling the recognition network to perform feature extraction on the N second training sample data to obtain N third features, where the recognition network is a network whose parameters have been determined;
invoking the N initial complementary networks to respectively perform prediction processing on the corresponding third features to obtain N second prediction classification labels;
performing loss calculation between each second prediction classification label and the sample label of the corresponding second training sample data to obtain N second loss values;
and the training module is used for training the N initial complementary networks based on the N second loss values to obtain the N complementary networks.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is configured to perform feature mapping on the N third features to obtain N mapping features, where feature dimensions of the N mapping features are smaller than feature dimensions of the N third features;
and performing matrix calculation between the N mapping features and N second class center matrices in one-to-one correspondence to obtain the N second prediction classification labels, where the N second class center matrices are respectively used to indicate the class centers of each identity class in the N second training sample data.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the processing module is configured to sum the N second loss values to obtain a fusion loss value;
training the N initial complementary networks based on the fusion loss value to obtain N complementary networks.
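As an illustration of this summed-loss update, here is a minimal PyTorch-style sketch assuming a frozen recognition network; all names (recognition_net, complementary_nets, batches, loss_fn) are our assumptions, not the patent's reference code:

    import torch

    # Assumed setup: recognition_net (trained, frozen), complementary_nets (list of N
    # networks, one per age label), batches (list of N (images, labels) pairs, one per
    # age group), loss_fn (a classification loss), optimizer over complementary params.
    for p in recognition_net.parameters():
        p.requires_grad_(False)                 # the backbone stays fixed in this stage

    total_loss = 0.0
    for net, (images, labels) in zip(complementary_nets, batches):
        with torch.no_grad():
            third = recognition_net(images)     # third features from the frozen backbone
        logits = net(third)                     # per-age-group prediction
        total_loss = total_loss + loss_fn(logits, labels)  # one second loss value each

    optimizer.zero_grad()
    total_loss.backward()                       # all N networks train on the fused loss
    optimizer.step()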
In one possible design, in another implementation of another aspect of the embodiments of the present application, the obtaining module is configured to obtain third training sample data and an initial feature modulation network, where the third training sample data includes face images of multiple ages;
the processing module is used for calling the recognition network to perform feature extraction on the third training sample data to obtain a fourth feature; invoking the N complementary networks to perform feature extraction on the third training sample data to obtain N fifth features; invoking the initial feature modulation network to perform age prediction processing on the fourth feature to obtain N predicted probability values of the third training sample data, where the N predicted probability values indicate the probability that the third training sample data is classified into each of the N age labels; performing feature fusion according to the N fifth features, the N predicted probability values and the fourth feature to obtain a predicted fusion feature of the third training sample data; performing matrix calculation on the predicted fusion feature and a third class center matrix to obtain a third prediction classification label of the third training sample data, where the third class center matrix indicates the class center corresponding to each identity class in the third training sample data; and performing loss calculation based on the third prediction classification label and the sample label of the third training sample data to obtain a third loss value;
And the training module is used for training the initial characteristic modulation network based on the third loss value to obtain the characteristic modulation network.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is configured to multiply, per age label, each of the N predicted probability values with the corresponding fifth feature to obtain N prediction weight features;
sum the N prediction weight features to obtain a prediction integration feature corresponding to the N complementary networks;
and sum the prediction integration feature with the fourth feature to obtain the predicted fusion feature.
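In formula form (notation ours, not the patent's): writing the fourth feature as $f$, the fifth features as $f_1, \dots, f_N$ and the predicted probability values as $p_1, \dots, p_N$, the predicted fusion feature is

$$\tilde{f} = f + \sum_{i=1}^{N} p_i \, f_i .$$

The same weighted combination is used at inference time with the first feature and the N second features.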
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is configured to invoke the recognition network to perform feature extraction on the third training sample data to obtain a three-dimensional feature of the third training sample data;
and calling the recognition network to perform feature mapping on the three-dimensional features of the third training sample data to obtain the fourth feature, wherein the fourth feature is one-dimensional feature.
In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is configured to multiply, per age label, each of the N output probability values with the corresponding second feature to obtain N weight features;
sum the N weight features to obtain an integration feature corresponding to the N complementary networks;
and sum the integration feature with the first feature to obtain the fusion feature.
In one possible design, in another implementation of another aspect of the embodiments of the present application, the network structure of the recognition network, the N complementary networks and the feature modulation network is created based on a convolutional neural network.
Another aspect of the present application provides a computer device comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, performing the methods of the above aspects according to the instructions in the program code;
the bus system is used to connect the memory and the processor to communicate the memory and the processor.
Another aspect of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the methods of the above aspects.
In another aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the above aspects.
From the above technical solutions, the embodiments of the present application have the following advantages: a plurality of complementary networks, each extracting features for the characteristics of a different age group, are added in the feature extraction stage of face recognition, so that face information of different age groups can be extracted in a targeted manner; meanwhile, a feature modulation network is added to modulate the face information of different age groups into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
Drawings
FIG. 1 is a schematic architecture diagram of an application scenario of an image recognition method in an embodiment of the present application;
FIG. 2 is a schematic diagram of an application flow of an image recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a network architecture of an image recognition model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training process for identifying a network in an embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of N complementary networks in an embodiment of the present application;
FIG. 6 is a schematic diagram of a training process of the feature modulation network in an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of an image recognition method according to an embodiment of the present application;
FIG. 8a is a schematic diagram of an embodiment of an image recognition device according to an embodiment of the present application;
FIG. 8b is a schematic diagram of another embodiment of an image recognition device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another embodiment of an image recognition device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of another embodiment of an image recognition apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide an image recognition method, device, equipment and storage medium based on artificial intelligence, which are used for improving the accuracy of cross-age face image recognition.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of operation in sequences other than those illustrated or described herein, for example. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
With the popularization of face recognition application scenarios, face recognition models are required to be robust in each specific scenario. In scenarios involving cross-age picture comparison (such as cross-age person search, identity comparison and the like), pictures of the same identity at different ages need to be compared. At present, most training pictures used for face recognition models are picture data from the same or adjacent age groups. Because face pictures of different age groups follow different distributions, the features extracted from pictures of different age groups cannot be aligned in the high-dimensional feature space, so the trained face recognition model cannot meet the recognition requirements of cross-age faces. Therefore, a scheme for cross-age face image recognition is needed.
In order to solve the above technical problems, the application provides the following technical solution: acquiring a face image to be recognized; invoking a recognition network of an image recognition model to perform feature extraction on the face image to be recognized to obtain a first feature; invoking N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, where the N complementary networks correspond to N age labels and N is an integer greater than 1; invoking a feature modulation network in the image recognition model to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, where the N output probability values indicate the probability that the face image to be recognized is classified into each of the N age labels; performing feature fusion according to the N second features, the N output probability values and the first feature to obtain a fusion feature of the face image to be recognized; and calling a classification network in the image recognition model to classify the fusion feature to obtain a recognition result of the face image to be recognized. In this way, a plurality of complementary networks, each extracting features for the characteristics of a different age group, are added in the feature extraction stage of face recognition, so that face information of different age groups can be extracted in a targeted manner; meanwhile, a feature modulation network is added to modulate the face information of different age groups into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
The image recognition methods of the various alternative embodiments of the present application may be implemented based on artificial intelligence techniques. Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling the machines to perceive, reason and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems and mechatronics. The pre-training model, also called a large model or foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technologies mainly include directions such as computer vision, speech processing, natural language processing and machine learning/deep learning.
Computer Vision (CV) is the science of studying how to make a machine "see"; more specifically, it replaces human eyes with cameras and computers to perform machine vision tasks such as recognition, tracking and measurement on a target, and further performs graphic processing so that the result becomes an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theory and technology in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Large-model technology has brought important innovation to the development of computer vision: pre-trained models in the vision field such as Swin Transformer, ViT, V-MoE and MAE can be rapidly and widely applied to specific downstream tasks through fine-tuning. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
The application also relates to cloud technology. Cloud technology refers to a hosting technology that unifies hardware, software, network and other system resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool and be used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites and other portal websites. With the rapid development and application of the internet, each object may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backend system support, which can only be realized through cloud computing. The cloud technology involved in this application mainly refers to the possible "cloud" transmission of face images to be recognized and the like between terminal devices or servers.
For ease of understanding, some of the terms in this application are described below.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction. With the research and advancement of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart healthcare and smart customer service; it is believed that with the development of technology, artificial intelligence will be applied in more fields and show increasing value.
Neural network: the artificial neural network (Artificial Neural Networks, ANN) is formed by connecting a plurality of neurons with adjustable connection weights, and has the characteristics of large-scale parallel processing, distributed information storage, good self-organizing self-learning capacity and the like.
Convolutional layer (Convolutional layer, Conv): a layered structure formed by a plurality of convolution units within a convolutional neural network. A convolutional neural network (Convolutional Neural Network, CNN) is a feedforward neural network comprising at least two neural network layers, where each layer contains a plurality of neurons; the neurons are arranged in layers, neurons of the same layer are not connected to each other, and inter-layer information is transmitted along one direction only.
Fully connected layer (Fully Connected layer, FC): a layered structure in which each node is connected to all nodes of the previous layer; it can be used to synthesize the features extracted by the preceding neural network layers and plays the role of a classifier in a neural network model.
Back propagation: forward propagation refers to the feed-forward processing of a model; back propagation is the opposite and refers to updating the weight parameters of each layer of the model according to the model's output. For example, when a model includes an input layer, a hidden layer and an output layer, forward propagation processes in the order input layer, hidden layer, output layer, and back propagation updates the weight parameters of the layers in the order output layer, hidden layer, input layer.
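As a concrete toy illustration of one forward/backward cycle (a PyTorch example of ours, not from the patent):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
    logits = model(x)                      # forward propagation: input -> hidden -> output
    loss = nn.functional.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()                        # back propagation: gradients flow output -> input
    optimizer.step()                       # weight parameters of each layer are updated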
The image recognition method, device, equipment and storage medium of the present application can improve the recognition accuracy of cross-age face images. An exemplary application of the electronic device provided by the embodiments of the present application is described below; the electronic device may be implemented as various types of user terminals or as a server.
By running the image recognition method provided by the embodiments of the present application, the electronic device can improve its recognition accuracy for cross-age face images.
The scheme can be applied to various computer vision fields, including face recognition. When the data processing method provided by the embodiments of the present application is used to help a user perform face recognition, it can be implemented as an independent application program installed in the computer equipment or background server used by the user, so that the user can conveniently use the program for face recognition.
In this scenario, a user inputs a face image on an application program interface, or the equipment directly collects a face image through a camera; the computer equipment inputs the face image into the image recognition model to obtain an image recognition result, and returns the result to the corresponding application program interface to prompt the user whether recognition succeeded.
In an exemplary embodiment, the image recognition scheme may be applied to an access control system, a company attendance (clock-in) system, or other systems requiring face recognition, which is not limited herein.
Of course, besides being applied to the above-mentioned scenes, the method provided in the embodiment of the present application may also be applied to other scenes that need image recognition, and the embodiment of the present application is not limited to a specific application scene.
Referring to fig. 1, fig. 1 is a schematic diagram of an alternative architecture in an application scenario of the image recognition scheme provided in the embodiments of the present application. To support the data processing scheme, a terminal device 100 is connected to a server 300 through a network 200, and the server 300 is connected to a database 400; the network 200 may be a wide area network, a local area network, or a combination of the two. The client implementing the image recognition scheme is deployed on the terminal device 100, where the client may run on the terminal device 100 in browser mode or in the form of a stand-alone application (APP); the specific presentation form of the client is not limited herein.

The server 300 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms.

The terminal device 100 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a palm computer, a personal computer, a smart television, a smart watch, a vehicle-mounted device, a wearable device, a smart voice interaction device, a smart home appliance, an aircraft, and the like. The terminal device 100 and the server 300 may be directly or indirectly connected through the network 200 by wired or wireless communication, which is not limited herein. The number of servers 300 and terminal devices 100 is also not limited. The solution provided in the present application may be completed independently by the terminal device 100, independently by the server 300, or by the cooperation of the terminal device 100 and the server 300, which is not specifically limited in this application.

The database 400 may be regarded as an electronic filing cabinet, i.e. a place where electronic files are stored; a user may perform operations such as adding, querying, updating and deleting data in the files. A "database" is a collection of data that is stored together in a way that can be shared with multiple users, has as little redundancy as possible, and is independent of applications.

A database management system (Database Management System, DBMS) is computer software designed for managing databases, and generally has basic functions such as storage, retrieval, security and backup. Database management systems may be classified by the database model they support, such as relational or extensible markup language (Extensible Markup Language, XML); by the type of computer supported, such as server cluster or mobile phone; by the query language used, such as structured query language (Structured Query Language, SQL) or XQuery; by performance emphasis, such as maximum scale or maximum operating speed; or by other classification schemes. Regardless of the classification used, some DBMSs can span categories, for example supporting multiple query languages simultaneously.
In this application, the database 400 may be used to store training sample data, face images to be recognized and collected face images. Of course, the storage locations of these data are not limited to the database; they may also be stored, for example, in a distributed file system or blockchain of the terminal device 100 or the server 300.
In some embodiments, both the server 300 and the terminal device 100 may execute the image recognition method and the training method of the image recognition model in the image recognition method provided in the embodiments of the present application.
Based on the above description, as shown in fig. 2, a specific flow of the image recognition method in the present application may be as follows:
it mainly comprises two parts, the first part is the training process of the image recognition model. It should be appreciated that the network structure of the image recognition model may be as shown in fig. 3:
a recognition network, N complementary networks, a feature modulation network, and a classification network. The recognition network is used for extracting basic features of the face image to be recognized; the N complementary networks are used for performing feature mapping on the basic features according to the characteristics of different age groups, obtaining multiple pieces of age feature information; the feature modulation network is used for fusing the basic features with the age feature information to obtain the final fusion feature of the face image to be recognized; and the classification network is used for classifying according to the fusion feature to obtain the recognition result of the face image to be recognized.
The second part is the online deployment process of the image recognition model. In this process, the recognition network, the N complementary networks, the feature modulation network and the classification network obtained by the first part of training are combined and deployed, realizing the online application of the image recognition model.
In this embodiment, during the training of the image recognition model, the recognition network, the N complementary networks and the feature modulation network may be trained separately in sequence: the recognition network is trained first; the trained recognition network is then used to train the N complementary networks; finally, the trained recognition network and the N trained complementary networks are used to train the feature modulation network. In this way, the computation time of training can be effectively reduced and the training of the image recognition model accelerated.
The training process of the recognition network, the N complementary networks and the feature modulation network in the image recognition model in the present application is correspondingly described below.
Part 1: training of the recognition network (the network structure shown in grey participates in the parameter update during this training)
In this application, the training process diagram of the recognition network may be as shown in fig. 4:
1. Training data preparation. In this embodiment, before training starts, first training sample data is obtained, where the first training sample data includes face image data of multiple age groups (for example, face image data of at least two age groups, or of all age groups), and the sample label of the first training sample data may be an identity label (i.e., used to indicate the user identity corresponding to the face image). The first training sample data may be provided by a third-party data authority or obtained by historical accumulation, which is not limited herein. It should be understood that during the training of the recognition network, training sample data is fed into the recognition network in set batches for processing, thereby realizing iterative training of the recognition network.
2. Recognition network: used for feature extraction. Feature extraction by the recognition network can be understood as extracting spatial features of the face image, and the output features retain the spatial structure information of the face image. In an exemplary embodiment, the feature output by the recognition network may be a three-dimensional feature, for example a feature vector of dimension k×k×d.
In this application, the recognition network typically has the structure of a convolutional neural network, mainly comprising operations such as convolution computation, nonlinear activation (ReLU) computation and pooling computation.
3. Feature mapping module: used for performing dimension-reduction mapping on the three-dimensional feature to obtain a one-dimensional feature. After the three-dimensional feature of the face image is obtained, it can be mapped into a one-dimensional feature by dimension-reduction mapping; for example, a k×k×d-dimensional feature vector is mapped into a d-dimensional feature vector. This effectively reduces subsequent computational complexity and computation time. It should be understood that this step is optional in this application.
In this application, the network structure of the module performing the dimension-reduction mapping can implement operations such as nonlinear activation, full connection and pooling.
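One way such a dimension-reduction module could look (a sketch under our assumption of average pooling plus a fully connected layer; the patent only names the classes of operations):

    import torch.nn as nn

    class FeatureMapping(nn.Module):
        """Illustrative k*k*d -> d reduction: pool away the spatial grid, then project."""
        def __init__(self, d: int):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # (B, d, k, k) -> (B, d, 1, 1)
            self.fc = nn.Linear(d, d)            # fully connected projection
            self.act = nn.ReLU()                 # nonlinear activation

        def forward(self, x):                    # x: (B, d, k, k)
            return self.act(self.fc(self.pool(x).flatten(1)))   # -> (B, d)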
4. Classification module: used for performing prediction processing on the training sample data based on the one-dimensional feature to obtain a prediction classification label. In an exemplary embodiment, the image recognition device may perform predictive classification on the training sample data by means of class-center calculation. The image recognition device obtains a class center matrix corresponding to the first training sample data, where the class center matrix is used to indicate the class center corresponding to each identity label in the first training sample data and can be expressed as a (d×m) matrix, d being the feature dimension and m the number of identity labels in the first training sample data. The image recognition device then performs a matrix operation on the one-dimensional feature obtained in step 3 and the class center matrix to obtain, for each face picture in the training sample data, the probability of belonging to each identity label. Finally, the image recognition device determines the predicted identity label corresponding to each face image according to these probability values.
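A hedged sketch of this class-center matrix operation (we assume cosine similarity between normalized features and a learnable (d×m) center matrix followed by softmax; the patent only specifies a matrix operation yielding per-identity probabilities):

    import torch
    import torch.nn.functional as F

    def class_center_probs(features: torch.Tensor,   # (B, d) one-dimensional features
                           centers: torch.Tensor     # (d, m): one column per identity label
                           ) -> torch.Tensor:
        # Matrix operation between the features and the class center matrix.
        logits = F.normalize(features, dim=1) @ F.normalize(centers, dim=0)   # (B, m)
        return logits.softmax(dim=1)                 # probability of each identity label

The predicted identity label is then the argmax over these probabilities.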
5. Loss function calculation module: used for calculating a loss value from the predicted identity label and the sample label. In this step, the image recognition device feeds the predicted identity label and the sample label into the target loss function to calculate a loss value. In the present application, the target loss function may be a classification loss (such as softmax or one of its margin-based variants), or another type of loss function, which is not limited herein.
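As one concrete example of "softmax plus margin" (an ArcFace-style additive angular margin; the patent does not commit to a specific variant), for a sample of identity $y$, with angle $\theta_j$ between its feature and class center $j$, scale $s$ and margin $m$:

$$\mathcal{L} = -\log \frac{e^{s\cos(\theta_y + m)}}{e^{s\cos(\theta_y + m)} + \sum_{j \neq y} e^{s\cos\theta_j}}$$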
6. Loss function training optimization module: used for back-propagation training of the recognition network based on the loss value. In this application, the image recognition device may perform back-propagation training optimization on the recognition network based on gradient descent (e.g., stochastic gradient descent with momentum, Adam, Adagrad). Operations 1 to 6 are repeated during training until the training result meets the training termination condition, which is generally that the number of iterations reaches a set value or the loss value falls below a set value. At this point, the trained recognition network can be output.
It should be understood that the network modules performing the calculations in steps 3 to 6 are used only for this training process and are not output.
Part 2: training of the N complementary networks (the network structure shown in grey participates in the parameter update during this training)
In this application, the training process diagram of the N complementary networks may be as shown in fig. 5:
1. Training data preparation. In this embodiment, a plurality of second training sample data are obtained before training starts, where the sample label of each second training sample data has an age label in addition to an identity label (for example, the identity label corresponding to face image 1 is user A and the age label is teenager; the identity label corresponding to face image 2 is user A and the age label is middle-aged; the identity label corresponding to face image 3 is user B and the age label is teenager). The plurality of second training sample data may be provided by a third-party data authority or obtained by historical accumulation, which is not limited herein.
In one exemplary scenario, the age labels may be divided into 4 groups, namely infant, teenager, middle-aged and elderly. For example, in this embodiment, 4 pieces of second training sample data are obtained: the age label in the first piece is infant, in the second teenager, in the third middle-aged, and in the fourth elderly. It should be understood that in the training of the N complementary networks, each complementary network corresponds to one set of training sample data; the read sample data is fed in set batches into the recognition network trained in the first part for feature extraction, and the complementary networks are then trained, realizing iterative training of the N complementary networks. In each iteration, the training sample data of each complementary network is read in the set batch, and the number of pictures read each time is kept consistent (i.e., the batch size of each complementary network is the same). Since the numbers of samples in the training sample sets are not equal, the number of times samples are repeated across iterations differs between sets. For example, suppose the first set (infant) contains 300 images, the second (teenager) 600, the third (middle-aged) 500 and the fourth (elderly) 400, with a batch size of 50; over 10 training iterations, each network consumes 500 samples, so part of the data in the first set may be iterated at least twice, while the data in the second set is iterated at most once.
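A hedged sketch of keeping the per-network batches equal-sized when the age-group sets differ in size (cycling the smaller sets so their samples repeat; all names are illustrative, not from the patent):

    from itertools import cycle, islice
    import random

    def balanced_batches(datasets, batch_size, num_iterations):
        """Per iteration, yield one equal-sized batch from each age-group dataset.
        Smaller datasets are cycled, so their samples repeat across iterations."""
        streams = []
        for data in datasets:
            data = data[:]                # copy, then shuffle each age-group set once
            random.shuffle(data)
            streams.append(cycle(data))   # restart from the beginning when exhausted
        for _ in range(num_iterations):
            yield [list(islice(s, batch_size)) for s in streams]

    # With sets of 300/600/500/400 samples, batch size 50 and 10 iterations, each
    # network consumes 500 samples: some infant samples are drawn twice, while some
    # teenager samples are not drawn at all in this round.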
2. Recognition network: used for feature extraction. As in the first part, feature extraction by the recognition network can be understood as extracting spatial features of the face image, with the output features retaining spatial structure information; the output may be a three-dimensional feature, for example a feature vector of dimension k×k×d. In this step, the parameters of the recognition network are inherited from the network trained in the first part and are not updated during this training.
3. Feature mapping module of each complementary network: used for mapping the three-dimensional features output by the recognition network into a d-dimensional feature vector. In this application, the network structure of each complementary network's feature mapping module can implement operations such as nonlinear activation, full connection and pooling. In this step, each complementary network has an independent feature mapping module for feature-mapping the three-dimensional features output by the recognition network, and each feature mapping module only processes the picture features of its own complementary network, i.e., of the corresponding independent training sample set prepared during training data preparation.
4. Classification module of each complementary network: used for performing prediction processing on the training sample data based on the one-dimensional feature to obtain a prediction classification label. In an exemplary embodiment, the image recognition device may perform predictive classification by means of class-center calculation. The image recognition device obtains the class center matrix corresponding to one of the N second training sample data, where the class center matrix is used to indicate the class center corresponding to each identity label in that second training sample data and can be expressed as a (d×m) matrix, d being the feature dimension and m the number of identity labels in that set. The image recognition device then performs a matrix operation on the one-dimensional feature obtained in step 3 and the class center matrix to obtain, for each face picture, the probability of belonging to each identity label, and finally determines the predicted identity label for each face image according to these probabilities.
5. Loss function calculation module of each complementary network: used for calculating a loss value from the predicted identity label and the sample label. This module has the same function as the loss function calculation module in the first part; the features of each complementary network are calculated independently, yielding one loss value per complementary network.
6. Loss value fusion module: used for integrating the losses independently calculated by the complementary networks. In an exemplary embodiment, the loss value fusion module may sum the losses independently calculated by the complementary networks to obtain an integrated loss value.
7. Loss function training optimization module: used for back-propagation training of the N complementary networks based on the fused loss value. In this application, the image recognition device may perform back-propagation training optimization based on gradient descent (e.g., stochastic gradient descent with momentum, Adam, Adagrad). Operations 1 to 7 are repeated during training until the training result meets the training termination condition, which is generally that the number of iterations reaches a set value or the loss value falls below a set value. At this point, the N trained complementary networks can be output.
It should be understood that in the flowchart shown in fig. 5, the gray module performs parameter updating during the training process, and the white module does not perform parameter updating during the training process.
Training of the third part: the feature modulation network (the network structure of the gray portion participates in parameter updates during this training)
In this application, the training process diagram of the feature modulation network may be as shown in fig. 6:
1. Training data preparation. In this embodiment, before training starts, third training sample data is obtained. The third training sample data includes face image data of multiple age groups, and the sample tag of the third training sample data may be an identity tag (i.e., used to indicate the user identity corresponding to a face image; for example, the sample tag corresponding to face image 1 is user A, and the sample tag corresponding to face image 2 is user B). The third training sample data may be provided by a third-party data authority or accumulated from historical data, which is not limited here.
2. Recognition network: used for feature extraction. The feature extraction performed by the recognition network can be understood as extracting spatial features of the face image, so that the output features retain the spatial structure information of the face image. In an exemplary embodiment, the features output by the recognition network after feature extraction may be three-dimensional; for example, the output may be a feature tensor of dimension K×K×d.
In this application, the recognition network may generally be structured as a convolutional neural network, mainly comprising operations such as convolution calculation, nonlinear activation function (ReLU) calculation, and pooling calculation. In this step, the parameters of the recognition network are inherited from the network trained in the first part, and they are not updated during this training.
3. Feature mapping module of each complementary network and the feature mapping module trained in the first part: it should be understood that, during the training of the feature modulation network, training sample data is fed into the recognition network in set batches for processing, realizing iterative training of the model. In this step, all training sample data passes through the feature mapping module of each complementary network as well as the feature mapping module trained in the first part. The parameters of each complementary network are inherited from the networks trained in the second part and are not updated during this training.
4. Feature modulation network: used to perform age-label probability prediction on the features output by the feature mapping module trained in the first part, obtaining classification probability values for the age labels corresponding to the complementary networks; the output size is 1×N, where N is the number of complementary networks. In this step, the network structure of the feature modulation network includes several fully connected layers, nonlinear activation layers, and convolutional layers. In one exemplary scenario, the output of the feature modulation network may be activated with softmax.
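A minimal sketch of such a feature modulation network is given below, assuming fully connected layers with a softmax-activated output of size 1×N; the convolutional layers mentioned above are omitted for brevity, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureModulationNetwork(nn.Module):
    """Predicts a 1 x N probability vector over the N age labels."""

    def __init__(self, d: int = 512, n: int = 4):  # n = number of complementary networks
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d, 256),  # fully connected layer
            nn.ReLU(),          # nonlinear activation layer
            nn.Linear(256, n),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, d) -> (batch, N), softmax-activated probabilities
        return torch.softmax(self.layers(feat), dim=1)
```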
5. First feature fusion module: used for fusion processing of the features output by the respective complementary networks. The fusion probability (i.e., weight) of each complementary network is the corresponding output of the feature modulation network (i.e., one of the N output probability values).
6. Second feature fusion module: used to fuse the integrated features output in step 5 with the features output by the recognition network to obtain the fused features, which serve as the input to the classification network of the image recognition model.
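Steps 5 and 6 together amount to a probability-weighted sum of the complementary features followed by an addition of the recognition-network feature, as in the following hedged sketch; tensor shapes and names are illustrative.

```python
import torch

def fuse_features(second_feats: torch.Tensor,
                  probs: torch.Tensor,
                  first_feat: torch.Tensor) -> torch.Tensor:
    """Two-stage fusion sketch for steps 5 and 6.

    second_feats: (N, d) features output by the N complementary networks
    probs:        (N,)   output probabilities of the modulation network
    first_feat:   (d,)   feature output by the recognition network
    """
    integrated = (probs.unsqueeze(1) * second_feats).sum(dim=0)  # step 5
    return integrated + first_feat                               # step 6
```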
7. Classification module: used to perform prediction processing on the training sample data based on the fused features to obtain a predicted classification label. In one exemplary embodiment, the image recognition device may perform predictive classification on the training sample data by means of class center calculation. The image recognition device obtains the class center matrix corresponding to the third training sample data, where the class center matrix indicates the class center of each identity tag and may be represented as a (d×m) matrix, d being the feature dimension and m the number of identity tags in the third training sample data. The image recognition device then performs a matrix operation between the fused features obtained in step 6 and the class center matrix to obtain, for each face picture, the probability of belonging to each identity label in the training sample data. Finally, the image recognition device determines the predicted identity label corresponding to each face image in the training sample data according to the probability values.
8. Loss function calculation module: used to calculate a loss value from the predicted identity tag and the sample tag. In this step, the image recognition device may input the predicted identity tag and the sample tag into a target loss function and calculate the loss value. In the present application, the target loss function may be a classification loss (such as softmax, or one of the various softmax-plus-margin variants), or another type of loss function, which is not limited here.
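For illustration, the following sketch implements a plain softmax cross-entropy loss with an optional additive margin on the target logit, in the spirit of the softmax-plus-margin variants mentioned above; the margin placement is an assumption for the example, not a loss prescribed by this application.

```python
import torch
import torch.nn.functional as F

def classification_loss(logits: torch.Tensor, labels: torch.Tensor,
                        margin: float = 0.0) -> torch.Tensor:
    """Softmax cross-entropy with an optional additive margin.

    margin = 0 gives plain softmax cross-entropy; margin > 0 subtracts
    a margin from the target-class logit before the softmax.
    """
    if margin > 0.0:
        logits = logits.clone()
        rows = torch.arange(logits.size(0))
        logits[rows, labels] = logits[rows, labels] - margin
    return F.cross_entropy(logits, labels)
```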
9. Loss function training optimization module: used to perform back-propagation training on the feature modulation network based on the loss value. In this application, the image recognition device may perform the back-propagation training optimization in a gradient descent manner (for example, stochastic gradient descent with momentum, Adam, or AdaGrad). The operations of steps 1 to 9 are repeated during training until the training result meets the training termination condition. The training termination condition is generally that the number of iterations reaches a set value or that the loss value falls below a set threshold, at which point training of the model is complete. The trained feature modulation network may then be output.
It should be understood that in the flowchart shown in fig. 6, the gray module performs parameter updating during the training process, and the white module does not perform parameter updating during the training process.
It will be appreciated that the specific embodiments of the present application involve related data such as face images and training sample data. When the above embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
With reference to the foregoing description, the image recognition method in the present application is described below. Referring to fig. 7, one embodiment of the image recognition method in the embodiments of the present application includes:
701. Acquiring a face image to be recognized.
In this embodiment, the image recognition device receives a face image to be recognized sent by a third party, or directly collects the face image to be recognized through a camera. The specific manner is not limited here.
702. Calling a recognition network of the image recognition model to perform feature extraction on the face image to be recognized to obtain a first feature.
In this embodiment, the image recognition device uses the recognition network trained by the training process shown in fig. 4 to perform feature extraction on the face image to be recognized, so as to obtain the first feature. At this time, the first feature may be understood as a basic feature of the face image to be recognized.
703. Calling N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, where the N complementary networks correspond to N age tags and N is an integer greater than 1.
In this embodiment, the image recognition device uses the N complementary networks trained by the training process shown in fig. 5 to perform age-label-specific feature extraction on the face image to be recognized, obtaining the N second features. The N second features may be understood as the age-related features of the face image to be recognized.
704. Invoking a feature modulation network in the image recognition model to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, where the N output probability values indicate the probability that the face image to be recognized is classified into each of the N age labels.
In this embodiment, the image recognition device uses the feature modulation network trained by the training process shown in fig. 6 to predict the age of the face image to be recognized, so as to obtain N output probability values corresponding to the face image to be recognized.
705. Performing feature fusion according to the N second features, the N output probability values, and the first feature to obtain the fused features of the face image to be recognized.
The image recognition device multiplies the N output probability values with the N second features in one-to-one correspondence according to the age labels to obtain N weighted features; sums the N weighted features to obtain the integrated feature corresponding to the N complementary networks; and sums the integrated feature with the first feature to obtain the fused feature. For example, suppose the image recognition device determines that the probability values of image A being classified into the age labels of 4 complementary networks are 0.1, 0.2, 0.6, and 0.1, and these four probability values are used as weights; meanwhile, the features output by the 4 complementary networks are feature 1, feature 2, feature 3, and feature 4. The integrated feature is then 0.1×feature 1 + 0.2×feature 2 + 0.6×feature 3 + 0.1×feature 4, and the final fused feature is feature A + 0.1×feature 1 + 0.2×feature 2 + 0.6×feature 3 + 0.1×feature 4, where feature A is the feature output by the recognition network.
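The numeric example above can be reproduced with a short sketch; the feature dimension and the random tensors are arbitrary placeholders chosen for illustration.

```python
import torch

probs = torch.tensor([0.1, 0.2, 0.6, 0.1])  # N = 4 output probability values
feats = torch.randn(4, 3)                   # feature 1 .. feature 4 (d = 3)
feat_a = torch.randn(3)                     # feature A from the recognition network

integrated = (probs.unsqueeze(1) * feats).sum(dim=0)  # weighted sum
fused = feat_a + integrated                           # final fused feature
```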
706. Calling a classification network in the image recognition model to classify the fused features to obtain the recognition result of the face image to be recognized.
In this embodiment, the image recognition device performs classification processing on the fusion features by using the final classification network obtained by training in the training process shown in fig. 6, so as to obtain a recognition result corresponding to the face image to be recognized.
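Putting steps 701 to 706 together, the following hedged sketch outlines the inference path, assuming the four trained components are available as callables returning the shapes noted in the comments; all names and the tensor bookkeeping are illustrative assumptions.

```python
import torch

@torch.no_grad()
def recognize(face, recognition_net, complementary_nets,
              modulation_net, classification_net):
    """face: (1, 3, H, W) image tensor of the face to be recognized."""
    first = recognition_net(face)                      # step 702: (1, d)
    seconds = torch.stack(                             # step 703: (N, d)
        [net(face).squeeze(0) for net in complementary_nets])
    probs = modulation_net(first).squeeze(0)           # step 704: (N,)
    integrated = (probs.unsqueeze(1) * seconds).sum(0) # step 705: weighted sum
    fused = first.squeeze(0) + integrated              # fused feature, (d,)
    return classification_net(fused.unsqueeze(0))      # step 706: result
```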
Referring to fig. 8a, fig. 8a is a schematic diagram of an embodiment of an image recognition apparatus according to an embodiment of the present application. The image recognition apparatus 20 includes:
an acquisition module 201, configured to acquire a face image to be identified;
the processing module 202 is configured to invoke a recognition network of an image recognition model to perform feature extraction on the face image to be recognized to obtain a first feature; invoke N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, where the N complementary networks correspond to N age tags and N is an integer greater than 1; invoke a feature modulation network in the image recognition model to perform age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, where the N output probability values indicate the probability that the face image to be recognized is classified into each of the N age labels; perform feature fusion according to the N second features, the N output probability values, and the first feature to obtain fused features of the face image to be recognized; and invoke a classification network in the image recognition model to classify the fused features to obtain a recognition result of the face image to be recognized.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, multiple complementary networks that extract features for different age groups are added at the feature extraction stage of face recognition, so that face information of different age groups can be extracted in a targeted manner; at the same time, a feature modulation network is added, and the face information of different age groups is modulated into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
Optionally, as shown in fig. 8b, based on the embodiment corresponding to fig. 8a, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
an obtaining module 201, configured to obtain first training sample data and an initial recognition network, where the first training sample data includes face images of multiple ages;
a processing module 202, configured to perform feature extraction on the first training sample data to obtain a first feature of the first training sample data; performing classification prediction based on the first feature to obtain a first prediction classification label of the first training sample data; performing loss calculation based on the first prediction classification label and the sample label of the first training sample data to obtain a first loss value;
The training module 203 is configured to train the initial identification network to obtain the identification network based on the first loss value.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, the basic face information in the face image is retained, so that a large amount of computation over the whole image model is avoided, and the computation time of the image model is not increased while the accuracy of cross-age face recognition is improved.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application, the processing module 202 is configured to perform feature extraction on the first training sample data to obtain three-dimensional features of the first training sample data;
and performing feature mapping on the three-dimensional features to obtain first features of the first training sample data, wherein the first features are one-dimensional features.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, after the recognition network extracts the three-dimensional features, they are mapped to a lower dimension, which effectively reduces the amount of computation and the computation time.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
the processing module 202 is configured to perform matrix calculation based on the first feature and a first class center matrix to obtain a first prediction classification label of the first training sample data, where the first class center matrix is used to indicate a class center of each identity class in the first training sample data.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, probability calculation is performed using the stored class center matrix and the features, so that a predicted probability value can be determined through class center calculation, realizing image recognition.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
an obtaining module 201, configured to obtain N second training sample data and N initial complementary networks, where each second training sample data corresponds to an initial complementary network one by one, and samples in the N second training sample data have different age labels;
the processing module 202 is configured to invoke the identification network to perform feature extraction on the N second training sample data to obtain N third features, where the identification network is a network with determined network parameters;
Invoking the N initial complementary networks to respectively perform one-to-one prediction processing on the N third features so as to obtain N second prediction classification labels;
performing one-to-one loss calculation based on the N second prediction classification labels and the sample labels of the N second training sample data to obtain N second loss values;
the training module 203 is configured to train the N initial complementary networks based on the N second loss values to obtain the N complementary networks.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, multiple complementary networks that extract features from images of different age groups are added, so that face information of different age groups can be extracted in a targeted manner, the age-related characteristics of the images can be effectively obtained, and the accuracy of cross-age face recognition is improved. Meanwhile, the complementary networks reuse the feature information of the recognition network, which reduces both computation time and training time.
Optionally, in another embodiment of the image recognition device 20 provided in the embodiment of the present application, based on the embodiment corresponding to fig. 8b, the processing module 202 is configured to perform feature mapping on the N third features to obtain N mapping features, where feature dimensions of the N mapping features are smaller than feature dimensions of the N third features;
And performing matrix calculation in one-to-one correspondence with N second class center matrices based on the N mapping features to obtain N second prediction classification labels, wherein the N second class center matrices are respectively used for indicating class centers of each identity class in N second training sample data.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, the features output by the trained recognition network are used, and the feature mapping and training of each complementary network are then carried out independently, so that face information of different age groups can be extracted in a targeted manner, the age-related characteristics of the images can be effectively obtained, and the accuracy of cross-age face recognition is improved. Meanwhile, the complementary networks reuse the feature information of the recognition network, which reduces both computation time and training time.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition apparatus 20 provided in the embodiment of the present application, the processing module 202 is configured to sum the N second loss values to obtain a fusion loss value;
training the N initial complementary networks based on the fusion loss value to obtain N complementary networks.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, the loss values of the multiple complementary networks are fused, and the overall network parameters are then updated according to the fused loss value, which accelerates the training of the multiple complementary networks and reduces training time.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
an obtaining module 201, configured to obtain third training sample data and an initial feature modulation network, where the third training sample data includes face images of multiple ages;
a processing module 202, configured to invoke the recognition network to perform feature extraction on the third training sample data to obtain a fourth feature; invoking the N complementary networks to perform feature extraction on the third training sample data so as to obtain N fifth features; invoking the initial feature modulation network to perform age prediction processing on the fourth feature to obtain N predicted probability values of the third training sample data, wherein the N predicted probability values are used for indicating the probability value of the third training sample data classified into each age label in the N age labels; performing feature fusion according to the N fifth features, the N predicted probability values and the fourth features to obtain predicted fusion features of the third training sample data; performing matrix calculation on the prediction fusion characteristics and a third class center matrix to obtain a third prediction classification label of the third training sample data, wherein the third class center matrix is a class center corresponding to each identity class in the third training sample data; performing loss calculation based on the third prediction classification label and the sample label of the third training sample data to obtain a third loss value;
The training module 203 is configured to train the initial feature modulation network based on the third loss value to obtain the feature modulation network.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, a feature modulation network is added, and the face information of different age groups is modulated into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
the processing module 202 is configured to perform one-to-one product processing on the N predicted probability values and the N fifth features according to the age label, so as to obtain N predicted weight features;
summing the N prediction weight features to obtain prediction integration features corresponding to the N complementary networks;
the predicted integration feature is summed with the fourth feature to obtain the predicted fusion feature.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, the face information of different age groups is modulated into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
the processing module 202 is configured to invoke the recognition network to perform feature extraction on the third training sample data, so as to obtain three-dimensional features of the third training sample data;
and calling the recognition network to perform feature mapping on the three-dimensional features of the third training sample data to obtain the fourth feature, wherein the fourth feature is one-dimensional feature.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, the feature extraction already performed by the trained recognition network on the training sample data is reused, which effectively reduces the training time of the feature modulation network. Meanwhile, the recognition network maps the three-dimensional features into one-dimensional features, which effectively reduces computation time.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
the processing module 202 is configured to perform one-to-one product processing on the N output probability values and the N second features according to the age label, so as to obtain N weight features;
Summing the N weight characteristics to obtain integration characteristics corresponding to the N complementary networks;
the integrated feature is summed with the first feature to obtain the fused feature.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, multiple complementary networks that extract features for different age groups are added, so that face information of different age groups can be extracted in a targeted manner; meanwhile, a feature modulation network is added, and the face information of different age groups is modulated into the features extracted by the recognition network, thereby realizing feature comparison across age groups and improving the accuracy of cross-age face recognition.
Optionally, based on the embodiment corresponding to fig. 8b, in another embodiment of the image recognition device 20 provided in the embodiment of the present application,
the network structures of the identification network, the N complementary networks, and the feature modulation network are created based on a convolutional neural network.
In an embodiment of the present application, an image recognition apparatus is provided. With this apparatus, each network adopts a convolutional neural network as its network structure, enabling effective extraction of image features.
Referring to fig. 9, fig. 9 is a schematic diagram of a server structure according to an embodiment of the present application. The server 300 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may be transitory or persistent storage. The programs stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Furthermore, the central processor 322 may be configured to communicate with the storage medium 330 and execute the series of instruction operations in the storage medium 330 on the server 300.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 9.
The image recognition device provided in the present application may also be used in a terminal device. Referring to fig. 10, for convenience of explanation, only the portions related to the embodiments of the present application are shown; for specific technical details that are not disclosed, please refer to the method portion of the embodiments of the present application. In the embodiments of the present application, the terminal device is described by taking a smart phone as an example:
fig. 10 is a block diagram illustrating a part of the structure of a smart phone related to the terminal device provided in an embodiment of the present application. Referring to fig. 10, the smart phone includes: radio frequency (RF) circuitry 410, memory 420, input unit 430, display unit 440, sensor 450, audio circuitry 460, wireless fidelity (WiFi) module 470, processor 480, and power supply 490. Those skilled in the art will appreciate that the smartphone structure shown in fig. 10 does not limit the smartphone, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The following describes each component of the smart phone in detail with reference to fig. 10:
The RF circuit 410 may be used to receive and transmit signals during information transmission and reception or during a call. In particular, after downlink information from the base station is received, it is handed to the processor 480 for processing; uplink data is in turn sent to the base station. In general, the RF circuitry 410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 410 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), and the like.
The memory 420 may be used to store software programs and modules, and the processor 480 may perform various functional applications and data processing of the smartphone by executing the software programs and modules stored in the memory 420. The memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebooks, etc.) created according to the use of the smart phone, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The input unit 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the smart phone. In particular, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed on or near the touch panel 431 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 431 may include a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 480; it can also receive commands from the processor 480 and execute them. In addition, the touch panel 431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 431, the input unit 430 may include other input devices 432. In particular, the other input devices 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 440 may be used to display information input by the user or provided to the user, as well as the various menus of the smart phone. The display unit 440 may include a display panel 441; optionally, the display panel 441 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 431 may cover the display panel 441: when the touch panel 431 detects a touch operation on or near it, it transmits the operation to the processor 480 to determine the type of the touch event, and the processor 480 then provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although in fig. 10 the touch panel 431 and the display panel 441 are two separate components implementing the input and output functions of the smart phone, in some embodiments the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the smart phone.
The smartphone may also include at least one sensor 450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 441 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 441 and/or the backlight when the smartphone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect acceleration in all directions (generally along three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize smartphone gestures (such as switching between landscape and portrait, related games, and magnetometer gesture calibration) and in vibration-recognition-related functions (such as a pedometer or tap detection). Other sensors that may also be configured on the smart phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail here.
The audio circuit 460, speaker 461, and microphone 462 can provide an audio interface between the user and the smartphone. The audio circuit 460 may transmit the electrical signal converted from received audio data to the speaker 461, which converts it into a sound signal for output; conversely, the microphone 462 converts collected sound signals into electrical signals, which are received by the audio circuit 460 and converted into audio data. The audio data is then output to the processor 480 for processing and transmitted via the RF circuit 410 to, for example, another smart phone, or output to the memory 420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 470, the smart phone can help the user send and receive emails, browse web pages, access streaming media, and so on, providing wireless broadband Internet access. Although fig. 10 shows the WiFi module 470, it is understood that it is not an essential component of the smart phone and may be omitted as needed without changing the essence of the invention.
The processor 480 is a control center of the smart phone, connects various parts of the entire smart phone using various interfaces and lines, and performs various functions and processes data of the smart phone by running or executing software programs and/or modules stored in the memory 420 and invoking data stored in the memory 420, thereby performing overall monitoring of the smart phone. Optionally, the processor 480 may include one or more processing units; alternatively, the processor 480 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 480.
The smart phone also includes a power supply 490 (e.g., a battery) for powering the various components. Optionally, the power supply may be logically connected to the processor 480 through a power management system, which implements functions such as charge management, discharge management, and power consumption management.
Although not shown, the smart phone may further include a camera, a bluetooth module, etc., which will not be described herein.
The steps performed by the terminal device in the above-described embodiments may be based on the terminal device structure shown in fig. 10.
Also provided in embodiments of the present application is a computer-readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the methods as described in the foregoing embodiments.
Also provided in embodiments of the present application is a computer program product comprising a program which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (15)

1. An image recognition method based on artificial intelligence, comprising:
acquiring a face image to be identified;
invoking an identification network of an image identification model to extract the characteristics of the face image to be identified so as to obtain a first characteristic;
invoking N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, wherein the N complementary networks correspond to N age tags, and N is an integer greater than 1;
invoking a feature modulation network in the image recognition model to conduct age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, wherein the N output probability values are used for indicating the probability value that the face image to be recognized is classified into each age label in the N age labels;
performing feature fusion according to the N second features, the N output probability values and the first features to obtain fusion features of the face image to be recognized;
and calling a classification network in the image recognition model to classify the fusion features so as to obtain a recognition result of the face image to be recognized.
2. The method according to claim 1, wherein the method further comprises:
acquiring first training sample data and an initial recognition network, wherein the first training sample data comprises face images of multiple age groups;
extracting features of the first training sample data to obtain first features of the first training sample data;
performing classification prediction based on the first features to obtain a first prediction classification label of the first training sample data;
performing loss calculation based on the first prediction classification label and the sample label of the first training sample data to obtain a first loss value;
training the initial identification network based on the first loss value to obtain the identification network.
3. The method of claim 2, wherein the performing feature extraction on the first training sample data to obtain the first feature of the first training sample data comprises:
extracting features of the first training sample data to obtain three-dimensional features of the first training sample data;
and performing feature mapping on the three-dimensional features to obtain first features of the first training sample data, wherein the first features are one-dimensional features.
4. The method of claim 3, wherein said performing a classification prediction based on the first feature to obtain a first prediction classification tag for the first training sample data comprises:
and performing matrix calculation based on the first features and a first class center matrix to obtain a first prediction classification label of the first training sample data, wherein the first class center matrix is used for indicating class centers of each identity class in the first training sample data.
5. The method according to claim 1, wherein the method further comprises:
acquiring N pieces of second training sample data and N pieces of initial complementary networks, wherein each piece of second training sample data corresponds to the initial complementary network one by one, and samples in the N pieces of second training sample data have different age labels;
invoking the identification network to perform feature extraction on the N second training sample data to obtain N third features, wherein the identification network is a network with determined network parameters;
invoking the N initial complementary networks to respectively perform one-to-one prediction processing on the N third features so as to obtain N second prediction classification labels;
Performing one-to-one loss calculation based on the N second prediction classification labels and the sample labels of the N second training sample data to obtain N second loss values;
training the N initial complementary networks based on the N second loss values to obtain the N complementary networks.
6. The method of claim 5, wherein invoking the N initial complementary networks to perform a one-to-one prediction process on the N third features, respectively, to obtain N second prediction classification labels comprises:
performing feature mapping on the N third features to obtain N mapping features, wherein the feature dimensions of the N mapping features are smaller than those of the N third features;
and performing matrix calculation in one-to-one correspondence with N second class center matrices based on the N mapping features to obtain N second prediction classification labels, wherein the N second class center matrices are respectively used for indicating class centers of each identity class in N second training sample data.
7. The method of claim 5, wherein training the N initial complementary networks based on the N second loss values to obtain the N complementary networks comprises:
Summing the N second loss values to obtain a fusion loss value;
training the N initial complementary networks based on the fusion loss values to obtain N complementary networks.
8. The method according to claim 1, wherein the method further comprises:
acquiring third training sample data and an initial feature modulation network, wherein the third training sample data comprises face images of multiple age groups;
invoking the recognition network to perform feature extraction on the third training sample data so as to obtain a fourth feature;
invoking the N complementary networks to perform feature extraction on the third training sample data so as to obtain N fifth features;
invoking the initial feature modulation network to perform age prediction processing on the fourth feature to obtain N predicted probability values of the third training sample data, wherein the N predicted probability values are used for indicating that the third training sample data is classified into probability values of each age label in the N age labels;
performing feature fusion according to the N fifth features, the N prediction probability values and the fourth features to obtain prediction fusion features of the third training sample data;
Performing matrix calculation on the prediction fusion characteristics and a third class center matrix to obtain a third prediction classification label of the third training sample data, wherein the third class center matrix is a class center corresponding to each identity class in the third training sample data;
performing loss calculation based on the third prediction classification label and the sample label of the third training sample data to obtain a third loss value;
training the initial feature modulation network based on the third loss value to obtain the feature modulation network.
9. The method of claim 8, wherein feature fusing based on the N fifth features, the N predicted probability values, and the fourth features to obtain predicted fused features of the third training sample data comprises:
carrying out one-to-one corresponding product processing on the N prediction probability values and the N fifth features according to the age labels to obtain N prediction weight features;
summing the N prediction weight features to obtain prediction integration features corresponding to the N complementary networks;
and summing the prediction integration feature with the fourth feature to obtain the prediction fusion feature.
10. The method of claim 8, wherein invoking the recognition network to perform feature extraction on the third training sample data to obtain a fourth feature comprises:
invoking the recognition network to perform feature extraction on the third training sample data so as to obtain three-dimensional features of the third training sample data;
and calling the recognition network to perform feature mapping on the three-dimensional features of the third training sample data to obtain the fourth features, wherein the fourth features are one-dimensional features.
11. The method according to any one of claims 1 to 10, wherein the feature fusion according to the N second features, the N output probability values, and the first features to obtain the fused features of the face image to be identified includes:
carrying out one-to-one corresponding product processing on the N output probability values and the N second features according to the age labels to obtain N weight features;
summing the N weight characteristics to obtain integrated characteristics corresponding to the N complementary networks;
the integrated feature is summed with the first feature to obtain the fused feature.
12. The method according to any one of claims 1 to 10, wherein the network structures of the identification network, the N complementary networks, and the feature modulation network are created based on a convolutional neural network.
13. An image recognition apparatus, comprising:
the acquisition module is used for acquiring the face image to be identified;
the processing module is used for calling a recognition network of the image recognition model to perform feature extraction on the face image to be recognized so as to obtain a first feature; invoking N complementary networks in the image recognition model to respectively perform feature extraction on the face image to be recognized to obtain N second features, wherein the N complementary networks correspond to N age tags, and N is an integer greater than 1; invoking a feature modulation network in the image recognition model to conduct age prediction processing on the first feature to obtain N output probability values corresponding to the face image to be recognized, wherein the N output probability values are used for indicating the probability value that the face image to be recognized is classified into each age label in the N age labels; performing feature fusion according to the N second features, the N output probability values and the first features to obtain fusion features of the face image to be recognized; and calling a classification network in the image recognition model to classify the fusion features so as to obtain a recognition result of the face image to be recognized.
14. A computer device, comprising: a memory, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory and to perform the method of any one of claims 1 to 12 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
15. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 12.