CN117037232A - Face recognition method, device and storage medium

Info

Publication number: CN117037232A
Application number: CN202210779688.2A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: face recognition, network, features, training, shallow
Inventor: 许剑清
Applicant / Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Pending

Classifications

    • G06V40/172 Human faces: classification, e.g. identification
    • G06V40/168 Human faces: feature extraction; face representation
    • G06N3/08 Neural networks: learning methods
    • G06V10/7715 Image or video recognition using machine learning: feature extraction, e.g. by transforming the feature space
    • G06V10/806 Image or video recognition using machine learning: fusion of extracted features
    • G06V10/82 Image or video recognition using machine learning: using neural networks
    • G06V20/70 Scenes; scene-specific elements: labelling scene content, e.g. deriving syntactic or semantic representations


Abstract

The application discloses a face recognition method, device, and storage medium. In the method, a face image is input into a face recognition model within an integrated model to obtain a first feature, where the integrated model comprises the face recognition model and a deep semantic extraction branch, and the face recognition model comprises a shallow network and a deep network; the shallow features of the face image are input into the deep semantic extraction branch to obtain a second feature; the two features are then fused to obtain a fusion feature, and feature comparison on the fusion feature yields the recognition result. A face recognition process that fuses deep features is thus realized: introducing an independent branch at the deep stage of the integrated model improves the diversity of the features, while sharing the shallow-network features improves the accuracy of face recognition and, at the same time, its efficiency.

Description

Face recognition method, device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for face recognition, and a storage medium.
Background
With the rapid development of internet technology, people's requirements on information security keep rising. Face recognition is an effective way to improve information security and is applied in mobile-terminal face recognition systems in real scenarios such as security, payment, and access control, which places high demands on both the running time and the accuracy of a face recognition model.
Generally, a large amount of face training data can be collected to train a face recognition model at large scale, thereby improving face recognition accuracy.
However, collecting a large amount of face training data consumes substantial resources, and that data must also be labeled; moreover, because large numbers of samples are difficult to label reliably, erroneous labels are introduced, which complicates the training process and degrades both the accuracy and the efficiency of face recognition.
Disclosure of Invention
In view of the above, the application provides a face recognition method, which can effectively improve the accuracy and efficiency of face recognition.
The first aspect of the present application provides a method for face recognition, which can be applied to a system or a program including a face recognition function in a terminal device, and specifically includes:
acquiring a face image of a target object;
inputting the face image into a face recognition model in an integrated model to obtain a first feature, wherein the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, and the shallow network is used for extracting the shallow feature of the face image;
inputting the shallow features of the face image into the deep semantic extraction branch to obtain a second feature, wherein the deep semantic extraction branch uses features extracted by the shallow network during training, and the network parameters of the deep semantic extraction branch are different from those of the deep network;
inputting the first feature and the second feature into a fusion dimension reduction model for fusion so as to obtain fusion features;
and performing feature comparison based on the fusion features to obtain the face-related identification information corresponding to the target object.
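The following minimal sketch illustrates the inference flow of the method steps above, assuming a PyTorch-style interface; the module and argument names (shallow_net, deep_net, semantic_branch, fusion_head, gallery) are illustrative and not taken from the application.

```python
import torch
import torch.nn.functional as F

def recognize(face_image, shallow_net, deep_net, semantic_branch, fusion_head, gallery):
    """Hypothetical inference pass for the integrated model described above."""
    with torch.no_grad():
        shallow = shallow_net(face_image)        # shared shallow features
        first = deep_net(shallow)                # first feature (face recognition model)
        second = semantic_branch(shallow)        # second feature (deep semantic branch)
        fused = fusion_head(first, second)       # fusion and dimension reduction
        fused = F.normalize(fused, dim=1)
        gallery = F.normalize(gallery, dim=1)    # registered (gallery) features
        scores = fused @ gallery.t()             # cosine-similarity feature comparison
    return scores.argmax(dim=1), scores
```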
Optionally, in some possible implementations of the present application, the method further includes:
acquiring training data for face recognition;
inputting the training data into the integrated model;
extracting shallow features of the training data based on the shallow network, and extracting training features from those shallow features through the deep network, so as to train the face recognition model;
training the deep semantic extraction branch based on the training data, wherein the deep semantic extraction branch adopts the shallow network to perform characteristic input in the training process, and parameters of the shallow network are fixed;
and inputting the training features and the depth features output by the deep semantic extraction branch into the fusion dimension reduction model, so as to train the fusion dimension reduction model based on constraint conditions, wherein the parameters of the face recognition model and the deep semantic extraction branch are kept fixed while the fusion dimension reduction model is trained.
Optionally, in some possible implementations of the present application, the extracting shallow features of the training data based on the shallow network and extracting training features from them through the deep network to train the face recognition model includes:
extracting shallow features of the training data based on the shallow network, and extracting the training features from those shallow features through the deep network;
determining label information corresponding to the training data;
matching the training features with the tag information to perform gradient computation based on a first loss function;
and comparing the parameter value in the gradient calculation process corresponding to the first loss function with a first training condition, and if the first training condition is met, finishing the training of the face recognition model.
Optionally, in some possible implementations of the present application, the training the deep semantic extraction branch based on the training data includes:
determining first sampling information for sampling the training data in the face recognition model training process;
performing a sampling configuration different from the first sampling information, so as to determine second sampling information corresponding to the deep semantic extraction branch;
sampling the training data based on the second sampling information to obtain a training sample;
inputting the training samples into the trained shallow network to obtain sample input characteristics;
inputting the sample input features into the deep semantic extraction branches to obtain sample output features;
determining label information corresponding to the sample output characteristics so as to perform gradient calculation of a second loss function on the deep semantic extraction branch;
and comparing the parameter value in the gradient calculation process corresponding to the second loss function with a second training condition, and if the second training condition is met, completing training of the deep semantic extraction branch.
Optionally, in some possible implementations of the present application, the inputting the sample input feature into the deep semantic extraction branch to obtain a sample output feature includes:
acquiring parameter randomization information of the deep network in the face recognition model;
configuring a random seed for the deep semantic extraction branch that differs from that parameter randomization information, so as to configure the parameters of the deep semantic extraction branch;
and inputting the sample input features into the deep semantic extraction branch after parameter configuration, so as to obtain the sample output features.
Optionally, in some possible implementations of the present application, determining tag information corresponding to the sample output feature to perform gradient calculation of a second loss function on the deep semantic extraction branch includes:
determining label information corresponding to the sample output characteristics to determine quality parameters of training samples;
configuring the quality parameter and the constraint parameter of the second loss function differently from those of the face recognition model, so as to adjust the parameters of the second loss function;
and carrying out gradient calculation on the deep semantic extraction branch based on the second loss function after parameter adjustment.
Optionally, in some possible implementations of the present application, the inputting the shallow features of the face image into the deep semantic extraction branch to obtain the second feature includes:
Performing network searching based on preset precision information to determine parameter information of the deep semantic extraction branch;
carrying out parameter configuration on the deep semantic extraction branches according to the parameter information obtained by network searching;
and inputting the shallow features of the face image into the deep semantic extraction branch subjected to parameter configuration to obtain the second features.
A second aspect of the present application provides an apparatus for face recognition, including:
the acquisition unit is used for acquiring the face image of the target object;
the input unit is used for inputting the face image into a face recognition model in an integrated model to obtain a first characteristic, the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, and the shallow network is used for extracting the shallow characteristic of the face image;
the input unit is further configured to input the shallow feature into the deep semantic extraction branch to obtain a second feature, where the deep semantic extraction branch adopts a feature extracted by the shallow network in a training process, and network parameters of the deep semantic extraction branch are different from network parameters of the deep network;
The fusion unit is used for inputting the first feature and the second feature into a fusion dimension reduction model for fusion so as to obtain fusion features;
and the identification unit is used for carrying out feature comparison based on the fusion features so as to obtain the identification information related to the face corresponding to the target object.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to obtain training data for face recognition;
the recognition unit is specifically configured to input the training data into the integrated model;
the recognition unit is specifically configured to extract shallow features of the training data based on the shallow network, and to extract the training features from those shallow features through the deep network, so as to train the face recognition model;
the recognition unit is specifically configured to train the deep semantic extraction branch based on the training data, where the deep semantic extraction branch uses the shallow network to perform feature input in the training process, and parameters of the shallow network are fixed;
the recognition unit is specifically configured to input the training features and the depth features output by the deep semantic extraction branch into the fusion dimension reduction model, so as to train the fusion dimension reduction model based on constraint conditions, where the parameters of the face recognition model and the deep semantic extraction branch are kept fixed while the fusion dimension reduction model is trained.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to extract a shallow feature of the training data based on the shallow network, and extract the training feature according to the shallow feature of the training data by using the deep network;
the identification unit is specifically used for determining label information corresponding to the training data;
the identification unit is specifically configured to match the training feature with the tag information, so as to perform gradient calculation based on a first loss function;
the identification unit is specifically configured to compare a parameter value in a gradient calculation process corresponding to the first loss function with a first training condition, and if the first training condition is satisfied, complete training of the face recognition model.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to determine first sampling information for sampling the training data in the training process of the face recognition model;
the identification unit is specifically configured to perform sampling configuration different from the first sampling information so as to determine second sampling information corresponding to the deep semantic extraction branch;
The identification unit is specifically configured to sample the training data based on the second sampling information, so as to obtain a training sample;
the identification unit is specifically configured to input the training sample into the trained shallow network to obtain a sample input feature;
the identification unit is specifically configured to input the sample input feature into the deep semantic extraction branch to obtain a sample output feature;
the identification unit is specifically configured to determine tag information corresponding to the sample output feature, so as to perform gradient calculation of a second loss function on the deep semantic extraction branch;
the identification unit is specifically configured to compare a parameter value in a gradient calculation process corresponding to the second loss function with a second training condition, and if the second training condition is satisfied, complete training of the deep semantic extraction branch.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to obtain parameter randomization information of the deep network in the face recognition model;
the identification unit is specifically configured to configure a random seed for the deep semantic extraction branch that differs from that parameter randomization information, so as to configure the parameters of the deep semantic extraction branch;
the identification unit is specifically configured to input the sample input features into the deep semantic extraction branch after parameter configuration, so as to obtain the sample output features.
Optionally, in some possible implementations of the present application, the identifying unit is specifically configured to determine tag information corresponding to the sample output feature, so as to determine a quality parameter of the training sample;
the identification unit is specifically configured to perform configuration different from the face recognition model on the quality parameter and the constraint parameter of the second loss function, so as to perform parameter adjustment on the second loss function;
the identification unit is specifically configured to perform gradient calculation on the deep semantic extraction branch based on the second loss function after parameter adjustment.
Optionally, in some possible implementations of the present application, the input unit is specifically configured to perform network searching based on preset precision information to determine parameter information of the deep semantic extraction branch;
the input unit is specifically configured to perform parameter configuration on the deep semantic extraction branch according to the parameter information obtained by network searching;
the input unit is specifically configured to input the shallow features of the face image into the deep semantic extraction branch after parameter configuration, so as to obtain the second features.
A third aspect of the present application provides a computer apparatus comprising: a memory, a processor, and a bus system; the memory is used for storing program codes; the processor is configured to perform the method of face recognition according to the first aspect or any one of the first aspects according to instructions in the program code.
A fourth aspect of the application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of face recognition as described in the first aspect or any one of the first aspects.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from a computer-readable storage medium by a processor of a computer device, which executes the computer instructions, causing the computer device to perform the method of face recognition provided in the above-described first aspect or various alternative implementations of the first aspect.
From the above technical solutions, the embodiment of the present application has the following advantages:
A face image of a target object is acquired; the face image is then input into the face recognition model in an integrated model to obtain a first feature, wherein the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, the shallow network is used to extract the shallow features of the face image, and the deep network is used to extract the first feature from those shallow features; the shallow features of the face image are input into the deep semantic extraction branch to obtain a second feature, wherein the deep semantic extraction branch uses features extracted by the shallow network during training, and its network parameters differ from those of the deep network; the first feature and the second feature are then input into a fusion dimension reduction model for fusion, so as to obtain a fusion feature; and feature comparison is performed based on the fusion feature, so as to obtain the face-related identification information corresponding to the target object. A face recognition process that fuses deep features is thus realized; because an independent branch is introduced at the deep stage of the integrated model, the diversity of the features is improved, and because the shallow-network features are shared, the accuracy of face recognition is improved while its efficiency is also improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a network architecture for system operation for face recognition;
fig. 2 is a flowchart of a face recognition architecture according to an embodiment of the present application;
fig. 3 is a flowchart of a method for face recognition according to an embodiment of the present application;
fig. 4 is a schematic view of a scenario of a face recognition method according to an embodiment of the present application;
fig. 5 is a flowchart of another method for face recognition according to an embodiment of the present application;
fig. 6 is a flowchart of another method for face recognition according to an embodiment of the present application;
fig. 7 is a schematic view of a scenario of another face recognition method according to an embodiment of the present application;
fig. 8 is a flowchart of another method for face recognition according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a device for face recognition according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a face recognition method and a related device, which can be applied to a system or a program containing a face recognition function in a terminal device. A face image of a target object is acquired; the face image is then input into the face recognition model in an integrated model to obtain a first feature, wherein the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, the shallow network is used to extract the shallow features of the face image, and the deep network is used to extract the first feature from those shallow features; the shallow features of the face image are input into the deep semantic extraction branch to obtain a second feature, wherein the deep semantic extraction branch uses features extracted by the shallow network during training, and its network parameters differ from those of the deep network; the first feature and the second feature are then input into a fusion dimension reduction model for fusion, so as to obtain a fusion feature; and feature comparison is performed based on the fusion feature, so as to obtain the face-related identification information corresponding to the target object. A face recognition process that fuses deep features is thus realized; because an independent branch is introduced at the deep stage of the integrated model, the diversity of the features is improved, and because the shallow-network features are shared, the accuracy of face recognition is improved while its efficiency is also improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the face recognition method provided by the present application may be applied to a system or program containing a face recognition function in a terminal device, for example a face recognition application. Specifically, the face recognition system may operate in the network architecture shown in fig. 1, which is a diagram of the network architecture in which the face recognition system runs. As shown in the figure, the face recognition system can serve face recognition for multiple information sources: a face image is sent to a server through a triggering operation on the terminal side, so that image features are extracted and further identity recognition is performed. It will be appreciated that various terminal devices are shown in fig. 1; the terminal devices may be computer devices, and in an actual scenario more or fewer terminal devices may participate in the face recognition process, the specific number and types not being limited here. In addition, one server is shown in fig. 1, but in an actual scenario multiple servers may also participate, especially in scenarios with multi-model training interaction; the specific number of servers depends on the actual scenario.
In this embodiment, the server may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. The terminals and servers may be directly or indirectly connected by wired or wireless communication, and the terminals and servers may be connected to form a blockchain network, which is not limited herein.
It will be appreciated that the face recognition system described above may run on a personal mobile terminal, for example as a face recognition application; it may also run on a server, or provide face recognition processing results to third-party devices. The specific face recognition system may take the form of a program, may run as a system component in the device, or may serve as a cloud service program; the specific operation mode depends on the actual scenario and is not limited here.
With the rapid development of internet technology, people's requirements on information security keep rising. Face recognition is an effective way to improve information security and is applied in mobile-terminal face recognition systems in real scenarios such as security, payment, and access control, which places high demands on both the running time and the accuracy of a face recognition model.
Generally, a large amount of face training data can be collected to train a face recognition model at large scale, thereby improving face recognition accuracy.
However, collecting a large amount of face training data consumes substantial resources, and that data must also be labeled; moreover, because large numbers of samples are difficult to label reliably, erroneous labels are introduced, which complicates the training process and degrades both the accuracy and the efficiency of face recognition.
In order to solve the above problems, the present application provides a face recognition method applied to the face recognition flow framework shown in fig. 2, which illustrates a flow framework provided by an embodiment of the present application: a user triggers the terminal to capture a face image through an access operation on the terminal, and the image is sent to a server, where multidimensional features are extracted through the face recognition model and the deep semantic branch and then fused, thereby obtaining a face recognition result.
According to the embodiment, on the basis of a face recognition model, an independent branch structure is introduced at the deep stage of the network, and the branch is trained with new random parameters. Meanwhile, a feature dimension reduction module is added at the model output, so that the output feature dimension matches the feature dimension of the existing model and compatibility between this scheme and the existing system is ensured. By integrating deep semantic features, the accuracy of the face recognition model is improved. Only one branch with a small computational cost is added to the existing network, so the increase in running time remains controllable; and because the two branches are trained independently, neither the memory footprint nor the time consumption of training increases.
It can be understood that the method provided by the application may be implemented as program code serving as processing logic in a hardware system, or as a face recognition apparatus in which that processing logic is realized in an integrated or external manner.
The scheme provided by the embodiment of the application relates to an artificial intelligence computer vision technology, and is specifically described by the following embodiments:
referring to fig. 3, fig. 3 is a flowchart of a method for face recognition according to an embodiment of the present application; the face recognition method may be executed by a terminal or a server, and the embodiment of the present application at least includes the following steps:
301. and acquiring a face image of the target object.
In this embodiment, the target object is the object to be identified, and its face image can be captured by an on-demand camera device, so the method applies to the 1:1 and 1:N face recognition services of a face recognition application.
Specifically, in this embodiment, an independent feature extraction branch is introduced into a deep structure position of a face recognition model, and the model of normal face recognition and the deep face recognition branch are respectively and independently trained, and the deep feature extraction branch is used for assisting in integrating the recognition model, so that the recognition accuracy of the face recognition model is improved.
In a possible scenario, this embodiment adopts the architecture shown in fig. 4, which is a schematic diagram of a scenario of a face recognition method provided by an embodiment of the present application; the embodiment is divided into a network module training phase and a network module integration and deployment phase. First, a normal face recognition model is trained; then an independent deep feature extraction branch is introduced into the face recognition network, the branch is initialized with different initialization parameters, and it is trained while sharing the shallow features, which reduces the running time of the model. Finally, a feature fusion dimension reduction module is trained to fuse and integrate the face recognition model and the auxiliary deep semantic features. In the network module deployment stage, the auxiliary face recognition branch and the feature fusion dimension reduction model are integrated with the normal face recognition model, and the three are merged into the same model for deployment.
302. And inputting the face image into a face recognition model in the integrated model to obtain the first characteristic.
In this embodiment, the integrated model includes a face recognition model and a deep semantic extraction branch, that is, an independent feature extraction branch is introduced into a deep structure position of the face recognition model, and the normal face recognition model and the deep face recognition branch are respectively and independently trained, and the deep feature extraction branch is used for assisting in integrating the recognition model; the face recognition model comprises a shallow network and a deep network, wherein the shallow network is used for extracting the shallow features of the face image, and the deep network is used for extracting the first features of the shallow features.
The training process of the integrated model is described below, and the integrated model comprises a face recognition model and a deep semantic extraction branch, and can integrate the subsequent fusion dimension reduction model, namely, the three models are integrated into the same model for deployment.
Specifically, the face recognition model, the deep semantic extraction branch, and the fusion dimension reduction model are trained step by step. First, training data for face recognition is obtained and input into the integrated model. Shallow features of the training data are extracted by the shallow network, and training features are extracted from those shallow features by the deep network, so as to train the face recognition model. The deep semantic extraction branch is then trained on the training data; during its training, the branch takes its input features from the shallow network, whose parameters are fixed, i.e. the shallow-network parameters of the trained face recognition model are frozen. Finally, the training features and the depth features output by the deep semantic extraction branch are input into the fusion dimension reduction model, which is trained based on constraint conditions while the parameters of the face recognition model and the deep semantic extraction branch remain fixed. This stepwise training procedure reduces the computational cost of training.
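A sketch of this staged training is given below, assuming PyTorch; the parameter-freezing pattern is the point being illustrated, while the model objects (shallow_net, deep_net, semantic_branch, fusion_head), the data loaders, and the loss objects are assumptions not defined by the application.

```python
import torch

def set_trainable(module, flag):
    # freeze or unfreeze every parameter of one sub-network
    for p in module.parameters():
        p.requires_grad_(flag)

def train_stage(modules_to_update, forward_fn, loss_fn, loader, epochs=1, lr=0.1):
    # one training stage that only updates the given modules
    params = [p for m in modules_to_update for p in m.parameters()]
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            loss = loss_fn(forward_fn(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage A: train the face recognition model (shallow network + deep network).
set_trainable(shallow_net, True); set_trainable(deep_net, True)
train_stage([shallow_net, deep_net], lambda x: deep_net(shallow_net(x)), loss_a, loader_a)

# Stage B: train only the deep semantic extraction branch; the shallow network is frozen
# and merely supplies its features as the branch input.
set_trainable(shallow_net, False); set_trainable(deep_net, False)
train_stage([semantic_branch], lambda x: semantic_branch(shallow_net(x)), loss_b, loader_b)

# Stage C: train only the fusion / dimension-reduction head on the two branch features;
# both recognition branches stay frozen.
set_trainable(semantic_branch, False)
train_stage([fusion_head],
            lambda x: fusion_head(deep_net(shallow_net(x)), semantic_branch(shallow_net(x))),
            loss_a, loader_c)
```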
In one possible scenario, the training process for the face recognition model is shown in fig. 5, which is a flowchart of another face recognition method provided by an embodiment of the present application. First, training data is acquired by the training data preparation module, and features are then extracted by the recognition network unit module: shallow features of the training data are extracted by the shallow network, and training features are extracted from those shallow features by the deep network. The face recognition objective function calculation module is then executed, i.e. the label information corresponding to the training data is determined, and the training features are matched against the label information to perform gradient computation based on the first loss function. The parameter value in the gradient calculation process of the first loss function is compared with a first training condition; if the first training condition (the basic model training condition) is met, training of the face recognition model is complete; if not, the objective function is optimized and training is performed again.
Specifically, the training data preparation module in the figure reads the face training data during training and assembles the read data into batches that are sent to the deep network unit for processing. The function of the recognition network unit module is to extract the spatial features of the face picture, and the output feature map preserves the spatial structure information of the face picture. This module generally has the structure of a convolutional neural network (CNN) and includes operations such as convolution, nonlinear activation function (ReLU) computation, and pooling.
In addition, the face recognition objective function calculation module takes the feature f output by the fully connected mapping unit and the label information of the face picture that produced the vector as inputs to calculate an objective function value. The objective function may be a classification function (such as softmax, or the various softmax-plus-margin variants), or another type of objective function (the first loss function). The face recognition objective function optimization module trains and optimizes the whole network by gradient descent (such as stochastic gradient descent, stochastic gradient descent with momentum, Adam, or Adagrad). The training process is repeated until the result meets the termination condition. The condition for ending model training is usually that the number of iterations reaches a set value, or that the loss computed by the face recognition objective function falls below a set value, at which point model training is complete.
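As a concrete illustration of the "softmax plus margin" family named above, the following sketch implements one common variant (a CosFace-style additive cosine margin); the application does not specify which variant is used, so the exact form and the hyperparameter values are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginSoftmaxLoss(nn.Module):
    """One possible 'softmax plus margin' objective: cosine similarity against learned
    class centers, with an additive margin subtracted on the true class only."""
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))  # class centers
        self.s, self.m = s, m

    def forward(self, features, labels):
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        margin = torch.zeros_like(cosine).scatter_(1, labels.unsqueeze(1), self.m)
        logits = self.s * (cosine - margin)   # margin applied only to the ground-truth class
        return F.cross_entropy(logits, labels)

# Optimization as described above, e.g. stochastic gradient descent with momentum;
# training stops once the iteration count or the loss reaches the configured threshold.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
```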
In order to meet the limits that deployment scenarios place on model running time and to obtain a more stable improvement in model accuracy, this embodiment provides a method that integrates deep semantic features with the features of the face recognition model, improving the recognition accuracy of the model while ensuring that it still meets the running-time requirements of different application scenarios. In deep-network face training, a strong recognition model (the integrated model) can be obtained by integrating the recognition features of several weaker recognition models (the face recognition model together with the deep semantic extraction branch).
It can be understood that, in the above embodiment, by adjusting the structure of the recognition model, two recognition branches are introduced into the deep semantic feature extraction network, and the two branches are respectively and independently trained. Therefore, the deep network with fewer layers can acquire the independent evaluation characteristics of the same picture, and then the two characteristics are integrated to obtain a stronger model.
303. And inputting the shallow features of the face image into a deep semantic extraction branch to obtain second features.
In this embodiment, the features extracted by the shallow layer network are adopted in the training process of the deep layer semantic extraction branch, and the network parameters of the deep layer semantic extraction branch are different from those of the deep layer network, that is, a sharing mechanism of the shallow layer features is adopted, so that the calculation amount in the training process is reduced, and the diversity of the features is improved through the configuration of random parameters.
Specifically, the training process of the deep semantic extraction branch can be described in terms of its module configuration, as shown in fig. 6, which is a flowchart of another face recognition method provided by an embodiment of the present application. The training data preparation module is executed first: the first sampling information used to sample the training data during face recognition model training is determined; a sampling configuration different from that first sampling information is then applied, so as to determine the second sampling information corresponding to the deep semantic extraction branch, i.e. the random seed controlling data sampling is deliberately inconsistent with the one used when training the face recognition model; the training data is then sampled based on the second sampling information to obtain training samples.
Then executing an integrated network unit module, namely inputting training samples into the trained shallow layer network to obtain sample input characteristics; inputting the sample input features into a deep semantic extraction branch to obtain sample output features; further determining label information corresponding to the output characteristics of the sample so as to perform gradient calculation of a second loss function on the deep semantic extraction branch; and comparing the parameter value in the gradient calculation process corresponding to the second loss function with a second training condition, if the second training condition (the number of times condition or the loss value condition) is met, completing training of the deep semantic extraction branch, and if the second training condition is not met, executing the training process again after adjusting the parameter of the second loss function.
In one possible scenario, in order to enrich the feature diversity, the sample output features can be adjusted through the random seed control module: the parameter randomization information of the deep network in the face recognition model is obtained; a random seed different from that of the deep network is then configured, so as to configure the parameters of the deep semantic extraction branch; and the sample input features are then fed into the parameter-configured deep semantic extraction branch to obtain the sample output features.
In addition, for the face recognition objective function calculation module, parameter control can be performed through the loss hyperparameter control module: first, the label information corresponding to the sample output features is determined so as to determine the quality parameters of the training samples; the quality parameter and the constraint parameter of the second loss function are then configured differently from those of the face recognition model, so as to adjust the parameters of the second loss function; and gradient calculation is performed on the deep semantic extraction branch based on the adjusted second loss function.
In one possible scenario, the face recognition network is divided into four stages (stage1, stage2, stage3, stage4); the following description refers to this scenario, as shown in fig. 7, which is a schematic diagram of a scenario of another face recognition method according to an embodiment of the present application. The figure shows the structure of the integrated model. In this embodiment, a deep semantic extraction branch is introduced, and the features output by the shallow network's stage3 are shared between the face recognition network's stage4 and this branch. In this training step, the parameters of stage1 to stage4 are not updated; only the parameters of the deep semantic extraction branch are trained. The deep semantic extraction branch typically has the structure of a convolutional neural network (CNN), including convolution, nonlinear activation function (ReLU) computation, and pooling. In this step, the parameters of stage1 to stage4 of the face recognition network are the model parameters inherited from the training in step 302.
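The four-stage backbone with the deep semantic extraction branch grafted onto the stage3 output can be sketched as follows; the concrete layer sizes and the single-block stages are illustrative stand-ins for whatever CNN the implementer chooses, not the application's actual network.

```python
import torch
import torch.nn as nn

def stage(c_in, c_out, stride=2):
    # illustrative CNN stage: convolution + batch norm + ReLU, as described above
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(c_out),
                         nn.ReLU(inplace=True))

class IntegratedModel(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.stage1 = stage(3, 64)
        self.stage2 = stage(64, 128)
        self.stage3 = stage(128, 256)            # shallow features shared by both branches
        self.stage4 = stage(256, 512)            # deep network of the face recognition model
        self.semantic_branch = stage(256, 512)   # independent deep semantic extraction branch
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_main = nn.Linear(512, feat_dim)
        self.fc_branch = nn.Linear(512, feat_dim)

    def forward(self, x):
        shallow = self.stage3(self.stage2(self.stage1(x)))   # stage3 output is shared
        first = self.fc_main(self.pool(self.stage4(shallow)).flatten(1))
        second = self.fc_branch(self.pool(self.semantic_branch(shallow)).flatten(1))
        return first, second
```

When the branch is trained, only semantic_branch (and fc_branch) would receive gradient updates; stage1 to stage4 keep the parameters inherited from the base training, as stated above.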
Based on the above scenario, for the training data preparation module, the data read is normal face data. In this step, the random seed controlling data sampling is inconsistent with step 302, which increases the complementarity between this training process and that of step 302.
For the integrated network element module, the module is composed of the face recognition network element module obtained in step 302 and the deep semantic feature extraction branch introduced.
For the face recognition objective function calculation module, the module takes as inputs the per-class probability vector obtained by matrix multiplication of the feature f output by the deep semantic feature extraction branch with the class centers, together with the label information of the face pictures that produced the vector, and computes an objective function value. The objective function may be a classification function (such as softmax, or the various softmax-plus-margin variants), or another type of objective function. The loss computed by this module is L_class.
For the random seed control module, the module randomly initializes the deep semantic extraction branch, and the random seed used for initialization is inconsistent with that used for the previously trained face recognition model, so as to enrich the diversity of the network.
For the loss hyperparameter control module, the module controls the loss function used to train the complementary model. The loss constraint currently adopted in face recognition training is generally a loss function with a margin constraint, in which s is the distance of the face picture's feature from the origin and m is a constraint parameter.
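The original formula does not survive in this text; one representative margin-constrained softmax (an ArcFace-style additive angular margin, given here purely as an assumption consistent with the symbols s and m) can be written as:

```latex
L_{\text{margin}} = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{\,s\cos(\theta_{y_i}+m)}}
         {e^{\,s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{\,s\cos\theta_{j}}}
```

Here θ_j denotes the angle between the feature of the i-th face picture and the center of class j, s scales the feature (playing the role of the distance from the origin described above), and m is the margin constraint; configuring s and m differently for the deep semantic extraction branch yields the complementary constraint discussed next.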
In this embodiment, the face pictures cluster in the space around their identity vector, and the distance s from the origin expresses the quality of the picture; adjusting s and m therefore diversifies the spatial constraint applied to the pictures during deep semantic feature training and promotes the learning of complementary knowledge by the model. The loss hyperparameter control module controls the configuration of s and m, which is kept inconsistent with that of the face recognition model.
For the face recognition objective function optimization module, the function of this module is identical to that of the module in step 302.
In another possible scenario, a network search technology is also adopted to search for the deep semantic extraction branches, so that the time consumption of network operation and the recognition accuracy are both considered. Firstly, carrying out network searching based on preset precision information (preset operation time consumption, training scale or recognition precision) so as to determine parameter information of deep semantic extraction branches; then carrying out parameter configuration on the deep semantic extraction branches according to the parameter information obtained by network searching; and further, inputting the shallow features of the face image into the deep semantic extraction branch subjected to parameter configuration to obtain the second features, so that the configuration efficiency of the deep semantic extraction branch is improved.
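A minimal sketch of such a search follows, assuming a small grid of candidate branch configurations scored against a preset running-time budget and a validation accuracy; the candidate space and the helper functions (build_branch, evaluate) are hypothetical.

```python
import time
import torch

def search_branch_config(candidates, build_branch, evaluate, shallow_features, max_latency_ms):
    """Pick the branch configuration with the best validation accuracy whose forward
    pass on the shared shallow features stays within the preset latency budget."""
    best_cfg, best_acc = None, -1.0
    for cfg in candidates:                       # e.g. dicts of depth / width choices
        branch = build_branch(cfg).eval()
        with torch.no_grad():
            start = time.perf_counter()
            branch(shallow_features)             # one timed forward pass
            latency_ms = (time.perf_counter() - start) * 1000.0
        if latency_ms > max_latency_ms:
            continue                             # violates the preset time-consumption limit
        acc = evaluate(branch)                   # recognition accuracy on a validation set
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg
```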
304. Inputting the first feature and the second feature into a fusion dimension reduction model for fusion so as to obtain fusion features.
In this embodiment, the output features of the integrated model are fused and reduced in dimension. The feature output dimension of a single face recognition model is d, so without fusion and dimension reduction the output dimension of the model would be 2d when deep semantic features are used for integrated auxiliary recognition. Because a longer feature lengthens the comparison time at deployment, is incompatible with the feature dimension of the existing comparison model, and contains redundant information, this embodiment provides a fusion dimension reduction module that executes the fusion dimension reduction model.
In a possible scenario, the training process of the fusion dimension reduction model is shown in fig. 8, which is a flowchart of another face recognition method provided by an embodiment of the present application; the integrated network model outputs 2d-dimensional features (the d-dimensional stage4 output feature concatenated with the d-dimensional deep semantic extraction branch output feature), which serve as the input to the fusion dimension reduction module.
Specifically, the fusion dimension reduction module is an independent neural network comprising operations such as nonlinear activation function (ReLU) computation, pooling, and fully connected layers. The constraint function used to train this module is the face recognition constraint function used in step 303. During the training of this module, the parameters of the integrated face recognition model are not updated.
In addition, the training process of the fusion dimension reduction module further includes a training-condition check, which may specifically be the number of training iterations or the magnitude of the loss value: if the training condition is met, training of the fusion dimension reduction module is complete; if not, the objective function is optimized and training is performed again.
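A sketch of the fusion dimension reduction module is given below under the assumption of a simple fully connected design; the layer sizes are illustrative and d is the per-branch feature dimension discussed above.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Maps the concatenated 2d-dimensional integrated feature back to d dimensions,
    keeping the output compatible with the existing comparison system."""
    def __init__(self, d=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, d),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(d, d))

    def forward(self, first_feature, second_feature):
        fused = torch.cat([first_feature, second_feature], dim=1)  # concatenate F_N and F_D
        return self.net(fused)                                     # integrated feature F_C

# During this stage only the FusionHead parameters are updated; the face recognition
# model and the deep semantic extraction branch remain frozen, as stated above.
```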
In this embodiment, two independent branches are introduced into the deep part of the same model, and the shallow-network features are shared between them. As a result, the dual-branch model structure does not noticeably increase the running time and meets the deployment requirements of the recognition model. Finally, this embodiment introduces an integrated feature dimension reduction technique that reduces the integrated features to the same feature dimension as the original model, satisfying the compatibility requirements of model deployment.
305. And performing feature comparison based on the fusion features to obtain the face-related identification information corresponding to the target object.
In the embodiment, as the fusion features are fused with the multidimensional image features, the accuracy of face recognition can be improved in the comparison process, so that the recognition information related to the face corresponding to the target object is obtained; the face-related identification information may include whether the face image is a registered image, description information (name, sex, number, etc.) of the face image, or matching degree information of the face image.
It can be understood that the integrated model in this embodiment combines the trained deep semantic extraction branch, the normal face recognition model, and the feature fusion dimension reduction module. The feature output by the normal face recognition model is F_N and the feature output by the deep semantic extraction branch is F_D; after the two are concatenated, the feature fusion dimension reduction module outputs the integrated feature F_C. The feature dimension of the integrated recognition model is unchanged, so it can be integrated into a conventional face recognition system as usual. With this embodiment, therefore, the accuracy of the face recognition system can be improved at the cost of only a slight increase in model running time, making it suitable for a variety of complex application scenarios.
As can be seen from the above embodiments, a face image of a target object is acquired; the face image is then input into the face recognition model of an integrated model to obtain a first feature, where the integrated model includes the face recognition model and a deep semantic extraction branch, the face recognition model includes a shallow network and a deep network, the shallow network is used to extract the shallow features of the face image, and the deep network is used to extract the first feature from the shallow features; the shallow features of the face image are input into the deep semantic extraction branch to obtain a second feature, where the deep semantic extraction branch adopts the features extracted by the shallow network during training, and the network parameters of the deep semantic extraction branch differ from those of the deep network; the first feature and the second feature are then input into a fusion dimension reduction model for fusion to obtain a fusion feature; and feature comparison is further performed based on the fusion feature to obtain the face-related identification information corresponding to the target object. A face recognition process fusing depth features is thereby realized: introducing an independent branch into the deep network of the integrated model improves the diversity of the features, and sharing the shallow network features through a sharing mechanism improves the efficiency of face recognition while improving its accuracy.
In order to better implement the above-described aspects of the embodiments of the present application, related apparatuses for implementing the above-described aspects are provided below. Referring to fig. 9, fig. 9 is a schematic structural diagram of a face recognition apparatus provided in an embodiment of the present application, and the recognition apparatus 900 includes:
an acquiring unit 901, configured to acquire a face image of a target object;
the input unit 902 is configured to input the face image into a face recognition model in an integrated model to obtain a first feature, where the integrated model includes the face recognition model and a deep semantic extraction branch, and the face recognition model includes a shallow network and a deep network, and the shallow network is used to extract the shallow feature of the face image;
the input unit 902 is further configured to input the shallow feature into the deep semantic extraction branch to obtain a second feature, where the deep semantic extraction branch adopts the feature extracted by the shallow network in the training process, and a network parameter of the deep semantic extraction branch is different from a network parameter of the deep network;
the fusion unit 903 is configured to input the first feature and the second feature into a fusion dimension reduction model for fusion, so as to obtain a fusion feature;
And the identifying unit 904 is configured to perform feature comparison based on the fusion feature to obtain identification information related to the face corresponding to the target object.
Optionally, in some possible implementations of the present application, the identifying unit 904 is specifically configured to obtain training data for face recognition;
the identifying unit 904 is specifically configured to input the training data into the integrated model;
the identifying unit 904 is specifically configured to extract shallow features of the training data based on the shallow network, and extract training features from the shallow features according to the deep network, so as to train the face recognition model;
the identifying unit 904 is specifically configured to train the deep semantic extraction branch based on the training data, where the deep semantic extraction branch uses the shallow network to perform feature input during the training process, and parameters of the shallow network are fixed;
the recognition unit 904 is specifically configured to input the training features and the depth features output by the deep semantic extraction branch into the fusion dimension reduction model, so as to train the fusion dimension reduction model based on constraint conditions, where the parameters of the face recognition model and the deep semantic extraction branch are fixed during the training of the fusion dimension reduction model.
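The staged parameter freezing described by these units can be sketched as follows; the freeze helper, optimizer choice, and learning rate are assumptions, and the actual training loops and losses are elided:

import itertools
import torch

def freeze(module: torch.nn.Module) -> None:
    # Fix a sub-network so that its parameters are not updated by back-propagation.
    for p in module.parameters():
        p.requires_grad_(False)

def build_stage_optimizers(shallow_net, deep_net, branch, fusion_module, lr: float = 0.01):
    # Stage 1: the face recognition model (shallow + deep network) is trained jointly.
    opt_stage1 = torch.optim.SGD(
        itertools.chain(shallow_net.parameters(), deep_net.parameters()), lr=lr)

    # Stage 2: the deep semantic extraction branch is trained; the shallow network
    # only provides feature input and its parameters stay fixed.
    freeze(shallow_net)
    opt_stage2 = torch.optim.SGD(branch.parameters(), lr=lr)

    # Stage 3: only the fusion dimension reduction module is trained; the face
    # recognition model and the branch are both fixed.
    freeze(deep_net)
    freeze(branch)
    opt_stage3 = torch.optim.SGD(fusion_module.parameters(), lr=lr)

    return opt_stage1, opt_stage2, opt_stage3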
Optionally, in some possible implementations of the present application, the identifying unit 904 is specifically configured to extract shallow features of the training data based on the shallow network, and extract the training features of the training data according to the deep network;
the identifying unit 904 is specifically configured to determine tag information corresponding to the training data;
the identifying unit 904 is specifically configured to match the training feature with the tag information, so as to perform gradient calculation based on a first loss function;
the identifying unit 904 is specifically configured to compare a parameter value in a gradient calculation process corresponding to the first loss function with a first training condition, and if the first training condition is satisfied, complete training of the face recognition model.
Optionally, in some possible implementations of the present application, the identifying unit 904 is specifically configured to determine first sampling information for sampling the training data in the training process of the face recognition model;
the identifying unit 904 is specifically configured to perform sampling configuration different from the first sampling information, so as to determine second sampling information corresponding to the deep semantic extraction branch;
The identifying unit 904 is specifically configured to sample the training data based on the second sampling information to obtain a training sample;
the identifying unit 904 is specifically configured to input the training sample into the trained shallow network, so as to obtain a sample input feature;
the identifying unit 904 is specifically configured to input the sample input feature into the deep semantic extraction branch to obtain a sample output feature;
the identifying unit 904 is specifically configured to determine tag information corresponding to the sample output feature, so as to perform gradient calculation of a second loss function on the deep semantic extraction branch;
the identifying unit 904 is specifically configured to compare a parameter value in a gradient calculation process corresponding to the second loss function with a second training condition, and if the second training condition is satisfied, complete training of the deep semantic extraction branch.
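One way to realize the different sampling configuration is sketched below: the same training data is read through two differently seeded data loaders, so the deep semantic extraction branch sees a different sample order than the face recognition model did. The seed values and batch size are assumptions:

import torch
from torch.utils.data import DataLoader

def make_sampling_configs(dataset, batch_size: int = 128):
    # First sampling information: configuration used when training the face recognition model.
    base_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                             generator=torch.Generator().manual_seed(0))
    # Second sampling information: a deliberately different configuration for the branch.
    branch_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                               generator=torch.Generator().manual_seed(1234))
    return base_loader, branch_loader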
Optionally, in some possible implementations of the present application, the identifying unit 904 is specifically configured to obtain parameter random information of a depth network in the face recognition model;
the identifying unit 904 is specifically configured to configure a random seed for the deep semantic extraction branch that differs from the parameter random information of the depth network, so as to perform parameter configuration on the deep semantic extraction branch;
The identifying unit 904 is specifically configured to input the sample input feature into the deep semantic extraction branch after parameter configuration, so as to obtain the sample output feature.
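A sketch of the differing parameter randomness follows: the branch copies the deep network's architecture but is re-initialized under a different random seed, so the two start from different parameters. The seed handling, layer types, and initializer are assumptions:

import copy
import torch

def init_branch_with_different_seed(deep_net: torch.nn.Module,
                                    deep_net_seed: int, branch_seed: int) -> torch.nn.Module:
    assert branch_seed != deep_net_seed, "branch must use different parameter random information"
    branch = copy.deepcopy(deep_net)       # same structure as the deep network
    torch.manual_seed(branch_seed)         # random seed configuration for the branch
    for m in branch.modules():
        if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            torch.nn.init.kaiming_normal_(m.weight)   # re-initialize weights under the new seed
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)
    return branch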
Optionally, in some possible implementations of the present application, the identifying unit 904 is specifically configured to determine tag information corresponding to the sample output feature, so as to determine a quality parameter of the training sample;
the identifying unit 904 is specifically configured to configure the quality parameter and the constraint parameter of the second loss function differently from those of the face recognition model, so as to perform parameter adjustment on the second loss function;
the identifying unit 904 is specifically configured to perform gradient calculation on the deep semantic extraction branch based on the second loss function after parameter adjustment.
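The description above does not fix the form of the second loss function; as a hedged example, the sketch below assumes a cosine-margin style constraint where the margin and scale (constraint parameters) differ from those of the face recognition model, and the per-sample quality parameter weights the loss before averaging:

import torch
import torch.nn.functional as F

def second_loss(cosine_logits: torch.Tensor, labels: torch.Tensor,
                quality: torch.Tensor, margin: float = 0.35, scale: float = 48.0):
    # cosine_logits: (batch, num_classes) cosine similarities to class prototypes
    # quality: (batch,) quality parameter of each training sample, assumed in [0, 1]
    onehot = F.one_hot(labels, num_classes=cosine_logits.size(1)).float()
    adjusted = scale * (cosine_logits - margin * onehot)   # apply margin only to the target class
    per_sample = F.cross_entropy(adjusted, labels, reduction="none")
    return (quality * per_sample).mean()                   # quality-weighted average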
Optionally, in some possible implementations of the present application, the input unit 902 is specifically configured to perform a network search based on preset precision information to determine parameter information of the deep semantic extraction branch;
the input unit 902 is specifically configured to perform parameter configuration on the deep semantic extraction branch according to the parameter information obtained by network searching;
the input unit 902 is specifically configured to input the shallow feature of the face image into the deep semantic extraction branch after parameter configuration to obtain the second feature.
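A simple interpretation of this network search is sketched below: candidate branch configurations are evaluated in order of cost and the first one meeting the preset precision information is kept. The candidate list, build_branch(), and evaluate() are assumptions standing in for whatever search strategy is actually used:

def search_branch_config(build_branch, evaluate, target_accuracy: float = 0.99):
    # Candidate (depth, width) settings ordered from cheapest to most expensive.
    candidates = [(2, 128), (3, 256), (4, 512), (6, 512)]
    for depth, width in candidates:
        branch = build_branch(depth=depth, width=width)
        accuracy = evaluate(branch)            # validation accuracy of the configured branch
        if accuracy >= target_accuracy:        # meets the preset precision information
            return branch, (depth, width)
    # Fall back to the largest configuration if none reaches the target.
    depth, width = candidates[-1]
    return build_branch(depth=depth, width=width), (depth, width)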
With the above apparatus, a face image of a target object is acquired; the face image is then input into the face recognition model of the integrated model to obtain a first feature, where the integrated model includes the face recognition model and a deep semantic extraction branch, the face recognition model includes a shallow network and a deep network, the shallow network is used to extract the shallow features of the face image, and the deep network is used to extract the first feature from the shallow features; the shallow features of the face image are input into the deep semantic extraction branch to obtain a second feature, where the deep semantic extraction branch adopts the features extracted by the shallow network during training, and the network parameters of the deep semantic extraction branch differ from those of the deep network; the first feature and the second feature are then input into a fusion dimension reduction model for fusion to obtain a fusion feature; and feature comparison is further performed based on the fusion feature to obtain the face-related identification information corresponding to the target object. A face recognition process fusing depth features is thereby realized: introducing an independent branch into the deep network of the integrated model improves the diversity of the features, and sharing the shallow network features through a sharing mechanism improves the efficiency of face recognition while improving its accuracy.
The embodiment of the present application further provides a terminal device. Fig. 10 is a schematic structural diagram of another terminal device provided in an embodiment of the present application; for convenience of explanation, only the portion related to the embodiment of the present application is shown, and for specific technical details that are not disclosed, please refer to the method portion of the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (personal digital assistant, PDA), a point of sale (POS) terminal, a vehicle-mounted computer, and the like. A mobile phone is taken as an example of the terminal:
fig. 10 is a block diagram showing a part of the structure of a mobile phone related to a terminal provided by an embodiment of the present application. Referring to fig. 10, the mobile phone includes: radio Frequency (RF) circuitry 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuitry 1060, wireless fidelity (wireless fidelity, wiFi) module 1070, processor 1080, and power source 1090. It will be appreciated by those skilled in the art that the handset construction shown in fig. 10 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The following describes the components of the mobile phone in detail with reference to fig. 10:
the RF circuit 1010 may be used for receiving and transmitting signals during messaging or a call; in particular, after downlink information from a base station is received, it is delivered to the processor 1080 for processing, and uplink data is sent to the base station. Typically, the RF circuitry 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (low noise amplifier, LNA), a duplexer, and the like. In addition, the RF circuitry 1010 may also communicate with networks and other devices via wireless communications. The wireless communications may use any communication standard or protocol, including but not limited to global system for mobile communications (global system of mobile communication, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), long term evolution (long term evolution, LTE), email, short message service (short messaging service, SMS), and the like.
The memory 1020 may be used to store software programs and modules, and the processor 1080 performs various functional applications and data processing of the mobile phone by executing the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The input unit 1030 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile phone. In particular, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations of the user on or near the touch panel 1031 using any suitable object or accessory such as a finger or a stylus, as well as hovering touch operations within a certain range of the touch panel 1031) and drive the corresponding connection device according to a predetermined program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects a signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 1080; it can also receive commands from the processor 1080 and execute them. Further, the touch panel 1031 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, the other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 1040 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 1031 may overlay the display panel 1041; when the touch panel 1031 detects a touch operation on or near it, the touch operation is transmitted to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to the type of touch event. Although in fig. 10 the touch panel 1031 and the display panel 1041 are two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 1050, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the mobile phone moves to the ear. As one kind of motion sensor, an accelerometer sensor can detect the acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, magnetometer posture calibration), vibration-recognition related functions (such as pedometer and tapping), and the like; other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor that may also be configured in the mobile phone are not described in detail herein.
The audio circuit 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between the user and the mobile phone. The audio circuit 1060 may convert received audio data into an electrical signal and transmit it to the speaker 1061, where it is converted into a sound signal for output; on the other hand, the microphone 1062 converts collected sound signals into electrical signals, which are received by the audio circuit 1060 and converted into audio data; the audio data is then output to the processor 1080 for processing and transmitted via the RF circuit 1010 to, for example, another mobile phone, or output to the memory 1020 for further processing.
WiFi is a short-range wireless transmission technology; through the WiFi module 1070 the mobile phone can help the user send and receive emails, browse webpages, access streaming media, and the like, providing wireless broadband Internet access for the user. Although fig. 10 shows the WiFi module 1070, it is understood that it is not an essential component of the mobile phone and may be omitted as required without changing the essence of the invention.
Processor 1080 is the control center of the handset, connects the various parts of the entire handset using various interfaces and lines, and performs various functions and processes of the handset by running or executing software programs and/or modules stored in memory 1020 and invoking data stored in memory 1020, thereby performing overall monitoring of the handset. Optionally, processor 1080 may include one or more processing units; alternatively, processor 1080 may integrate an application processor primarily handling operating systems, user interfaces, applications, etc., with a modem processor primarily handling wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1080.
The mobile phone further includes a power source 1090 (e.g., a battery) for supplying power to the components. Optionally, the power source may be logically connected to the processor 1080 via a power management system, so that functions such as charging, discharging, and power-consumption management are implemented through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In the embodiment of the present application, the processor 1080 included in the terminal also has the function of executing the steps of the face recognition method as described above.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably by configuration or performance, and may include one or more central processing units (central processing units, CPU) 1122 (e.g., one or more processors), a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transitory or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the face recognition apparatus in the above-described embodiments may be based on the server structure shown in fig. 11.
In an embodiment of the present application, there is further provided a computer readable storage medium having stored therein instructions for face recognition, which when executed on a computer, cause the computer to perform the steps performed by the apparatus for face recognition in the method described in the embodiment of fig. 3 to 8.
There is also provided in an embodiment of the application a computer program product comprising instructions for face recognition which, when run on a computer, cause the computer to perform the steps performed by the apparatus for face recognition in the method described in the embodiment of figures 3 to 8 described above.
The embodiment of the application also provides a face recognition system, which can comprise the face recognition device in the embodiment shown in fig. 9, or the terminal equipment in the embodiment shown in fig. 10, or the server in fig. 11.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a face recognition device, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. A method of face recognition, comprising:
acquiring a face image of a target object;
inputting the face image into a face recognition model in an integrated model to obtain a first feature, wherein the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, and the shallow network is used for extracting the shallow feature of the face image;
inputting the shallow layer features of the face image into the deep semantic extraction branch to obtain second features, wherein the deep semantic extraction branch adopts features extracted by the shallow layer network in the training process, and the network parameters of the deep semantic extraction branch are different from those of the deep layer network;
Inputting the first feature and the second feature into a fusion dimension reduction model for fusion so as to obtain fusion features;
and performing feature comparison based on the fusion features to obtain the face-related identification information corresponding to the target object.
2. The method according to claim 1, wherein the method further comprises:
acquiring training data for face recognition;
inputting the training data into the integrated model;
extracting shallow features of the training data based on the shallow network, and extracting the training features of the shallow features of the training data according to the deep network so as to train the face recognition model;
training the deep semantic extraction branch based on the training data, wherein the deep semantic extraction branch adopts the shallow network to perform characteristic input in the training process, and parameters of the shallow network are fixed;
and inputting the training features and the depth features output by the deep semantic extraction branch into the fused dimension reduction model to train the fused dimension reduction model based on constraint conditions, wherein the parameters of the face recognition model and the deep semantic extraction branch are fixed during the training process of the fused dimension reduction model.
3. The method according to claim 2, wherein the extracting the shallow features of the training data based on the shallow network and extracting the training features of the shallow features according to the deep network to train the face recognition model includes:
extracting shallow features of the training data based on the shallow network, and extracting the training features of the training data according to the deep network;
determining label information corresponding to the training data;
matching the training features with the tag information to perform gradient computation based on a first loss function;
and comparing the parameter value in the gradient calculation process corresponding to the first loss function with a first training condition, and if the first training condition is met, finishing the training of the face recognition model.
4. The method of claim 2, wherein the training the deep semantic extraction branch based on the training data comprises:
determining first sampling information for sampling the training data in the face recognition model training process;
Sampling configuration is carried out different from the first sampling information so as to determine second sampling information corresponding to the deep semantic extraction branch;
sampling the training data based on the second sampling information to obtain a training sample;
inputting the training samples into the trained shallow network to obtain sample input characteristics;
inputting the sample input features into the deep semantic extraction branches to obtain sample output features;
determining label information corresponding to the sample output characteristics so as to perform gradient calculation of a second loss function on the deep semantic extraction branch;
and comparing the parameter value in the gradient calculation process corresponding to the second loss function with a second training condition, and if the second training condition is met, completing training of the deep semantic extraction branch.
5. A method according to claim 3, wherein said inputting the sample input features into the deep semantic extraction branches to obtain sample output features comprises:
acquiring parameter random information of a depth network in the face recognition model;
carrying out random seed configuration for the deep semantic extraction branch that is different from the parameter random information of the depth network, so as to carry out parameter configuration on the deep semantic extraction branch;
And inputting the sample input feature into the deep semantic extraction branch after the parameter configuration, to obtain the sample output feature.
6. A method according to claim 3, wherein determining the label information corresponding to the sample output feature for gradient calculation of a second loss function for the deep semantic extraction branch comprises:
determining label information corresponding to the sample output characteristics to determine quality parameters of training samples;
the quality parameters and constraint parameters of the second loss function are configured differently from the face recognition model, so that parameter adjustment is performed on the second loss function;
and carrying out gradient calculation on the deep semantic extraction branch based on the second loss function after parameter adjustment.
7. The method according to claim 1, wherein said inputting the shallow features of the face image into the deep semantic extraction branch to obtain the second features comprises:
performing network searching based on preset precision information to determine parameter information of the deep semantic extraction branch;
carrying out parameter configuration on the deep semantic extraction branches according to the parameter information obtained by network searching;
And inputting the shallow features of the face image into the deep semantic extraction branch subjected to parameter configuration to obtain the second features.
8. An apparatus for face recognition, comprising:
the acquisition unit is used for acquiring the face image of the target object;
the input unit is used for inputting the face image into a face recognition model in an integrated model to obtain a first characteristic, the integrated model comprises the face recognition model and a deep semantic extraction branch, the face recognition model comprises a shallow network and a deep network, and the shallow network is used for extracting the shallow characteristic of the face image;
the input unit is further configured to input the shallow feature into the deep semantic extraction branch to obtain a second feature, where the deep semantic extraction branch adopts a feature extracted by the shallow network in a training process, and network parameters of the deep semantic extraction branch are different from network parameters of the deep network;
the fusion unit is used for inputting the first feature and the second feature into a fusion dimension reduction model for fusion so as to obtain fusion features;
and the identification unit is used for carrying out feature comparison based on the fusion features so as to obtain the identification information related to the face corresponding to the target object.
9. A computer device, the computer device comprising a processor and a memory:
the memory is used for storing program codes; the processor is configured to perform the method of face recognition according to any one of claims 1 to 7 according to instructions in the program code.
10. A computer program product comprising computer programs/instructions stored on a computer readable storage medium, characterized in that the computer programs/instructions in the computer readable storage medium, when executed by a processor, implement the steps of the method of face recognition according to any of the preceding claims 1 to 7.
11. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of face recognition as claimed in any one of claims 1 to 7.