CN116524550A - Server and face recognition method - Google Patents

Server and face recognition method

Info

Publication number: CN116524550A
Application number: CN202210060530.XA
Authority: CN (China)
Prior art keywords: target, face, feature vector, image, data
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 洪锦奇
Current and original assignee: Qingdao Jukanyun Technology Co ltd
Priority date / filing date: 2022-01-19
Publication date: 2023-08-01
Application filed by Qingdao Jukanyun Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a server and a face recognition method, wherein the server is configured to: receive a face recognition request from a display device, the face recognition request comprising a target picture; respond to the face recognition request by carrying out face detection on the target picture to obtain an image of the target face; carry out face local segmentation on the image of the target face to obtain images of the target facial features; acquire a first feature vector corresponding to the image of the target face through a first deep learning network, and acquire a second feature vector corresponding to the image of each target facial feature through a second deep learning network; detect feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points; and match the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feed the person data of the target person back to the display device. The server and the face recognition method improve the accuracy of face recognition.

Description

Server and face recognition method
Technical Field
The application relates to the technical field of face detection, in particular to a server and a face recognition method.
Background
When a user watches media assets such as movies, television shows and variety programs on an intelligent terminal, a star the user does not recognize may appear in the picture displayed on the intelligent terminal, and the user may want to identify that star. The user can take a screenshot of the picture currently displayed on the intelligent terminal, the intelligent terminal can send the screenshot to a server, and the server performs face recognition to obtain the person information in the screenshot. Face detection is an important link in face recognition. In the related art, after the server obtains a face image through face detection in the face recognition process, it generally performs global feature extraction on the face image through a deep learning network and then performs face matching according to the extracted features. However, the global features extracted by the deep learning network only cover the more prominent features; less prominent features are often ignored by the deep learning network, so the accuracy of face recognition cannot meet actual needs, which affects the user experience of star recognition on the intelligent terminal.
Disclosure of Invention
In order to solve the technical problem of low face recognition accuracy, the application provides a server and a face recognition method.
In a first aspect, the present application provides a server configured to:
receiving a face recognition request from a display device, wherein the face recognition request comprises a target picture;
responding to the face recognition request, and carrying out face detection on the target picture to obtain an image of a target face;
carrying out face local segmentation on the image of the target face to obtain images of the target facial features;
acquiring a first feature vector corresponding to the image of the target face through a first deep learning network, and acquiring a second feature vector corresponding to the image of each target facial feature through a second deep learning network;
detecting feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points;
and matching the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feeding back the person data of the target person to the display device.
In some embodiments, matching the target person corresponding to the target face in the face database according to the first feature vector, the second feature vector and the third feature vector includes:
fusing the second feature vectors corresponding to all the target facial features to obtain a second feature fusion vector;
fusing the first feature vector and the second feature fusion vector to obtain a target fusion feature vector;
screening fourth data matched with the target fusion feature vector from the face database;
and screening fifth data matched with the third feature vector from the fourth data, and obtaining the person data of the target person according to the fifth data.
In a second aspect, the present application provides a face recognition method, including:
receiving a face recognition request from a display device, wherein the face recognition request comprises a target picture;
responding to the face recognition request, and carrying out face detection on the target picture to obtain an image of a target face;
carrying out face local segmentation on the image of the target face to obtain images of the target facial features;
acquiring a first feature vector corresponding to the image of the target face through a first deep learning network, and acquiring a second feature vector corresponding to the image of each target facial feature through a second deep learning network;
detecting feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points;
and matching the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feeding back the person data of the target person to the display device.
The server and the face recognition method have the beneficial effects that:
after the image of the target face is obtained, the image of the target face is segmented to obtain the images of the target facial features. The image of the target face is processed through a first deep learning network to obtain a first feature vector, which can reflect the global features of the target face; the images of the target facial features are processed through a second deep learning network to obtain second feature vectors, which can reflect the local features of the target facial features; the image of the target face is processed through a feature point extraction network to obtain a third feature vector, which can reflect the detail features of the target face. Together these feature vectors reflect a large amount of information about the target face, and carrying out face matching in the face database according to them can improve the accuracy of face matching and thereby the accuracy of face recognition.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
A schematic diagram of an operational scenario between a display device and a control apparatus according to some embodiments is schematically shown in fig. 1;
a flow diagram of a face recognition method according to some embodiments is schematically shown in fig. 2;
a flow diagram of a method of screening target persons according to some embodiments is schematically shown in fig. 3;
a flow diagram of a method of screening target persons according to some embodiments is schematically shown in fig. 4;
a timing diagram of a face recognition process according to some embodiments is schematically shown in fig. 5;
a schematic diagram of a target picture according to some embodiments is shown schematically in fig. 6;
a schematic diagram of a face contour is shown schematically in fig. 7 according to some embodiments;
an image schematic of a target face according to some embodiments is shown schematically in fig. 8;
a schematic diagram of the localization of a target facial feature according to some embodiments is exemplarily shown in fig. 9;
an image schematic of a target facial feature according to some embodiments is schematically shown in fig. 10;
fig. 11 illustrates a flow diagram of a face recognition method according to some embodiments.
Detailed Description
For purposes of clarity and of implementing the present application, exemplary implementations of the present application will be described clearly and completely below with reference to the accompanying drawings, in which exemplary implementations of the present application are illustrated. It is apparent that the described exemplary implementations are only some, but not all, of the examples of the present application.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
The terms "first," "second," "third," and the like in the description, in the claims and in the above-described figures are used for distinguishing between similar objects or entities and are not necessarily intended to limit a particular order or sequence, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The display device provided in the embodiment of the application may have various implementation forms, for example, may be a television, an intelligent television, a laser projection device, a display (monitor), an electronic whiteboard (electronic bulletin board), an electronic desktop (electronic table), and the like. Fig. 1 is a specific embodiment of a display device of the present application.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the smart device 300 or the control apparatus 100.
In some embodiments, the control apparatus 100 may be a remote controller, and the communication between the remote controller and the display device includes infrared protocol communication, Bluetooth protocol communication and other short-range communication modes, so that the display device 200 is controlled wirelessly or through a wired mode. The user may control the display device 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and the like.
In some embodiments, a smart device 300 (e.g., mobile terminal, tablet, computer, notebook, etc.) may also be used to control the display device 200. For example, the display device 200 is controlled using an application running on a smart device.
In some embodiments, the display device may receive instructions without using the smart device or control apparatus described above, and may instead be controlled by the user through touch, gestures, or the like.
In some embodiments, the display device 200 may also be controlled in a manner other than through the control apparatus 100 and the smart device 300. For example, the user's voice commands may be received directly through a module for acquiring voice commands configured inside the display device 200, or through a voice control device configured outside the display device 200.
In some embodiments, the display device 200 is also in data communication with a server 400. The display device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN) and other networks. The server 400 may provide various contents and interactions to the display device 200. The server 400 may be one cluster or multiple clusters, and may include one or more types of servers.
In some embodiments, a screen capture key may be provided on the control device 100. In the process of playing the media asset, if a person unknown to the user appears on the display device, the user can press the screen capturing key of the remote controller, the display device captures the current display picture to obtain a target picture, and the target picture is sent to the server 400 for face recognition.
In some embodiments, after receiving the target picture, the server 400 detects a face in the target picture through a face detection technology, compares features of the face with features of each face in the database to obtain a matched face in the database, feeds information of the matched face back to the display device 200, and the display device 200 displays the information of the face, thereby realizing face recognition of the target picture.
In some embodiments, when the server 400 extracts the features of the face, the global features of the face are extracted only through the deep learning network, and these global features can only reflect some more significant features of the face, but cannot reflect the insignificant features of the face, so that the accuracy of face recognition cannot meet the actual needs.
In order to improve the accuracy of face recognition, an embodiment of the present application provides a face recognition method in which not only global features but also local features of the facial features and some detail features are extracted when face features are extracted. The extracted features are therefore richer, a more accurate target person can be matched, and the accuracy of face recognition is improved.
Referring to fig. 2, which is a flow chart of a face recognition method according to some embodiments, the face recognition method provided in an embodiment of the present application may include the following steps:
step S101: and receiving a face recognition request of the display equipment, wherein the face recognition request comprises the target picture.
In some embodiments, when the user views the media asset on the display device, if the user wants to know the person on the display screen of the display device, the user can press the screen capturing key of the remote controller, and the display device captures the current display screen to obtain the target picture.
In some embodiments, after obtaining the target picture, the display device may generate a face recognition request including the target picture, and send the face recognition request to the server.
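A minimal sketch of how a display device might package the screenshot into such a face recognition request is shown below; the endpoint URL, the JSON field names and the base64 encoding are illustrative assumptions, as the embodiments do not prescribe a specific request format:

    import base64
    import json
    import urllib.request

    def send_face_recognition_request(screenshot_path, server_url):
        """Package a screenshot (the target picture) into a face recognition request."""
        with open(screenshot_path, "rb") as f:
            target_picture = base64.b64encode(f.read()).decode("ascii")
        # Field names are assumptions made for this illustration only.
        payload = json.dumps({"type": "face_recognition", "image": target_picture}).encode("utf-8")
        request = urllib.request.Request(
            server_url, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)  # person data fed back by the server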
Step S102: and responding to the face recognition request, and carrying out face detection on the target picture to obtain an image of the target face.
In some embodiments, after receiving the face recognition request, the server extracts the target picture from the request and performs face detection on the target picture to obtain rectangular frames. Each rectangular frame corresponds to one face, so if a plurality of faces are detected in the target picture, a plurality of rectangular frames are obtained. The server can segment each rectangular frame area from the target picture, and the obtained images are the face images corresponding to the target picture.
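A minimal sketch of this detect-and-crop step is given below; it uses OpenCV's bundled Haar cascade as a stand-in face detector, since the embodiments do not mandate a particular detection algorithm:

    import cv2

    def detect_and_crop_faces(target_picture_path):
        """Detect faces in the target picture and crop one image per rectangular frame."""
        picture = cv2.imread(target_picture_path)
        gray = cv2.cvtColor(picture, cv2.COLOR_BGR2GRAY)
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        )
        rectangles = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        # Each rectangular frame corresponds to one face; segment it from the target picture.
        return [picture[y:y + h, x:x + w] for (x, y, w, h) in rectangles]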
Step S103: and carrying out face local segmentation on the image of the target face to obtain images of the target facial features.
In some embodiments, face key points are set in the image of the target face obtained through face detection, and the position of each facial feature in the image of the target face can be determined according to the coordinates of the face key points, so that the image of each facial feature can be segmented from the image of the target face and a plurality of images of the target facial features can be obtained.
In some embodiments, the face image may be locally segmented according to the facial features, which include the eyebrows, eyes, nose, mouth and ears, to obtain images of the following single facial features: a left eyebrow image, a left eye image, a left ear image, a right eyebrow image, a right eye image, a right ear image, a nose image and a mouth image.
In some embodiments, since the facial features that mainly affect the accuracy of face recognition are the eyebrows, eyes, nose and mouth, the face image may be locally segmented to obtain the following facial feature images: a left eye region, a right eye region, a nose image and a mouth image, wherein the left eye region contains the left eye and the left eyebrow, and the right eye region contains the right eye and the right eyebrow.
Step S104: and acquiring a first feature vector corresponding to the image of the target face through a first deep learning network, and acquiring a second feature vector corresponding to the image of each target facial feature through a second deep learning network.
In some embodiments, the first deep learning network and the second deep learning network are different deep learning network models. The two deep learning network models may be trained in advance before face recognition; one of them may be referred to as the first model and the other as the second model. The first model is used to extract global features from the image of the target face, and the second model is used to extract local features from the images of the target facial features; such a local feature may be called a first local feature and represents the characteristics of one target facial feature. Both the first model and the second model may be trained from an MGN (Multi-Granularity Network) model. The MGN model includes a backbone network and a loss layer, where the backbone network is a ResNet50 network.
The samples of the first model are images containing a complete face. After the samples are acquired, a training set can be constructed to train the MGN model, and during training back propagation is performed through the loss layer to optimize the backbone network of the model. After the model is trained to convergence, the loss layer is stripped from it, yielding a first model that comprises the optimized backbone network and no loss layer. Extracting features from a sample image containing a complete face with the first model yields a vector with a dimension of 2048.
The samples of the second model are images each containing a single facial feature, namely one facial feature segmented from a sample of the first deep learning network. For example, if one sample of the first deep learning network model is a picture P, the face in the picture P may be recognized and segmented by facial feature to obtain the following facial feature images: a left eye-and-eyebrow image, a right eye-and-eyebrow image, a nose image and a mouth image. These facial feature images are all input as samples into an MGN model for training, and the training method is the same as that of the MGN model whose samples are complete faces. Extracting features from a sample containing the image of one facial feature with the second model yields a vector with a dimension of 2048.
It can be seen that, for an image containing a complete face, one 2048-dimensional vector is obtained through the first model, and after the facial features are recognized and segmented into a left eye-and-eyebrow image, a right eye-and-eyebrow image, a nose image and a mouth image, four 2048-dimensional vectors are obtained through the second model.
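The sketch below illustrates the idea of stripping the loss/classification layer from a trained backbone so that it outputs 2048-dimensional feature vectors; a plain torchvision ResNet50 is used here as a stand-in for the MGN backbone, and the multi-branch structure and training loop of MGN are omitted, so this is only an illustrative approximation of the embodiment:

    import torch
    import torch.nn as nn
    from torchvision import models

    def build_feature_extractor():
        """ResNet50 backbone with the classification head removed: outputs 2048-d vectors."""
        backbone = models.resnet50(weights=None)  # in practice, load the trained weights here
        backbone.fc = nn.Identity()               # strip the loss/classification layer
        backbone.eval()
        return backbone

    first_model = build_feature_extractor()   # would be trained on complete-face images
    second_model = build_feature_extractor()  # would be trained on single-facial-feature images

    with torch.no_grad():
        face_image = torch.randn(1, 3, 224, 224)        # a preprocessed complete-face image
        first_feature_vector = first_model(face_image)  # shape (1, 2048)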
In some embodiments, after the image of the target face is obtained, it is input into the first model to obtain a vector with a dimension of 2048. This vector is the feature vector corresponding to the image of the target face and may be referred to as the first feature vector.
In some embodiments, after the images of the target facial features are obtained, each of them is input into the second model to obtain a vector with a dimension of 2048. Such a vector is the feature vector corresponding to the image of one target facial feature and may be referred to as a second feature vector. Taking the target facial features as the left eye and eyebrow, the right eye and eyebrow, the nose and the mouth as an example, the number of second feature vectors corresponding to one target face image is 4.
Step S105: and detecting the feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points.
In some embodiments, a feature point extraction network may be trained in advance before face recognition is performed, where the feature point extraction network is used to extract features such as skin color, skin texture, skin shape and the like. The feature point extraction network may be, for example, a network that extracts SURF feature points. The feature point extraction network can extract local features of the face image, which may be referred to as second local features and which represent some detail features of the image of the target face. Taking an image containing a complete face as a sample, the feature point extraction network can output a 256-dimensional vector or a 2048-dimensional vector; taking the 2048-dimensional vector as an example, this vector may be called the third feature vector.
It can be seen that, for an image containing a complete face, one 2048-dimensional first feature vector is obtained by processing the image with the first model, four 2048-dimensional second feature vectors are obtained by processing it with the second model, and one 2048-dimensional third feature vector is obtained by processing it with the feature point extraction network.
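A sketch of the SURF-based feature point extraction mentioned above is given below; SURF is provided by the opencv-contrib package, and the way the per-keypoint descriptors are aggregated into one fixed-length third feature vector is an assumption made for illustration (keep the strongest keypoints, flatten their descriptors, pad or truncate to the target dimension), since the embodiments do not specify the aggregation:

    import cv2
    import numpy as np

    def extract_third_feature_vector(face_image_gray, dim=2048):
        """Describe SURF feature points of the face image with one fixed-length vector."""
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=True)  # 128-d descriptors
        keypoints, descriptors = surf.detectAndCompute(face_image_gray, None)
        if descriptors is None:
            return np.zeros(dim, dtype=np.float32)
        order = np.argsort([-kp.response for kp in keypoints])  # strongest keypoints first
        flat = descriptors[order].ravel()
        third_feature_vector = np.zeros(dim, dtype=np.float32)
        third_feature_vector[:min(dim, flat.size)] = flat[:dim]
        return third_feature_vector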
Step S106: and matching the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feeding back the person data of the target person to the display device.
In some embodiments, the person data of each person in the face database is provided with three feature vectors: one is a vector for matching with the first feature vector and may be referred to as a fourth feature vector; one is a vector for matching with the second feature vector and may be referred to as a fifth feature vector; and one is a vector for matching with the third feature vector and may be referred to as a sixth feature vector. If the first feature vector corresponding to the target face matches the fourth feature vector of a person, the second feature vector corresponding to the target face matches the fifth feature vector of that person, and the third feature vector corresponding to the target face matches the sixth feature vector of that person, that person can be determined to be the target person corresponding to the target face.
In some embodiments, the first feature vector, the second feature vector and the third feature vector may each be matched against the person data of every person in the face database, the person data matched by all three feature vectors may be collected, and the person corresponding to the collected person data may be determined as the target person corresponding to the target face.
In some embodiments, to increase the efficiency of matching, the method described in fig. 3 may also be used to determine the target person. Referring to fig. 3, which is a flow chart of a method for screening the target person according to some embodiments, the method may include the following steps:
step S601: and screening out first data matched with the first feature vector from a face database.
In some embodiments, the first feature vector may be matched with the fourth feature vector of each person in the face database. If the matching succeeds, the person data of that person is determined to be first data; if the matching fails, the person data of that person is not determined to be first data.
An exemplary method for judging whether the first feature vector and the fourth feature vector match is to calculate a first cosine distance between the fourth feature vector and the first feature vector; if the first cosine distance is smaller than a first preset threshold, the fourth feature vector is judged to match the first feature vector. If the first cosine distance is greater than or equal to the first preset threshold, the fourth feature vector does not match the first feature vector.
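This first round of screening can be sketched as follows; the layout of the face database (a dictionary keyed by person identifier) and the concrete threshold value are assumptions made only for illustration:

    import numpy as np

    def cosine_distance(a, b):
        """Cosine distance: small for vectors pointing in similar directions."""
        return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def screen_first_data(first_feature_vector, face_database, first_threshold=0.35):
        """Keep the person data whose fourth feature vector matches the first feature vector."""
        first_data = {}
        for person_id, person_data in face_database.items():
            d1 = cosine_distance(first_feature_vector, person_data["fourth_feature_vector"])
            if d1 < first_threshold:  # matched: distance below the first preset threshold
                first_data[person_id] = person_data
        return first_data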
Step S602: and screening second data matched with the second feature vector from the first data.
In some embodiments, the second feature vector may be matched with the fifth feature vector of each person in the first data. If the matching succeeds, the person data of that person is determined to be second data; if the matching fails, the person data of that person is not determined to be second data.
An exemplary method for judging whether the fifth feature vector and the second feature vector match is to calculate a second cosine distance between them; if the second cosine distance is smaller than a second preset threshold, the fifth feature vector is judged to match the second feature vector. If the second cosine distance is greater than or equal to the second preset threshold, the fifth feature vector does not match the second feature vector. The first preset threshold and the second preset threshold may be the same or different.
Step S603: and screening third data matched with the third feature vector from the second data, and obtaining character data of the target character according to the third data.
In some embodiments, the third feature vector may be matched with the sixth feature vector of each person in the second data. If the matching succeeds, the person data of that person is determined to be third data; if the matching fails, the person data of that person is not determined to be third data. The third data may also be referred to as target data.
An exemplary method for judging whether the sixth feature vector and the third feature vector match is to calculate a third cosine distance between them; if the third cosine distance is smaller than a third preset threshold, the sixth feature vector is judged to match the third feature vector. The first preset threshold, the second preset threshold and the third preset threshold may be the same or different.
In some embodiments, if the third data is the data of one person, the person corresponding to the third data is determined to be the target person, and the person data of the target person is obtained from the service processing module, where the person data stored in the service processing module may include the professions, works they have appeared in, related news and the like of a plurality of persons.
In some embodiments, if the third data is the data of a plurality of persons, a weighted sum of the first cosine distance, the second cosine distance and the third cosine distance of each person may be calculated according to a preset weight ratio, and the person with the smallest weighted sum is determined to be the target person. For example, suppose the preset weight ratio is 0.4:0.4:0.2, the third data is the data of person A and the data of person B, the first, second and third cosine distances corresponding to person A are a1, a2 and a3, and those corresponding to person B are b1, b2 and b3. The weighted sum S1 corresponding to person A is S1 = 0.4a1 + 0.4a2 + 0.2a3, and the weighted sum S2 corresponding to person B is S2 = 0.4b1 + 0.4b2 + 0.2b3. If S1 is smaller than S2, person A is determined to be the target person; if S1 is greater than S2, person B is determined to be the target person.
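This weighted-sum tie-break can be sketched as follows; the candidate structure and the numeric distances below are made up for illustration, while the 0.4:0.4:0.2 weight ratio follows the example above:

    def pick_target_person(candidates, weights=(0.4, 0.4, 0.2)):
        """candidates maps a person identifier to its (first, second, third) cosine distances;
        the person with the smallest weighted sum of distances is the target person."""
        return min(candidates, key=lambda pid: sum(w * d for w, d in zip(weights, candidates[pid])))

    # Hypothetical distances for persons A and B:
    candidates = {"A": (0.20, 0.25, 0.30), "B": (0.22, 0.28, 0.10)}
    # S1 = 0.4*0.20 + 0.4*0.25 + 0.2*0.30 = 0.24; S2 = 0.4*0.22 + 0.4*0.28 + 0.2*0.10 = 0.22
    print(pick_target_person(candidates))  # -> "B", since S2 < S1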
As can be seen from fig. 3, when determining the target person corresponding to the target face, coarse screening may first be performed in the face database according to the first feature vector, secondary coarse screening may then be performed according to the second feature vector, and finally fine screening may be performed according to the third feature vector. Determining the target person through multiple rounds of screening narrows the screening range step by step and improves the matching efficiency. Moreover, determining the target person through multiple rounds of screening ensures the degree to which the screened target person matches the target face, thereby improving the accuracy of face recognition.
In some embodiments, the person data of each person in the face database is provided with two feature vectors: one is a vector for matching with the target fusion feature vector, which may be referred to as a seventh feature vector; the other is a vector for matching with the third feature vector, namely the sixth feature vector. The target fusion feature vector is a vector obtained by fusing the first feature vector and the second feature vector.
When the person data of each person in the face database is provided with the seventh feature vector and the sixth feature vector, the target person may also be screened by the method shown in fig. 4, which includes the following steps:
Step S611: and fusing the second feature vectors corresponding to all the target facial features to obtain a second feature fusion vector.
In some embodiments, for an image containing a complete face, the second feature vectors corresponding to the target facial features output by the second model may be fused. An exemplary method for fusing the second feature vectors is as follows: a concat operation is performed on the data of the plurality of feature vectors in the same dimension to obtain the data of the fused vector in that dimension, and the fused vector may be called the second feature fusion vector.
Step S612: and fusing the first feature vector and the second feature fusion vector to obtain a target fusion feature vector.
In some embodiments, for an image containing a complete face, the vector output by the first model may be merged with the vectors output by the second model into one vector. After the feature vectors corresponding to the target facial features output by the second model are fused into the second feature fusion vector, the second feature fusion vector and the first feature vector can be fused, and the finally obtained vector may be called the target fusion feature vector. The target fusion feature vector and the third feature vector are used to perform face matching for the target face in the database. The method of fusing the second feature fusion vector and the first feature vector may be the same as the method of fusing the second feature vectors, namely fusion through a concat (concatenation) operation.
According to the above embodiment, the first feature vector is one 2048-dimensional vector and the second feature vectors are four 2048-dimensional vectors, so the target fusion feature vector obtained by fusing the first feature vector and the second feature vectors still has 2048 dimensions, where each dimension contains the data of the first feature vector in that dimension together with the data of each second feature vector in that dimension.
In some embodiments, instead of first fusing the second feature vectors into the second feature fusion vector and then fusing it with the first feature vector, the first feature vector may be fused directly with all the second feature vectors to obtain the target fusion feature vector.
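One possible reading of the concat-based fusion is sketched below: the five 2048-dimensional vectors are grouped dimension by dimension, so each of the 2048 dimensions of the fused result holds the data of the first feature vector and of each second feature vector in that dimension. A flat concatenation of the five vectors into a single 10240-entry vector is an equally plausible reading of the operation; either way, matching is performed on the fused result:

    import numpy as np

    def fuse_feature_vectors(first_feature_vector, second_feature_vectors):
        """Per-dimension concat of one global vector with the per-facial-feature vectors."""
        # Fuse the second feature vectors first (second feature fusion vector), shape (2048, 4).
        second_feature_fusion_vector = np.stack(second_feature_vectors, axis=1)
        # Then fuse with the first feature vector, giving shape (2048, 5).
        return np.concatenate(
            [first_feature_vector[:, np.newaxis], second_feature_fusion_vector], axis=1
        )

    # For matching, the fused result can be flattened back into a single vector, e.g.
    # target_fusion_feature_vector = fuse_feature_vectors(v1, [v2a, v2b, v2c, v2d]).ravel()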
Step S613: and screening fourth data matched with the target fusion feature vector from a face database.
In some embodiments, the target fusion feature vector may be matched with the seventh feature vector of each person in the face database. If the matching succeeds, the person data of that person is determined to be fourth data; if the matching fails, the person data of that person is not determined to be fourth data.
An exemplary method for judging whether the target fusion feature vector and the seventh feature vector match is to calculate a fourth cosine distance between them; if the fourth cosine distance is smaller than a fourth preset threshold, the target fusion feature vector is judged to match the seventh feature vector. If the fourth cosine distance is greater than or equal to the fourth preset threshold, the target fusion feature vector is judged not to match the seventh feature vector.
Step S614: and screening fifth data matched with the third feature vector from the fourth data, and obtaining character data of the target character according to the fifth data.
In some embodiments, the third feature vector may be matched with the sixth feature vector of each person in the fourth data. If the matching succeeds, the person data of that person is determined to be fifth data; if the matching fails, the person data of that person is not determined to be fifth data. The fifth data may also be referred to as target data.
An exemplary method for judging whether the sixth feature vector and the third feature vector match is to calculate a third cosine distance between them; if the third cosine distance is smaller than a third preset threshold, the sixth feature vector is judged to match the third feature vector. The fourth preset threshold and the third preset threshold may be the same or different.
In some embodiments, if the fifth data is the data of one person, the person corresponding to the fifth data is determined to be the target person, and the person data of the target person is acquired from the service processing module.
In some embodiments, if the fifth data is the data of a plurality of persons, a weighted sum of the fourth cosine distance and the third cosine distance of each person is calculated according to a preset weight ratio, and the person with the smallest weighted sum is determined to be the target person. For example, suppose the preset weight ratio is 0.8:0.2, the fifth data is the data of person C and the data of person D, the fourth and third cosine distances corresponding to person C are c1 and c2, and those corresponding to person D are d1 and d2. The weighted sum S3 corresponding to person C is S3 = 0.8c1 + 0.2c2, and the weighted sum S4 corresponding to person D is S4 = 0.8d1 + 0.2d2. If S3 is smaller than S4, person C is determined to be the target person; if S3 is greater than S4, person D is determined to be the target person.
As can be seen from fig. 4, the number of matching times can be reduced and the matching efficiency can be improved by determining the target person through two rounds of matching.
According to the methods shown in fig. 2 to fig. 4, after the image of the target face is obtained, the image of the target face is segmented to obtain the images of the target facial features. The image of the target face is processed through a first deep learning network to obtain a first feature vector, which can reflect the global features of the target face; the images of the target facial features are processed through a second deep learning network to obtain second feature vectors, which can reflect the local features of the target facial features; the image of the target face is processed through a feature point extraction network to obtain a third feature vector, which can reflect the detail features of the target face. These feature vectors reflect a large amount of information about the target face, and carrying out face matching in the face database according to them can improve the accuracy of face matching and thereby the accuracy of face recognition.
For further description of the face recognition method provided in the above embodiment, fig. 5 shows a timing diagram of a face recognition process.
Referring to fig. 5, the modules of the server that participate in processing the target picture include a terminal-oriented module, a face recognition module, a face database and a service processing module. These may be disposed on one hardware device or distributed over a plurality of hardware devices; the functions of the modules and of the database are described below.
As shown in fig. 5, in some embodiments, after a user performs a screenshot operation on the display device, the display device takes a screenshot of the current display picture, obtains the target picture, and displays it.
In some embodiments, after receiving the screenshot operation, the display device may further generate a floating layer containing operation controls, and after the target picture is obtained, display the floating layer above the target picture. The operation controls may include a person identification control, an object identification control, a two-dimensional code control and the like, where the person identification control is used to identify persons in the target picture, the object identification control is used to identify flowers, plants, automobiles and other objects in the target picture, and the two-dimensional code control is used to scan a code with an intelligent terminal in order to download the target picture.
For example, as shown in fig. 6, a target picture includes a person, and the face of the person may be referred to as a target face.
In some embodiments, the user may click the person identification control to input a star identification operation to the display device.
In some embodiments, after receiving a trigger instruction of the person identification control, the display device generates a person identification request containing the target picture, and sends the person identification request to a terminal-oriented module of the server.
In some embodiments, after receiving the person identification request, the terminal-oriented module may send the target picture in the person identification request to the face recognition module for processing, or directly forward the person identification request to the face recognition module.
In some embodiments, after obtaining the target picture, the face recognition module may perform face detection on it to obtain a face contour; as shown in fig. 7, the face contour is L1. The image inside the face contour is segmented from the target picture, and the image of the target face shown in fig. 8 can be obtained.
In some embodiments, a rectangular frame of the face may be obtained in the face detection process, and the image in the rectangular frame may be directly segmented from the target picture to be used as the image of the target face.
In some embodiments, after obtaining the image of the target face, the face recognition module obtains the images of the target facial features through facial feature segmentation.
Taking one target facial feature, such as the lips, as an example, the face recognition module segments the image of that target facial feature as follows. Referring to fig. 9, a plurality of face key points of the lips are identified through face detection, where the leftmost face key point is M1, the uppermost is M2, the rightmost is M3, and the lowermost is M4. According to these face key points, the coordinate area of the target facial feature can be determined; the coordinate range of the lips is determined to be {M1, M2, M3, M4}. The image of the target facial feature can then be segmented from the target picture according to this coordinate area: the image within the coordinate range is segmented from the face image to obtain a lip image, and the segmented lip image is the image of a target facial feature, see fig. 10.
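The segmentation of one facial feature from its key points can be sketched as follows; how the key points M1 to M4 are produced by face detection is outside the scope of this sketch:

    import numpy as np

    def crop_facial_feature(face_image, key_points):
        """Crop one facial feature (e.g. the lips) from the face image.

        key_points: iterable of (x, y) landmarks of that feature, e.g. M1 to M4."""
        xs = [int(x) for x, _ in key_points]
        ys = [int(y) for _, y in key_points]
        # The coordinate area is the bounding box spanned by the leftmost, topmost,
        # rightmost and bottommost key points.
        x_min, x_max = max(min(xs), 0), min(max(xs), face_image.shape[1])
        y_min, y_max = max(min(ys), 0), min(max(ys), face_image.shape[0])
        return face_image[y_min:y_max, x_min:x_max]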
In some embodiments, after obtaining the image of the target face and the images of the target facial features, the face recognition module obtains the first feature vector corresponding to the image of the target face through the first deep learning network and the second feature vector corresponding to the image of each target facial feature through the second deep learning network.
In some embodiments, after obtaining the image of the target face, the face recognition module may detect the feature points of the target face through the feature point extraction network, to obtain a third feature vector for describing the feature points.
In some embodiments, after the face recognition module obtains the first feature vector and the second feature vectors, the first feature vector and the second feature vectors may be fused to obtain the target fusion feature vector; the fusion method is described above and is not repeated here.
In some embodiments, after the face recognition module obtains the target fusion feature vector, the face recognition module may screen fourth data matching the target fusion feature vector from the face database, and screen fifth data matching the third feature vector from the fourth data.
In some embodiments, to improve face recognition efficiency, the acquisition of the third feature vector may be initiated in synchronization with the acquisition of the first feature vector or with the acquisition of the second feature vector.
In some embodiments, the face recognition module may determine the target person of the target face based on the fifth data and then send a person information request for the target person to the service processing module, where the person information request may include a person identifier of the target person. The person identifier is stored in the face database, and both the fourth data and the fifth data include it.
In some embodiments, to obtain the person information of the target person, the terminal-oriented module may generate a person information request containing the person identifier and send the request to the service processing module that handles the face recognition service. The service processing module may look up the person information of the target person according to the person identifier and return it to the terminal-oriented module.
In some embodiments, after the terminal-oriented module obtains the person information returned by the service processing module, the person information may be returned to the display device as an image recognition result, and after receiving the image recognition result the display device displays it on the current display picture.
According to the above embodiments, the processing flow for the image of the target face shown in fig. 8 can be seen in fig. 11, which is a schematic flow chart of face recognition according to some embodiments. As shown in fig. 11, after the image of the target face is obtained, detail features can be extracted through a feature point extraction network based on a feature point extraction algorithm, and these detail features can be represented by the third feature vector. After the image of the target face is segmented by facial feature, the following target facial feature images can be obtained: a left eye-and-eyebrow image, a right eye-and-eyebrow image, a nose image and a lip image. Feature extraction is performed on the image of the target face through a first model based on a first deep learning algorithm, so that the global features of the target face can be extracted, and the global features can be represented by the first feature vector. Feature extraction is performed on the target facial feature images through a second model based on a second deep learning algorithm, so that the local features of the target face can be extracted, and the local features can be represented by the second feature vectors. After the global features and the local features are fused, face matching can be performed using the fused features and the detail features, so that the target person corresponding to the image of the target face is determined.
As can be seen from the above, in the embodiments of the present application, after the image of the target face is obtained, the image of the target face is segmented to obtain the images of the target facial features. The image of the target face is processed through a first deep learning network to obtain a first feature vector, which can reflect the global features of the target face; the images of the target facial features are processed through a second deep learning network to obtain second feature vectors, which can reflect the local features of the target facial features; the image of the target face is processed through a feature point extraction network to obtain a third feature vector, which can reflect the detail features of the target face. These feature vectors reflect a large amount of information about the target face, and carrying out face matching in the face database according to them can improve the accuracy of face matching and thereby the accuracy of face recognition.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A server, wherein the server is configured to:
receiving a face recognition request from a display device, wherein the face recognition request comprises a target picture;
responding to the face recognition request, and carrying out face detection on the target picture to obtain an image of a target face;
carrying out face local segmentation on the image of the target face to obtain images of the target facial features;
acquiring a first feature vector corresponding to the image of the target face through a first deep learning network, and acquiring a second feature vector corresponding to the image of each target facial feature through a second deep learning network;
detecting feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points;
and matching the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feeding back the person data of the target person to the display device.
2. The server of claim 1, wherein the samples for training the first deep learning network are images of a complete face, and the samples for training the second deep learning network are images each containing a single facial feature.
3. The server according to claim 1, wherein carrying out face local segmentation on the image of the target face to obtain the images of the target facial features comprises:
determining a coordinate area of a target facial feature according to the face key points of the target face;
and segmenting the image of the target facial feature from the target picture according to the coordinate area.
4. The server of claim 1, wherein matching the target person corresponding to the target face in the face database according to the first feature vector, the second feature vector, and the third feature vector comprises:
fusing the second feature vectors corresponding to all the target facial features to obtain a second feature fusion vector;
fusing the first feature vector and the second feature fusion vector to obtain a target fusion feature vector;
screening fourth data matched with the target fusion feature vector from the face database;
and screening fifth data matched with the third feature vector from the fourth data, and obtaining the person data of the target person according to the fifth data.
5. The server of claim 4, wherein screening out fourth data in a face database that matches the target fusion feature vector comprises:
calculating a fourth cosine distance between the target fusion feature vector and a seventh feature vector of each person in the face database;
and if the fourth cosine distance is smaller than a fourth preset threshold, determining that the fourth data comprises the data of the person corresponding to the seventh feature vector.
6. The server according to claim 4, wherein obtaining the person data of the target person according to the fifth data comprises:
if the fifth data is the data of one person, determining the person corresponding to the fifth data as the target person;
and if the fifth data is the data of a plurality of persons, calculating a weighted sum of a fourth cosine distance and a third cosine distance of each person in the fifth data according to respective preset weights, and determining the person with the smallest weighted sum as the target person, wherein the third cosine distance is the cosine distance between the sixth feature vector of each person in the fifth data and the third feature vector.
7. The server of claim 1, wherein matching the target person corresponding to the target face in the face database according to the first feature vector, the second feature vector, and the third feature vector comprises:
screening first data matched with the first feature vector from a face database;
screening second data matched with the second feature vector from the first data;
and screening third data matched with the third feature vector from the second data, and obtaining the person data of the target person according to the third data.
8. The server according to claim 1, wherein detecting the feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points comprises:
detecting the feature points of the image of the target face through a feature point extraction network based on the SURF algorithm to obtain a third feature vector for describing the feature points, wherein the feature points comprise SURF feature points.
9. A face recognition method, comprising:
receiving a face recognition request from a display device, wherein the face recognition request comprises a target picture;
responding to the face recognition request, and carrying out face detection on the target picture to obtain an image of a target face;
carrying out face local segmentation on the image of the target face to obtain images of the target facial features;
acquiring a first feature vector corresponding to the image of the target face through a first deep learning network, and acquiring a second feature vector corresponding to the image of each target facial feature through a second deep learning network;
detecting feature points of the image of the target face through a feature point extraction network to obtain a third feature vector for describing the feature points;
and matching the target person corresponding to the target face in a face database according to the first feature vector, the second feature vector and the third feature vector, and feeding back the person data of the target person to the display device.
10. The face recognition method of claim 9, wherein the samples for training the first deep learning network are images of a complete face, and the samples for training the second deep learning network are images each containing a single facial feature.
CN202210060530.XA 2022-01-19 2022-01-19 Server and face recognition method Pending CN116524550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210060530.XA 2022-01-19 2022-01-19 Server and face recognition method (en)

Publications (1)

Publication Number Publication Date
CN116524550A 2023-08-01

Family ID: 87401647


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination