CN112101280A - Face image recognition method and device


Info

Publication number
CN112101280A
CN112101280A
Authority
CN
China
Prior art keywords
face image
information
neural network
layer
feature recognition
Prior art date
Legal status
Pending
Application number
CN202011021820.0A
Other languages
Chinese (zh)
Inventor
王珂尧
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011021820.0A
Publication of CN112101280A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a face image recognition method and apparatus, an electronic device, and a computer-readable storage medium, relating to the fields of artificial intelligence, computer vision, and deep learning, and applicable to face recognition. The specific implementation scheme is as follows: after a preprocessed face image to be recognized is obtained, the face image is input into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; a feature recognition result of the face image is then generated based on the obtained feature recognition information. By performing feature recognition on the face image through a feature recognition neural network that includes an added layer, the accuracy of the feature recognition results can be improved.

Description

Face image recognition method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to the field of computer vision technology and deep learning technology, and more particularly, to a method and an apparatus for recognizing a human face image, an electronic device, and a computer-readable storage medium.
Background
Currently, in order to formulate better user service policies, it is often necessary to obtain feature information about a user so as to understand the user's feature state.
In the prior art, facial expressions of a user are usually recognized with a neural network or other model. Typically, either a traditional method or a single-model convolutional neural network is used: a facial expression image of the user serves as input, expression features are extracted by the convolutional neural network or by hand, and a classifier then outputs the expression classification result.
Disclosure of Invention
The application provides a face image recognition method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a face image recognition method: obtaining a preprocessed face image to be recognized; inputting the face image to be recognized into a feature recognition neural network to obtain the feature recognition information output by the feature recognition neural network, wherein the feature recognition neural network includes an added layer; and generating a feature recognition result of the face image to be recognized based on the feature recognition information.
In a second aspect, an embodiment of the present application provides a face image recognition apparatus, including: an image acquisition unit configured to acquire a preprocessed face image to be recognized; a feature recognition unit configured to input the face image to be recognized into a feature recognition neural network to obtain the feature recognition information output by the feature recognition neural network, wherein the feature recognition neural network includes an added layer; and a recognition result generation unit configured to generate a feature recognition result of the face image to be recognized based on the feature recognition information.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the face image recognition method described in any of the implementations.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the face image recognition method described in any of the implementations.
After a preprocessed face image to be recognized is obtained, the face image is input into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; a feature recognition result of the face image is then generated based on the obtained feature recognition information. By performing feature recognition on the face image through a feature recognition neural network that includes an added layer, the accuracy of the feature recognition results can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture to which embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for identifying face images according to the present application;
FIG. 3 is a flowchart of an embodiment of obtaining a preprocessed face image to be recognized according to the face image recognition method of the present application;
FIG. 4 is a flowchart of one embodiment of generating a feature recognition neural network in the face image recognition method according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a face image recognition apparatus according to the present application;
FIG. 6 is a block diagram of an electronic device suitable for implementing the face image recognition method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the face image recognition method, apparatus, electronic device, and computer-readable storage medium of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 for the purpose of transmitting a face image to be recognized, and the like. Various applications related to face feature recognition, such as emotion analysis applications, social applications, image recognition applications, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (for example, for sending a face image to be recognized) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a server that recognizes face images for the terminal devices 101, 102, 103. For example: after obtaining a preprocessed face image to be recognized, the server inputs the face image into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; it then generates a feature recognition result of the face image based on the obtained feature recognition information.
It should be noted that the method for recognizing a face image provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the device for recognizing a face image is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
Furthermore, the face image recognition method may be executed by the terminal devices 101, 102, and 103, and accordingly the face image recognition apparatus may be provided in the terminal devices 101, 102, and 103. In this case, the exemplary system architecture 100 may omit the server 105 and the network 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for identifying facial images according to the present application is shown. The method for identifying the face image comprises the following steps:
Step 201, obtaining a preprocessed face image to be recognized.
In this embodiment, an execution subject (for example, the server 105 shown in fig. 1) of the facial image recognition method may acquire a facial image to be recognized from a local or non-local human-computer interaction device (for example, the terminal devices 101, 102, and 103 shown in fig. 1), which is not limited in this application.
It should be understood that after the image is acquired, the executing subject may perform preprocessing to obtain the preprocessed face image to be recognized, or may directly acquire the preprocessed face image to be recognized, which is obtained after the preprocessing is completed by another executing subject, from a local or non-local storage device.
In some optional implementations of this embodiment, the preprocessed face image to be recognized is determined through the following steps: acquiring an original image and determining the face image area in the original image with a face detection model; adding first key point mark information to the face image area with a key point labeling model; adjusting the face image in the face image area based on the first key point mark information to obtain an adjusted face image area; adding second key point mark information to the adjusted face image area, again with the key point labeling model; and removing the non-face image part based on the second key point information to obtain the preprocessed face image to be recognized.
Specifically, referring to fig. 3, a flow 300 of an implementation manner of a determination step of a pre-processed face image to be recognized in a face image recognition method is shown, and specifically includes:
Step 301, obtaining an original image, and determining a face image area in the original image by using a face detection model.
Specifically, after the original image is acquired, a face detection model is used to determine the area containing a face in the original image; the face detection model may be a residual-network detection model, a semantic-segmentation detection model, a VGG detection model, or another face detection model.
Step 302, adding first key point mark information to the face image area by using the key point labeling model.
Specifically, according to the purpose of feature recognition, a preset number of key points may be determined by analyzing face images in historical data, after which first key point mark information is added, through a key point labeling model, to the face area of the original image determined in step 301. The labeling model may be, for example, an Active Shape Model (ASM), an Active Appearance Model (AAM), or a Cascaded Pose Regression (CPR) model.
For example, when emotional features of the face image to be recognized are desired, the facial emotions expressed in face images can be divided into 7 categories of basic emotions: Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral. Then, according to the facial muscle changes produced when the persons in the historical face images present these expressions, the preset number of key points is determined to be 72, so that facial expressions in the face image to be recognized can subsequently be analyzed from those 72 key points.
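For concreteness, the 7-category label set can be written out directly in code (a trivial sketch; the index assigned to each category is an illustrative assumption, not something the method fixes):

```python
# The 7 basic emotion categories used for classification; index order is assumed.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise", "Neutral"]
EMOTION_TO_INDEX = {name: i for i, name in enumerate(EMOTIONS)}
```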
Step 303, adjusting the face image in the face image region based on the first key point mark information to obtain an adjusted face image region.
Specifically, a coordinate system is established based on the first key point mark information determined in step 302, a coordinate value is added for each first key point, and the face image in the face area is then adjusted according to those coordinate values so that the face approaches a frontal pose, allowing the facial features to be obtained as accurately as possible.
After this face alignment operation, the adjusted face area can be extracted separately and enlarged to the size of the original image, so that features within the face area can be recognized more clearly in subsequent steps.
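As an illustration of the alignment step, the following is a minimal sketch assuming OpenCV and NumPy are available and that the landmarks form an (N, 2) array; the eye-region index ranges are hypothetical and are not taken from the 72-point scheme described here:

```python
import cv2
import numpy as np

def align_face(image, landmarks, out_size=128):
    """Rotate and scale the face so the eyes lie on a horizontal line."""
    # Hypothetical index ranges for the two eye regions.
    left_eye = landmarks[36:42].mean(axis=0)
    right_eye = landmarks[42:48].mean(axis=0)
    src = np.float32([left_eye, right_eye])
    # Target eye positions in the output crop, approximating a frontal pose.
    dst = np.float32([[0.35 * out_size, 0.40 * out_size],
                      [0.65 * out_size, 0.40 * out_size]])
    # A similarity transform (rotation + uniform scale + translation) needs
    # at least two point correspondences.
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    return cv2.warpAffine(image, matrix, (out_size, out_size))
```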
Step 304, adding second key point mark information to the adjusted face image area, again using the key point labeling model.
Specifically, the key point labeling model used in step 302 is applied again to re-label the adjusted face area, adding second key point mark information.
It should be understood that after the face region is marked again by using the above-mentioned key point labeling model, new coordinates are generated for the second key point information based on the coordinate system determined in step 303.
Step 305, removing the non-face image part based on the second key point information to obtain the preprocessed face image to be recognized.
Specifically, the second key points corresponding to the outer contour of the face in the face image area are determined, that is, the contour of the face image is determined; the content outside that contour is then deleted based on the coordinates of the second key point information, yielding the preprocessed face image to be recognized.
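A minimal sketch of this removal step, assuming the contour key points are available as an (N, 2) array of pixel coordinates (the helper name and argument layout are illustrative):

```python
import cv2
import numpy as np

def keep_face_only(image, contour_points):
    """Zero out everything outside the polygon formed by the face-contour key points."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [contour_points.astype(np.int32)], 255)
    return cv2.bitwise_and(image, image, mask=mask)
```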
In this implementation, the acquired original image is preprocessed using the face detection model and the key point labeling model, so that an image of the face area containing the key points relevant to face feature recognition is extracted from the original image and used as the subsequent face image to be recognized, improving the accuracy of recognition performed on it.
Furthermore, in some embodiments, step 305 may further include: after the non-face image part is removed based on the second key point information, normalizing the resulting face image area; and applying random data enhancement to the normalized face image area to obtain the face image to be recognized.
Specifically, image normalization processes every pixel in the image in turn so that each pixel can be handled smoothly, avoiding pixels being missed because of large differences in pixel values, which would affect the final detection result. Random data enhancement is then applied to the normalized result, so that multiple training samples are obtained from a single image, improving the quality of model training.
Illustratively, the normalization subtracts 128 from each pixel value and divides the result by 256, bringing every pixel value into [-0.5, 0.5] and avoiding the influence of overly large pixel-value differences between images on subsequent detection.
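In code, this normalization is a one-line transform (a sketch assuming an 8-bit image stored as a NumPy array; the horizontal flip stands in for the random data enhancement, whose exact operations the text does not specify):

```python
import numpy as np

def normalize_face(image):
    # (pixel - 128) / 256 maps [0, 255] into roughly [-0.5, 0.5].
    return (image.astype(np.float32) - 128.0) / 256.0

def random_augment(image, rng=np.random.default_rng()):
    # One simple example of random data enhancement: a horizontal flip.
    return image[:, ::-1].copy() if rng.random() < 0.5 else image
```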
Step 202, inputting the face image to be recognized into a feature recognition neural network to obtain feature recognition information output by the feature recognition neural network.
In this embodiment, the preprocessed face image to be recognized obtained in step 201 is input into the feature recognition neural network. The feature recognition neural network includes an added layer, and the face image is further analyzed based on that added layer to obtain feature information of the image, from which the feature information contained in the image is subsequently determined; the concrete form of the feature information depends on the content of the added layer.
For example, feature matrix information, feature vector information, or feature value information of the face image may be obtained depending on the added layer, so that different face image recognition purposes can subsequently be served by the obtained feature information.
The added layers may include, for implementing various functions, added convolutional layers, added max-pooling layers, added fully connected layers, added central-entropy supervision layers, and so on. The feature extraction neural network is thereby extended according to the content of the added layers, either so that the added layers realize their corresponding functions independently, or so that the added layers are used jointly with the original feature extraction neural network to optimize and improve its functions.
Illustratively, when emotional features of the face image to be recognized are desired, the output may be emotional feature vectors for the 7 predetermined categories of basic emotions, Anger, Disgust, Fear, Happiness, Sadness, Surprise, and Neutral, so that the emotional features of the face image can be recognized and classified from the obtained vectors.
The feature recognition neural network is usually a convolutional neural network that can extract features from an image, for example a LeNet, AlexNet, or VGG (Visual Geometry Group) network.
Preferably, a VGG neural network is adopted. It uses small 1x1 and 3x3 convolution kernels and small 2x2 pooling kernels: the convolutions focus on expanding the number of channels while the pooling focuses on reducing width and height, so the VGG architecture can grow deeper and wider while the increase in computation is slowed. Furthermore, in the VGG network's testing phase, the three fully connected layers of the training phase are replaced by three convolutions that reuse the parameters learned during training; the resulting fully convolutional network is free of the constraints of fully connected layers and can therefore accept input of any width or height.
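For reference, torchvision's VGG11 can serve as such a backbone (a sketch only; the application does not prescribe a particular implementation, and the 128x128 input size is taken from the application scenario described later):

```python
import torch
from torchvision import models

backbone = models.vgg11(weights=None)   # stacked 3x3 convolutions with 2x2 max pooling
x = torch.randn(1, 3, 128, 128)         # a batch containing one normalized face crop
conv_out = backbone.features(x)         # output of the last convolutional stage
print(conv_out.shape)                   # torch.Size([1, 512, 4, 4])
```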
Step 203, generating a feature recognition result of the face image to be recognized based on the feature recognition information.
In this embodiment, the recognition result of the face image to be recognized is determined according to the form of the feature recognition information obtained in step 202. For example, if the information obtained in step 202 is a feature vector, that vector is subsequently used to derive the recognition result of the face image.
In the face image recognition method described above, a preprocessed face image to be recognized is first obtained and input into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; a feature recognition result of the face image is then generated based on the obtained feature recognition information. By performing feature recognition through a feature recognition neural network that includes an added layer, the accuracy of the feature recognition results can be improved.
With continued reference to FIG. 4, a flowchart 400 of one embodiment of generating a feature recognition neural network in the face image recognition method according to the present application is shown. The method specifically comprises the following steps:
Step 401, a feature extraction neural network is obtained.
In this embodiment, the executing entity (for example, the server 105 shown in fig. 1) may obtain the feature extraction neural network from a local or non-local human-computer interaction device (for example, the terminal devices 101, 102, 103 shown in fig. 1), which is not limited in this application.
Step 402, obtaining structural information of the feature extraction neural network.
In this embodiment, the structure of the feature extraction neural network is analyzed according to the functions of its parts to obtain its structural information. The structural information includes at least the position information of the network's convolutional layers, indicating how many convolutional layers the network contains, the connection relationships between them, and the position of each one, which facilitates the subsequent generation of the feature recognition neural network based on this information.
Step 403, in response to receiving a generation request for the feature recognition neural network, acquiring the structural information and the added-layer information in the generation request.
In this embodiment, when a generation request for a feature recognition neural network is received, that is, a request to generate the feature recognition neural network based on the feature extraction neural network, the structural information of the feature extraction neural network to be used is obtained together with the added-layer information in the request. The added-layer information includes at least the number of central-entropy supervision layers and their connection structure, so that central-entropy supervision layers can be added to the feature extraction neural network accordingly.
The central-entropy supervision layer includes a center loss function (center loss):

$L_C = \frac{1}{2}\sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2$

where $c_{y_i}$ denotes the feature center of the $y_i$-th class, $x_i$ denotes the features before the fully connected layer, and $m$ denotes the number of samples of processed data in the current batch. Through the center loss function, the difference between feature categories can be increased: each category is given a class center, and the distance between each sample in a batch of data and its corresponding class center is minimized, thereby reducing the intra-class distance.
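A minimal PyTorch sketch of this center loss, keeping the class centers as learnable parameters (a common implementation choice; the application does not specify how the centers are updated):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        # One feature center c_y per class; updated here by gradient descent.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):
        # 0.5 * sum_i ||x_i - c_{y_i}||^2 over the m samples in the batch.
        diff = features - self.centers[labels]
        return 0.5 * diff.pow(2).sum()
```

For a batch of features of shape (m, feat_dim) and integer labels of shape (m,), the forward pass computes the formula above exactly.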
In some optional implementations of this embodiment, the added layers include: a central-entropy supervision layer separately connected after the last convolutional layer of the feature extraction neural network.
Specifically, connecting a central-entropy supervision layer after the last convolutional layer of the feature extraction neural network lets the extracted features be clustered better: during processing, the distance between each sample and its corresponding class center is shortened, reducing the intra-class distance.
Step 404, adjusting the feature extraction neural network based on the structural information and the added-layer information to generate the feature recognition neural network.
In this embodiment, after obtaining the structural information and the added-layer information of the feature extraction neural network, the execution body adds the added layers at the corresponding positions of the feature extraction neural network, as indicated by the added-layer information and the structural information, thereby obtaining the feature recognition neural network.
With the generation method provided by this embodiment, after the feature extraction neural network is obtained, central-entropy supervision layers are added according to its structural information to generate the feature recognition neural network. The resulting network converges quickly in training and can be trained with only a small amount of training material, improving training efficiency; and when it is subsequently used for feature recognition, both the generalization and the quality of the recognition are improved.
In some optional implementations of this embodiment, the structural information further includes: position information of the fully connected layer of the feature extraction neural network.
Specifically, the position information of the fully connected layer of the feature extraction neural network can be acquired, so that an added layer connected to the fully connected layer can be attached according to that position information, further improving the quality of the resulting feature recognition neural network.
In some optional implementations of this embodiment, the added-layer information further includes: a cross-entropy loss layer separately connected after the fully connected layer of the feature extraction neural network.
Specifically, after the fully connected layer of the feature extraction neural network is located, a cross-entropy loss layer can be added after it. Intuitively, cross entropy measures how difficult it is for the model to predict the data or, from a compression point of view, how many bits are needed on average to encode each item; perplexity expresses the average number of branches the model assigns, whose reciprocal can be regarded as the average probability of each item; and smoothing assigns a probability to unobserved combinations so that a sequence can always receive a probability under the model. A cross-entropy loss layer can therefore be determined based on the cross-entropy loss function to aggregate the data and obtain the final feature classification result.
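Putting the two supervision signals together, a training step might combine them as below (a sketch; the weighting coefficient lam is an assumed hyperparameter, not given in the text):

```python
import torch.nn.functional as F

def supervision_loss(logits, features, labels, center_loss, lam=0.01):
    ce = F.cross_entropy(logits, labels)   # cross-entropy loss after the fully connected layer
    cl = center_loss(features, labels)     # center loss on the features before that layer
    return ce + lam * cl
```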
In order to deepen understanding, the present application also provides a specific implementation scheme in combination with a concrete application scenario. In this scenario, in order to recognize the emotional features in a face image, a VGG11 neural network is used as the feature extraction neural network to generate the feature recognition neural network; the added layers comprise a central-entropy supervision layer connected after the last convolutional layer of the VGG11 network and a cross-entropy loss layer connected after the fully connected layer of the VGG11 network.
An original image A of size 128x128 is obtained and a face detection model is used to determine the face area in it. Based on the face images in historical data and the corresponding changes of facial muscles, facial expressions are divided into 7 basic categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. The face is defined to contain 72 key points, and a key point labeling model is then used to add first key point information to the face area, namely (x_1, y_1) ... (x_72, y_72).
A coordinate system is determined based on the first key point information and corresponding coordinate information is generated; the face in the face area is aligned according to the coordinate values of all the key points, and the face area is cropped out by an affine transformation and adjusted to the same size as the original image, 128x128.
Second key point mark information is then added to the adjusted face image area, again using the key point labeling model, with its coordinates determined in the coordinate system established from the first key point information; the contour of the face in the face image area is determined from the coordinates of the second key point information, and the content outside the face image is deleted based on those coordinates.
Then, 128 is subtracted from each pixel value of the face image content and the result is divided by 256, bringing every pixel value into [-0.5, 0.5]; random data enhancement is then applied to the resulting face image area, yielding the face image to be recognized, A1.
A VGG11 neural network is acquired as the feature extraction neural network, its structural information is analyzed, and the position information of its last convolutional layer and of its fully connected layer is obtained.
A generation request for the feature recognition neural network is received; the added-layer information it carries specifies a central-entropy supervision layer connected after the last convolutional layer of the VGG11 network and a cross-entropy loss layer connected after the fully connected layer of the VGG11 network. Based on this added-layer information and the structural information of the VGG11 network, a central-entropy supervision layer is added after the last convolutional layer and a cross-entropy loss layer after the fully connected layer, yielding the feature recognition neural network B.
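The following is a sketch of how the network B of this scenario could be assembled, using torchvision's VGG11 as the backbone and returning both the logits (for the cross-entropy loss) and the pre-FC features (for the central-entropy supervision). The classifier dimensions follow from the 128x128 input; collapsing VGG11's classifier into a single fully connected layer is a simplification:

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureRecognitionNet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        vgg = models.vgg11(weights=None)
        self.features = vgg.features                   # VGG11 convolutional stages
        self.fc = nn.Linear(512 * 4 * 4, num_classes)  # 128x128 input -> 512x4x4 feature map

    def forward(self, x):
        feat = torch.flatten(self.features(x), 1)  # features before the FC layer (center loss input)
        logits = self.fc(feat)                     # logits for the cross-entropy loss
        return logits, feat

net = FeatureRecognitionNet()
logits, feat = net(torch.randn(1, 3, 128, 128))  # e.g. the preprocessed image A1
```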
The image A1 to be recognized is input into the feature recognition neural network B, and the emotion feature recognition result of the face image is finally obtained.
As this application scenario shows, the face image recognition method provided by the embodiments of the present application obtains a preprocessed face image to be recognized, inputs it into a feature recognition neural network that includes an added layer, obtains the feature recognition information output by that network, and generates a feature recognition result based on that information, thereby improving the accuracy of the feature recognition results.
As shown in fig. 5, the face image recognition apparatus 500 of this embodiment may include: an image acquisition unit 501 configured to acquire a preprocessed face image to be recognized; a feature recognition unit 502 configured to input the face image to be recognized into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; and a recognition result generation unit 503 configured to generate a feature recognition result of the face image to be recognized based on the feature recognition information.
In some optional implementations of this embodiment, the face image recognition apparatus further includes: a feature extraction neural network acquisition unit configured to acquire a feature extraction neural network; a feature extraction neural network structure analysis unit configured to acquire the structural information of the feature extraction neural network, the structural information including at least convolutional layer position information; an added-layer information acquisition unit configured, in response to receiving a generation request for the feature recognition neural network, to acquire the structural information and the added-layer information in the request, the added-layer information including at least the number and connection structure of the central-entropy supervision layers; and a feature recognition network generation unit configured to adjust the feature extraction neural network based on the structural information and the added-layer information to generate the feature recognition neural network.
In some optional implementations of this embodiment, the added-layer information in the added-layer information acquisition unit includes: a central-entropy supervision layer separately connected after the last convolutional layer of the feature extraction neural network.
In some optional implementations of this embodiment, the structural information in the feature extraction neural network structure analysis unit further includes: position information of the fully connected layer of the feature extraction neural network.
In some optional implementations of this embodiment, the added-layer information in the added-layer information acquisition unit further includes: a cross-entropy loss layer separately connected after the fully connected layer of the feature extraction neural network.
In some optional implementations of this embodiment, the face image recognition apparatus further includes an image preprocessing unit 504 configured to: acquire an original image and determine the face image area in it with a face detection model; add first key point mark information to the face image area with a key point labeling model; adjust the face image in the face image area based on the first key point mark information to obtain an adjusted face image area; add second key point mark information to the adjusted face image area, again with the key point labeling model; and remove the non-face image part based on the second key point information to obtain the preprocessed face image to be recognized.
In some optional implementations of this embodiment, removing the non-face image part based on the second key point information to obtain the preprocessed face image, as performed in the image preprocessing unit 504, further includes: normalizing the resulting face image area after the non-face part is removed; and applying random data enhancement to the normalized face image area to obtain the face image to be recognized.
The present embodiment exists as an apparatus embodiment corresponding to the above method embodiment, and the same contents refer to the description of the above method embodiment, which is not repeated herein. Through the face image recognition device provided by the embodiment of the application, the face image is subjected to feature recognition through the feature recognition neural network comprising the addition layer, and the accuracy of the result of the feature recognition of the face image can be improved.
As shown in fig. 6, it is a block diagram of an electronic device for the face image recognition method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for recognizing the face image provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method for recognizing a face image provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the face image recognition method in the embodiments of the present application (for example, the image acquisition unit 501, the feature recognition unit 502, the recognition result generation unit 503, and the image preprocessing unit 504 shown in fig. 5). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 602, that is, implements the face image recognition method of the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device for recognition of a face image, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, which may be connected to the facial image recognition electronics via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for performing the recognition method of the face image may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for recognition of a human face image, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the present application, after a preprocessed face image to be recognized is obtained, it is input into a feature recognition neural network to obtain the feature recognition information output by that network, wherein the feature recognition neural network includes an added layer; a feature recognition result of the face image is then generated based on the obtained information. Performing feature recognition through a feature recognition neural network that includes an added layer improves the accuracy of the feature recognition results.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A face image recognition method comprises the following steps:
acquiring a preprocessed face image to be recognized;
inputting the face image to be recognized into a feature recognition neural network to obtain feature recognition information output by the feature recognition neural network; wherein the feature recognition neural network comprises an added layer;
and generating a feature recognition result of the face image to be recognized based on the feature recognition information.
2. The method of claim 1, wherein the generating of the feature recognition neural network comprises:
acquiring a feature extraction neural network;
acquiring structural information of the feature extraction neural network; wherein the structure information at least comprises convolutional layer position information;
in response to receiving a generation request for a feature recognition neural network, acquiring the structural information and added-layer information in the generation request; wherein the added-layer information at least comprises the number information and connection structure information of the central entropy supervision layers;
and adjusting the feature extraction neural network based on the structural information and the added-layer information to generate the feature recognition neural network.
3. The method of claim 2, wherein the added-layer information comprises:
a central entropy supervision layer separately connected after the last convolutional layer of the feature extraction neural network.
4. The method according to any one of claims 1-3, wherein the structure information further includes:
the features extract full connectivity layer location information of the neural network.
5. The method of claim 4, wherein the added-layer information further comprises:
a cross-entropy loss layer separately connected after the fully connected layer of the feature extraction neural network.
6. The method of claim 1, wherein the preprocessed face image to be recognized is determined by:
acquiring an original image, and determining a face image area in the original image by adopting a face detection model;
adding first key point mark information to the face image area by using a key point marking model;
adjusting the face image in the face image area based on the first key point mark information to obtain an adjusted face image area;
adding second key point mark information to the adjusted face image area again by using the key point labeling model;
and removing a non-face image part based on the second key point information to obtain the preprocessed face image to be recognized.
7. The method of claim 6, wherein removing the non-face image part based on the second key point information to obtain the preprocessed face image to be recognized further comprises:
after removing the non-face image part based on the second key point information, carrying out normalization processing on the obtained face image area;
and carrying out random data enhancement processing on the normalized face image area to obtain the face image to be recognized.
8. An apparatus for recognizing a face image, comprising:
the image acquisition unit is configured to acquire a preprocessed face image to be recognized;
the feature recognition unit is configured to input the face image to be recognized into a feature recognition neural network to obtain feature recognition information output by the feature recognition neural network; wherein the feature recognition neural network comprises an added layer;
and the recognition result generation unit is configured to generate a feature recognition result of the face image to be recognized based on the feature recognition information.
9. The apparatus of claim 8, further comprising:
a feature extraction neural network acquisition unit configured to acquire a feature extraction neural network;
a feature extraction neural network structure analysis unit configured to acquire structural information of the feature extraction neural network; wherein the structural information at least comprises convolutional layer position information;
an added layer information acquisition unit configured to, in response to receiving a generation request for a feature recognition neural network, acquire the structural information and the added layer information carried in the generation request; wherein the added layer information at least comprises quantity information and connection structure information of the central entropy supervision layer;
and a feature recognition network generation unit configured to adjust the feature extraction neural network based on the structural information and the added layer information to generate the feature recognition neural network.
10. The apparatus of claim 9, wherein the added layer information in the added layer information acquisition unit comprises:
a central entropy supervision layer separately connected after the last convolutional layer of the feature extraction neural network.
11. The apparatus according to any one of claims 8-10, wherein the structural information in the feature extraction neural network structure analysis unit further comprises:
fully connected layer position information of the feature extraction neural network.
12. The apparatus of claim 11, wherein the added layer information in the added layer information acquisition unit further comprises:
a cross-entropy loss layer separately connected after the fully connected layer of the feature extraction neural network.
13. The apparatus of claim 8, further comprising an image preprocessing unit configured to:
acquire an original image, and determine a face image area in the original image by using a face detection model;
add first key point mark information to the face image area by using a key point marking model;
adjust the face image in the face image area based on the first key point mark information to obtain an adjusted face image area;
add second key point mark information to the adjusted face image area by using the key point marking model again;
and remove a non-face image part based on the second key point mark information to obtain the preprocessed face image to be recognized.
14. The apparatus according to claim 13, wherein the image preprocessing unit is further configured to:
after removing the non-face image part based on the second key point mark information, perform normalization processing on the obtained face image area;
and perform random data enhancement processing on the normalized face image area to obtain the face image to be recognized.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of recognizing a face image according to any one of claims 1 to 7.
16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method of recognizing a face image according to any one of claims 1 to 7.
CN202011021820.0A 2020-09-25 2020-09-25 Face image recognition method and device Pending CN112101280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011021820.0A CN112101280A (en) 2020-09-25 2020-09-25 Face image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011021820.0A CN112101280A (en) 2020-09-25 2020-09-25 Face image recognition method and device

Publications (1)

Publication Number Publication Date
CN112101280A (en) 2020-12-18

Family

ID=73755288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011021820.0A Pending CN112101280A (en) 2020-09-25 2020-09-25 Face image recognition method and device

Country Status (1)

Country Link
CN (1) CN112101280A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996017309A1 (en) * 1994-11-29 1996-06-06 The Salk Institute For Biological Studies Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy
CN109712144A (en) * 2018-10-29 2019-05-03 百度在线网络技术(北京)有限公司 Processing method, training method, equipment and the storage medium of face-image
CN110135318A (en) * 2019-05-08 2019-08-16 佳都新太科技股份有限公司 Cross determination method, apparatus, equipment and the storage medium of vehicle record

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONG LIU; WEI SHI; WEIPENG HUANG; QIAO GUAN: "A Discriminatively Learned Feature Embedding Based on Multi-Loss Fusion For Person Search", IEEE, 13 September 2018 (2018-09-13) *
WANG KEYAO: "Video Facial Expression Recognition Based on Deep Learning", CNKI (CHINA NATIONAL KNOWLEDGE INFRASTRUCTURE), INFORMATION SCIENCE AND TECHNOLOGY SERIES, no. 2020 *

Similar Documents

Publication Publication Date Title
CN111652828B (en) Face image generation method, device, equipment and medium
US20190392587A1 (en) System for predicting articulated object feature location
CN110991427B (en) Emotion recognition method and device for video and computer equipment
CN111582185B (en) Method and device for recognizing images
CN111738910A (en) Image processing method and device, electronic equipment and storage medium
CN111783620A (en) Expression recognition method, device, equipment and storage medium
CN111709873B (en) Training method and device for image conversion model generator
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN111611990B (en) Method and device for identifying tables in images
CN111709875B (en) Image processing method, device, electronic equipment and storage medium
CN112507090B (en) Method, apparatus, device and storage medium for outputting information
CN112241715A (en) Model training method, expression recognition method, device, equipment and storage medium
CN111539897A (en) Method and apparatus for generating image conversion model
CN112149634A (en) Training method, device and equipment of image generator and storage medium
US20230036338A1 (en) Method and apparatus for generating image restoration model, medium and program product
CN112561053A (en) Image processing method, training method and device of pre-training model and electronic equipment
Krishnan et al. Detection of alphabets for machine translation of sign language using deep neural net
CN113177449A (en) Face recognition method and device, computer equipment and storage medium
CN111862031A (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN111523467A (en) Face tracking method and device
CN110738261B (en) Image classification and model training method and device, electronic equipment and storage medium
Gheitasi et al. Estimation of hand skeletal postures by using deep convolutional neural networks
CN111275110B (en) Image description method, device, electronic equipment and storage medium
CN112560854A (en) Method, apparatus, device and storage medium for processing image
CN112381927A (en) Image generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination