CN111310705A - Image recognition method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111310705A
CN111310705A
Authority
CN
China
Prior art keywords
image
predicted
face
module
positioning data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010127177.3A
Other languages
Chinese (zh)
Inventor
胡艺飞
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010127177.3A priority Critical patent/CN111310705A/en
Publication of CN111310705A publication Critical patent/CN111310705A/en
Priority to PCT/CN2021/071172 priority patent/WO2021169637A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method, an image recognition apparatus, a computer device, and a storage medium, belonging to the field of face recognition. Face detection is performed on an acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain an image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the gaze prediction neural network model occupies little memory and runs quickly.

Description

Image recognition method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of face recognition, and in particular, to an image recognition method, an image recognition apparatus, a computer device, and a storage medium.
Background
Emotion analysis has developed rapidly with the rise of social media (such as comments, forums, blogs, and microblogs); by analyzing what people express, their opinions, evaluations, attitudes, emotions, and tendencies can be assessed. Because psychological changes cause changes in physiological parameters (e.g., skin conductance, heart rate, blood pressure, respiration, brain waves, voice, and gaze), the emotional changes of a subject can be evaluated by detecting those changes. Since emotion analysis is usually needed in non-contact scenarios where data must be convenient to acquire and process, techniques that analyze a subject's emotional changes through image recognition have grown increasingly popular as face recognition technology continues to develop.
Existing image recognition systems fall into two main types. The first collects images with an infrared camera, for example eye-tracking systems such as Tobii Eye Tracking as integrated in Alienware computers; the second collects images with a monocular camera. The infrared-camera approach has two main drawbacks: the equipment is expensive, and each user must be calibrated in advance, so it cannot be used in scenarios such as a bank branch where gaze analysis must work for unspecified users. The monocular-camera approach proceeds by detecting the face, estimating the rotation angle of the head, and locating 68 facial keypoints to crop an eye-region picture from which the gaze direction is recognized. This method also has drawbacks: the recognition pipeline is complex to build and consumes too much computing time and resources in use. Four models are needed to recognize the gaze in a single picture, the models occupy a large amount of storage, and deployment on a mobile phone is difficult; moreover, much of the computation in the facial-keypoint model is irrelevant to gaze judgment, so the existing method is slow and cannot achieve real-time analysis.
In summary, existing image recognition methods are costly and inefficient, occupy a large amount of storage, and are limited in their application scenarios.
Disclosure of Invention
Aiming at the problems of low efficiency and large storage footprint in existing image recognition methods, an image recognition method, an apparatus, a computer device, and a storage medium are provided that improve recognition efficiency while occupying little storage space.
The invention provides an image recognition method, which comprises the following steps:
acquiring an image to be detected;
carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted;
and recognizing the image to be predicted with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes.
Preferably, performing face detection on the image to be detected to obtain the face image and the positioning data of the face image includes:
performing face detection on the image to be detected with a multi-task convolutional neural network to obtain the face image and the positioning data of the face image.
Preferably, the positioning data includes: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
Preferably, correcting the face image based on the positioning data to obtain the image to be predicted includes:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
Preferably, the human-eye gaze prediction neural network model includes a separable convolution module, an attention mechanism module, and a classification module;
recognizing the image to be predicted with the human-eye gaze prediction neural network model and determining the gaze direction of the human eyes includes:
extracting a first facial feature from the image to be predicted through the separable convolution module;
adjusting the weight of the first facial feature through the attention mechanism module to obtain a feature weight that enhances the weight of eye features;
and combining the first facial feature with the feature weight to generate a second facial feature, and processing the second facial feature through the classification module to obtain the gaze direction of the human eyes.
Preferably, the separable convolution module is combined with a forward residual module to extract the first facial feature of the image to be predicted; the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions to complete the first facial feature extraction.
Preferably, the separable convolution module is combined with an inverted residual module to extract the first facial feature of the image to be predicted; the inverted residual module is combined with the separable convolution module, and a 1 × 1 cross-channel convolution is added between every two point-wise convolution channels to fuse information across channels and complete the first facial feature extraction.
The present invention also provides an image recognition apparatus, comprising:
the receiving unit is used for acquiring an image to be detected;
the detection unit is used for carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
the correction unit is used for correcting the face image based on the positioning data to obtain an image to be predicted;
and the recognition unit is used for recognizing the image to be predicted with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the image recognition method, apparatus, computer device, and storage medium, face detection is performed on the acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain the image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with the human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the gaze prediction neural network model occupies little memory and runs quickly.
Drawings
FIG. 1 is a flowchart of an embodiment of an image recognition method according to the present invention;
FIG. 2 is a flowchart of an embodiment of identifying the image to be predicted by using a human eye gaze prediction neural network model according to the present invention;
FIG. 3 is a block diagram of an embodiment of an image recognition apparatus according to the present invention;
FIG. 4 is a hardware architecture diagram of one embodiment of the computer device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The image recognition method, apparatus, computer device, and storage medium provided by the invention can be applied in business fields such as banking and insurance. Face detection is performed on the acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain the image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with the human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the model occupies little memory and runs quickly.
Example one
Referring to fig. 1, an image recognition method of the present embodiment includes the following steps:
s1, obtaining an image to be detected;
in this embodiment, there is no strict requirement on the image-acquisition equipment: a monocular camera can be used to collect the image to be detected, so the demands on acquisition equipment are low and equipment cost is effectively reduced.
S2, carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
wherein the positioning data may include: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
The positioning data in this embodiment consist of 5 keypoint coordinates: the center points of the two eye ellipses, the nose tip, and the two mouth corners. Compared with the prior art, which requires 68 keypoints for gaze prediction, this greatly reduces the amount of computation and speeds up processing. The image recognition method can therefore be applied in a wide range of scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and obtain each face image and its corresponding positioning data simultaneously.
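As an illustrative sketch (the field names here are hypothetical, not taken from the patent), the five-keypoint positioning data for one detected face could be held in a simple mapping:

```python
# Hypothetical container for the five keypoints described above;
# coordinates are (x, y) pixel positions in the image to be detected.
positioning_data = {
    "left_eye_center": (120.0, 98.0),
    "right_eye_center": (178.0, 96.0),
    "nose_tip": (150.0, 130.0),
    "mouth_left": (128.0, 160.0),
    "mouth_right": (172.0, 158.0),
}

KEYPOINT_COUNT = len(positioning_data)  # 5, versus 68 in the prior art
```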
Further, in step S2, performing face detection on the image to be detected to obtain the face image and the positioning data of the face image includes:
performing face detection on the image to be detected with a Multi-Task Convolutional Neural Network (MTCNN) to obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network uses a three-stage cascade architecture of convolutional networks to detect the face and locate the keypoints (the center points of the two eye ellipses, the nose tip, and the two mouth corners). It comprises three parts: P-Net (proposal network), R-Net (refine network), and O-Net (output network). First, the fully convolutional P-Net processes the image to be detected to produce first candidate windows (windows marking face positions in the image to be detected) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine bounding windows, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows to yield second candidate windows. Because P-Net's detection is coarse, R-Net is used to further refine the second candidate windows: similar in structure to P-Net, R-Net takes the second candidate windows as input, filters out false windows to localize the face region more precisely, and generates third candidate windows. Finally, O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows, removes overlapping windows, confirms the face region, and locates the coordinates of the five facial keypoints within the confirmed face region.
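The non-maximum suppression step used above to remove overlapping windows can be sketched in isolation (a minimal pure-Python version; the (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative assumptions, not taken from the patent):

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # keep the highest-scoring window, drop windows overlapping it
    # beyond the threshold, and repeat on the remainder
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

For example, two heavily overlapping candidate windows collapse to the higher-scoring one, while a distant window survives.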
S3, correcting the face image based on the positioning data to obtain a to-be-predicted image;
in the present embodiment, to facilitate the subsequent gaze recognition of step S4, the face image needs to be converted into an image to be predicted with the head in a canonical, front-facing pose (e.g., eyes looking straight ahead), which improves the accuracy of gaze recognition.
Further, in step S3, correcting the face image based on the positioning data to obtain the image to be predicted includes:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
The standard coordinate data are the pre-stored standard coordinates of the 5 keypoints: the labeled center-point coordinates of the two eye ellipses, the labeled nose-tip coordinates, and the labeled coordinates of the two mouth corners.
In this embodiment, the positioning data are compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting it into the image to be predicted so that it meets the requirements of gaze recognition. Compared with existing correction methods, which require a deep neural network model to estimate the head rotation angle, the correction used here greatly reduces the amount of computation; no head-pose estimation model needs to be trained, so the computational cost drops substantially.
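The keypoint-based correction can be sketched as fitting a least-squares 2D similarity transform (rotation, uniform scale, translation) from the detected keypoints to the stored standard coordinates, then warping the image with it; this is a minimal illustration under that assumption, not the patent's exact procedure:

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst.
    Returns (a, b, tx, ty) such that (x, y) -> (a*x - b*y + tx, b*x + a*y + ty)."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mu = sum(p[0] for p in dst) / n
    mv = sum(p[1] for p in dst) / n
    sa = sb = d = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv
        sa += xc * uc + yc * vc        # correlation with the target
        sb += xc * vc - yc * uc        # cross term giving rotation
        d += xc * xc + yc * yc         # spread of the source points
    a, b = sa / d, sb / d
    tx = mu - (a * mx - b * my)
    ty = mv - (b * mx + a * my)
    return a, b, tx, ty

def warp(pt, a, b, tx, ty):
    # apply the fitted similarity transform to one point
    x, y = pt
    return (a * x - b * y + tx, b * x + a * y + ty)
```

In practice the fitted transform would be applied to every pixel of the face image (e.g., with an image-warping routine) to produce the image to be predicted.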
And S4, identifying the image to be predicted by adopting a human eye sight prediction neural network model, and determining the direction of the human eye sight.
It should be noted that the human-eye gaze prediction neural network model includes a separable convolution module, an attention mechanism module, and a classification module;
as shown in fig. 2, further, the identifying the image to be predicted by using the human eye gaze prediction neural network model in step S4, and determining the human eye gaze direction may include:
s41, performing first facial feature extraction on the image to be predicted through the separable convolution module;
in this step, replacing the convolution kernels of a standard convolutional neural network with separable convolutions greatly reduces the amount and complexity of computation. Taking an input image to be predicted of size d × c with m channels, an output first facial feature of size d × c with n channels, and a k × k convolution kernel as an example:
the standard convolution requires approximately d × c × m × n × k × k multiply-accumulate operations;
the separable convolution requires approximately d × c × m × (k × k + n) operations (k × k per channel for the depthwise step, plus n per channel for the 1 × 1 pointwise step);
where d is the width of the image to be predicted, c is its height, m and n are channel counts, and k is the size of the convolution kernel.
The ratio of the two is (k × k + n)/(n × k × k) = 1/n + 1/k², so separable convolution reduces both the number of model parameters and the computation of the convolution process.
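Under these MobileNet-style operation counts, the saving can be checked numerically (the shapes below are arbitrary examples, not values from the patent):

```python
def standard_conv_ops(d, c, m, n, k):
    # multiply-accumulates for a k×k standard convolution on a
    # d×c input with m input channels and n output channels
    return d * c * m * n * k * k

def separable_conv_ops(d, c, m, n, k):
    # depthwise step: d*c*m*k*k, plus pointwise 1×1 step: d*c*m*n
    return d * c * m * (k * k + n)

# example shapes: 112×112 feature map, 32 -> 64 channels, 3×3 kernel
std = standard_conv_ops(112, 112, 32, 64, 3)
sep = separable_conv_ops(112, 112, 32, 64, 3)
ratio = sep / std  # equals 1/n + 1/k^2 = 1/64 + 1/9, roughly 0.13
```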
In step S41, the separable convolution module can be combined with the forward residual module to perform the first facial feature extraction on the image to be predicted.
The forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network can learn high-order features without forgetting useful low-order features.
In step S41, the separable convolution module can be combined with the inverse residual module to perform a first facial feature extraction on the image to be predicted.
Each input channel of the image to be predicted is convolved with its own single kernel by the depthwise convolution of the separable convolution module to obtain first feature maps; pointwise convolution (a 1 × 1 convolution) then combines the first feature maps across the depth dimension with learned weights to obtain richer features. The inverted residual module is combined with the separable convolution module, and a 1 × 1 cross-channel convolution is added between every two pointwise convolution channels to fuse information across channels, ensuring that more effective second feature maps are extracted; all second feature maps are concatenated to obtain the first facial feature. Through the inverted residual module, the neural network learns high-order features without forgetting useful low-order features; compared with the forward residual module, it also has fewer parameters, computes faster, and occupies far less memory.
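A minimal pure-Python sketch of the two stages (depthwise convolution per channel, then 1 × 1 pointwise channel mixing) for "valid" padding and stride 1; a real implementation would use an optimized tensor library:

```python
def depthwise_conv(x, kernels):
    # x: [C][H][W] feature maps; kernels: [C][k][k], one kernel per channel
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    out = []
    for c in range(C):
        plane = []
        for i in range(H - k + 1):
            row = []
            for j in range(W - k + 1):
                s = 0.0
                for di in range(k):
                    for dj in range(k):
                        s += x[c][i + di][j + dj] * kernels[c][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def pointwise_conv(x, weights):
    # 1×1 convolution: weights[n][c] mixes the C input channels
    # into N output channels at every spatial position
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[n][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for n in range(len(weights))]
```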
S42, adjusting the weight of the first facial feature through the attention mechanism module to obtain a feature weight for enhancing the eye feature weight;
in this step, the attention mechanism module uses a self-attention mechanism. Self-attention relates different positions of a single sequence to one another when computing a representation of that sequence, and has proved very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include multiple attention mechanism modules, each corresponding to, and placed after, a convolutional layer of the separable convolution module. The attention mechanism modules extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the refined convolutional features of the last module are taken as the feature weight (i.e., the feature weight that enhances eye features). By adjusting weights through the attention mechanism, the extraction of features around the eyes is strengthened on top of the first facial feature, and eye features are then generated from the eyeball and eye-muscle features, yielding a feature weight that enhances the eye features.
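The weight-adjustment idea can be illustrated schematically: a softmax over learned relevance scores yields weights that up-weight eye-region positions before the features are recombined (a toy sketch, not the patent's actual network):

```python
import math

def softmax(scores):
    # numerically stable softmax; the weights sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_attention(features, relevance_scores):
    # relevance_scores would be learned by the attention module; here
    # they stand in for its preference for eye-region positions
    weights = softmax(relevance_scores)
    return [f * w for f, w in zip(features, weights)]
```

A position given a much higher relevance score ends up dominating the weighted features, mimicking how eye-region features are enhanced.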
S43, combining the first facial features and the feature weights to generate second facial features, and processing the second facial features through the classification module to obtain the human eye sight direction.
In this step, the classification module uses a fully connected layer. The first facial feature is multiplied by the feature weight to generate the second facial feature, which is fed into the fully connected layer. The fully connected layer integrates the second facial feature through a weight matrix, computes deviation probabilities from the integrated neurons, obtains the vertical and horizontal gaze deviations corresponding to each probability, and derives the gaze direction of the human eyes from the vertical and horizontal deviations.
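Schematically, the classification module's output can be turned into a gaze direction by taking the expectation over discretized deviation bins for each axis (the bin values and the degree convention below are illustrative assumptions; the text does not specify them):

```python
# hypothetical deviation bins in degrees, one set per axis
V_BINS = [-30.0, -15.0, 0.0, 15.0, 30.0]
H_BINS = [-30.0, -15.0, 0.0, 15.0, 30.0]

def expected_deviation(probs, bins):
    # expectation of the deviation under the per-bin probabilities
    return sum(p * b for p, b in zip(probs, bins))

def gaze_direction(v_probs, h_probs):
    # (vertical, horizontal) gaze deviation in degrees
    return (expected_deviation(v_probs, V_BINS),
            expected_deviation(h_probs, H_BINS))
```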
In step S4, the input to the human-eye gaze prediction neural network model is the whole face, which has two main advantages over the prior art. First, in prediction accuracy: muscle changes around the eyes help judge the gaze direction, whereas existing methods input only eye pictures and cannot use this surrounding information. Second, to obtain the eye picture, existing methods must build a 68-keypoint face detection model to get the eye-box coordinates, which is computationally heavy and costly.
In this embodiment, the image recognition method performs face detection on the acquired image to be detected to obtain a face image and its positioning data, corrects the face image according to the positioning data to obtain the image to be predicted (reducing the amount of computation), and recognizes the image to be predicted with the human-eye gaze prediction neural network model, thereby determining the gaze direction of the human eyes.
In practical applications, compared with gaze recognition systems that use an infrared camera, this image recognition method needs only a single monocular camera to acquire images, reducing equipment cost; it also requires no per-user calibration, so it can be applied widely in scenarios such as bank branches and personal mobile phones. Compared with other gaze recognition systems that use a monocular camera, this method needs only two models, and the human-eye gaze prediction neural network model has fewer parameters than existing gaze recognition models, greatly accelerating each recognition; real-time analysis can be achieved on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing gaze recognition models usually require more than 100 MB.
The image recognition method of this embodiment can be applied to emotion analysis. For example, a person's gaze tends to become evasive when they are nervous or lying, which can support anti-fraud judgments; the method can also be used to analyze which regions of, say, a billboard interest customers; and it can be applied to gaze-recognition mini-games or game interaction.
Example two
As shown in fig. 3, the present invention also provides an image recognition apparatus 1 including: a receiving unit 11, a detecting unit 12, a correcting unit 13 and a recognizing unit 14, wherein:
a receiving unit 11, configured to acquire an image to be detected;
in this embodiment, there is no strict requirement on the image-acquisition equipment: a monocular camera can be used to collect the image to be detected, so the demands on acquisition equipment are low and equipment cost is effectively reduced.
The detection unit 12 is configured to perform face detection on the image to be detected, and acquire a face image and positioning data of the face image;
wherein the positioning data may include: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
The positioning data in this embodiment consist of 5 keypoint coordinates: the center points of the two eye ellipses, the nose tip, and the two mouth corners. Compared with the prior art, which requires 68 keypoints for gaze prediction, this greatly reduces the amount of computation and speeds up processing. The image recognition method can therefore be applied in a wide range of scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and obtain each face image and its corresponding positioning data simultaneously.
Specifically, the detection unit 12 may perform face detection on the image to be detected using a multi-task cascaded convolutional neural network (MTCNN) to obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network detects the face and locates the key points (the elliptical center points of the two eyes, the nose tip, and the two mouth corners) using a three-stage cascade combined with convolutional neural network algorithms. It comprises three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). First, the fully convolutional P-Net scans the image to be detected to obtain first candidate windows (windows marking the positions of faces in the image) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine a refined bounding box, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows, yielding the second candidate windows. Because P-Net's detection is coarse, R-Net is then used to refine the second candidate windows: similar in structure to P-Net, it takes the second candidate windows as input, filters out false windows to further localize the face region, and generates the third candidate windows. Finally, O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows, removes the remaining overlapping windows, confirms the face region, and locates the position coordinates of the five face key points within the confirmed region.
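The non-maximum suppression step that prunes overlapping candidate windows at each stage can be sketched in a few lines. The following is a minimal NumPy illustration of greedy NMS; the IoU threshold and the box format are assumptions for illustration, not values taken from the patent:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the surviving boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # candidates by descending score
    keep = []
    while order.size > 0:
        i = order[0]                        # best remaining window
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavy overlaps
    return keep
```

Given two heavily overlapping windows and one distant window, only the higher-scoring member of the overlapping pair survives.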
A correcting unit 13, configured to correct the face image based on the positioning data, and obtain an image to be predicted;
In this embodiment, to facilitate the subsequent gaze recognition, the face image needs to be converted into an easily recognizable image to be predicted in which the head pose is corrected (for example, eyes looking forward), so as to improve the accuracy of gaze recognition.
The correction unit 13 compares the positioning data with standard coordinate data and performs a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
The standard coordinate data are the pre-stored standard coordinates of the 5 key points: the labeled coordinates of the elliptical center points of the two eyes, of the nose tip, and of the two mouth corners.
In this embodiment, the positioning data are compared with the standard coordinate data to obtain the relative change, and a similarity transformation (rotation, translation, scaling, and the like) based on this change converts the face image into the image to be predicted, so that it meets the requirements of gaze recognition. Compared with existing correction methods, which compute the head rotation angle with a deep neural network model, the correction adopted here effectively reduces the amount of computation and requires no trained head-pose estimation model, greatly lowering the computational cost.
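Deriving the rotation, translation, and scaling from the two sets of 5 key points amounts to estimating a least-squares similarity transform between corresponding point sets. A minimal sketch using the classical Umeyama estimate follows; the helper name and the example coordinates are illustrative assumptions, not values from the patent:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points to dst.

    src, dst: (N, 2) arrays of corresponding key points.
    Returns a 2x3 matrix M so that dst ≈ src @ M[:, :2].T + M[:, 2].
    """
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))       # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt                            # rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = dst_mean - scale * R @ src_mean       # translation
    return np.hstack([scale * R, t[:, None]])
```

The returned 2×3 matrix can be handed to a standard affine warp to produce the corrected image to be predicted.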
The identifying unit 14 is configured to identify the image to be predicted using a human eye gaze prediction neural network model and determine the direction of the human eye gaze.
It should be noted that the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
the identification unit 14 performs first facial feature extraction on the image to be predicted through the separable convolution module. The separable convolution module may be combined with a forward residual module for this extraction: the forward residual module adds the features produced by the separable convolution module to the initial features at the same positions, so that the network can learn high-order features without forgetting useful low-order features.
Alternatively, the separable convolution module may be combined with an inverted residual module to perform the first facial feature extraction on the image to be predicted. The depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with its own single convolution kernel to obtain first feature maps; a pointwise (1 × 1) convolution then weights and combines the first feature maps along the depth direction to obtain richer features. When the inverted residual module is combined with the separable convolution module, a 1 × 1 cross-channel convolution is added between every two pointwise convolutions to fuse information across channels, ensuring that more effective second feature maps are extracted; all second feature maps are concatenated to obtain the first facial features. Through the inverted residual module, the network learns high-order features without forgetting useful low-order features, while using fewer parameters than a forward residual module, computing faster, and occupying far less memory.
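The saving from replacing a standard convolution with the depthwise-plus-pointwise pair described above can be checked with a quick parameter count. The kernel size and channel numbers below are illustrative, not the patent's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise (one k x k kernel per input channel) + pointwise 1x1 mix."""
    depthwise = k * k * c_in        # each channel convolved on its own
    pointwise = c_in * c_out        # 1x1 cross-channel combination
    return depthwise + pointwise

# e.g. a 3x3 layer taking 32 channels to 64 channels
standard = conv_params(3, 32, 64)        # 9 * 32 * 64 = 18432
separable = separable_params(3, 32, 64)  # 288 + 2048  = 2336
```

For this example the separable form needs roughly one-eighth of the parameters, which is the source of the speed and memory advantages claimed above.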
The recognition unit 14 adjusts the weights of the first facial features through the attention mechanism module to obtain feature weights that enhance the eye features. The attention mechanism module adopts a self-attention mechanism, which relates different positions of the same sequence when computing a representation of that sequence, and which has proved very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include a plurality of attention mechanism modules, each corresponding to one convolutional layer of the separable convolution module and placed after that layer. The attention mechanism modules extract the convolutional features around the eyes, the output of each module serving as the input of the next, and the refined convolutional features of the last module are taken as the feature weights (i.e., the feature weights that enhance the eye features). By adjusting weights through the attention mechanism, the extraction of features around the eyes is strengthened on top of the first facial features, and the eye features are then generated from the eyeball and eye-muscle features, yielding feature weights capable of enhancing the eye features.
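For reference, the scaled dot-product form of self-attention that such a module builds on can be sketched as follows; the shapes and the random projection matrices are illustrative assumptions, not the patent's actual layer configuration:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of feature vectors.

    x: (n, d) feature vectors; wq, wk, wv: (d, d) projection matrices.
    Each output row is a weighted mix of all value vectors, with weights
    derived from query-key similarity -- positions that matter (e.g. the
    eye region) can thus be up-weighted relative to the rest.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.T / np.sqrt(k.shape[1])
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ v, weights
```

The attention weights form a row-stochastic matrix, which is what makes them usable as feature weights.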
The recognition unit 14 combines the first facial features with the feature weights to generate second facial features, and the classification module processes the second facial features to obtain the human eye gaze direction.
The classification module adopts a fully connected layer. The first facial features are multiplied by the feature weights to generate the second facial features, which are input into the fully connected layer. The fully connected layer integrates the second facial features through a weight matrix, computes deviation probability information from the integrated neurons, obtains the vertical and horizontal gaze deviations corresponding to each piece of deviation probability information, and derives the human eye gaze direction from the vertical and horizontal deviations.
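One plausible reading of the deviation probabilities described above is a softmax over discrete vertical and horizontal deviation bins, with the expected deviation taken as the gaze angle. The binning scheme below is an assumption for illustration only, not the patent's actual output encoding:

```python
import numpy as np

def gaze_from_logits(v_logits, h_logits, v_bins, h_bins):
    """Expected vertical/horizontal gaze deviations from classifier outputs.

    v_logits / h_logits: raw fully-connected outputs, one per deviation bin.
    v_bins / h_bins: the deviation value (e.g. in degrees) each bin encodes.
    """
    def softmax(z):
        e = np.exp(z - z.max())       # shift for numerical stability
        return e / e.sum()
    v = float(softmax(v_logits) @ v_bins)   # expected vertical deviation
    h = float(softmax(h_logits) @ h_bins)   # expected horizontal deviation
    return v, h
```

A logit that strongly favors one bin drives the expected deviation toward that bin's value.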
The human eye gaze prediction neural network model takes the whole face as its input image to be predicted, which has two main advantages over the prior art. First, in prediction accuracy: muscle changes around the eyes can help determine the gaze direction, whereas existing methods that input only an eye crop cannot exploit this surrounding information. Second, to obtain the eye crop, existing methods must build a 68-key-point face detection model to locate the eye boxes, which is computationally expensive.
In this embodiment, the image recognition apparatus 1 performs face detection on the acquired image to be detected to obtain a face image and its positioning data, corrects the face image according to the positioning data to obtain an image to be predicted that is suitable for recognition, thereby reducing the amount of computation, and identifies the image to be predicted with the human eye gaze prediction neural network model to determine the direction of the human eye gaze.
In practical applications, compared with human eye recognition systems that rely on infrared cameras, this image recognition method needs only a single monocular camera to complete image acquisition, reducing equipment cost; moreover, no per-person calibration is required, so the method can be widely deployed in scenarios such as bank branches and personal mobile phones. Compared with other eye recognition systems using monocular cameras, this method needs only two models, and the human eye gaze prediction neural network model has fewer parameters than existing eye recognition models, greatly accelerating each recognition pass; real-time analysis is achievable on an NVIDIA 1080-class GPU. The gaze prediction model occupies less than 8 MB of memory, whereas existing human eye recognition models typically occupy more than 100 MB.
The image recognition apparatus 1 of this embodiment may be applied to emotion analysis: for example, a person's gaze tends to evade when the person is nervous or lying, which can support anti-fraud judgment. It can also be used to analyze which regions of, for example, a billboard attract a customer's interest, and it can further be applied to mini-games involving eye recognition or game interaction, and the like.
Example three
In order to achieve the above object, the present invention further provides a computer device 2. The computer device 2 may comprise a plurality of computer devices 2, and the components of the image recognition apparatus 1 of the second embodiment may be distributed among different computer devices 2. The computer device 2 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (either an independent server or a cluster of multiple servers) that executes programs. The computer device 2 of this embodiment includes at least, but is not limited to: a memory 21, a processor 23, a network interface 22, and the image recognition apparatus 1, which can be communicatively connected to each other through a system bus (refer to fig. 4).
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, such as a flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, or optical disk. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the image recognition method of the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data Processing chip in some embodiments. The processor 23 is typically used for controlling the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program codes stored in the memory 21 or process data, for example, run the image recognition apparatus 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 connects the computer device 2 to an external terminal through a network and establishes a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that fig. 4 only shows the computer device 2 with components 21-23, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.
In this embodiment, the image recognition apparatus 1 stored in the memory 21 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete the present invention.
Example four
To achieve the above objects, the present invention further provides a computer-readable storage medium, which includes a plurality of storage media such as a flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, server, or App application store, and on which a computer program is stored that implements the corresponding functions when executed by the processor 23. The computer-readable storage medium of this embodiment is used to store the image recognition apparatus 1 and, when executed by the processor 23, implements the image recognition method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, though in many cases the former is the better implementation.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An image recognition method, comprising:
acquiring an image to be detected;
carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted; and
identifying the image to be predicted using a human eye gaze prediction neural network model to determine the direction of the human eye gaze.
2. The image recognition method of claim 1, wherein the performing of the face detection on the image to be detected to obtain the face image and the positioning data of the face image comprises:
and carrying out face detection on the image to be detected by adopting a multitask convolutional neural network to obtain a face image and positioning data of the face image.
3. The image recognition method according to claim 1 or 2, wherein the positioning data includes: coordinates of the elliptical center points of the two eyes, coordinates of the nose tip, and coordinates of the two mouth corners.
4. The image recognition method according to claim 1, wherein the correcting the face image based on the positioning data to obtain the image to be predicted comprises:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
5. The image recognition method of claim 1, wherein the human eye gaze prediction neural network model comprises: a separable convolution module, an attention mechanism module, and a classification module;
the identifying the image to be predicted using the human eye gaze prediction neural network model and determining the human eye gaze direction comprises:
performing first facial feature extraction on the image to be predicted through the separable convolution module;
adjusting the weights of the first facial features through the attention mechanism module to obtain feature weights that enhance the eye features; and
combining the first facial features with the feature weights to generate second facial features, and processing the second facial features through the classification module to obtain the human eye gaze direction.
6. The image recognition method of claim 5, wherein the separable convolution module, in combination with a forward residual module, performs the first facial feature extraction on the image to be predicted; the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions to obtain the first facial features.
7. The image recognition method of claim 5, wherein the separable convolution module, in combination with an inverted residual module, performs the first facial feature extraction on the image to be predicted; the inverted residual module is combined with the separable convolution module by adding a 1 × 1 cross-channel convolution between every two pointwise convolutions for inter-channel information fusion, so as to obtain the first facial features.
8. An image recognition apparatus, comprising:
the receiving unit is used for acquiring an image to be detected;
the detection unit is used for carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
the correction unit is used for correcting the face image based on the positioning data to acquire an image to be predicted; and
the identification unit is used for identifying the image to be predicted using a human eye gaze prediction neural network model and determining the direction of the human eye gaze.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when executing the computer program, realizes the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
CN202010127177.3A 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium Pending CN111310705A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010127177.3A CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium
PCT/CN2021/071172 WO2021169637A1 (en) 2020-02-28 2021-01-12 Image recognition method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010127177.3A CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111310705A true CN111310705A (en) 2020-06-19

Family

ID=71149407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010127177.3A Pending CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111310705A (en)
WO (1) WO2021169637A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111710109A (en) * 2020-07-01 2020-09-25 中国银行股份有限公司 Withdrawal control method, device and system
CN112464793A (en) * 2020-11-25 2021-03-09 大连东软教育科技集团有限公司 Method, system and storage medium for detecting cheating behaviors in online examination
CN112749655A (en) * 2021-01-05 2021-05-04 风变科技(深圳)有限公司 Sight tracking method, sight tracking device, computer equipment and storage medium
CN112801069A (en) * 2021-04-14 2021-05-14 四川翼飞视科技有限公司 Face key feature point detection device, method and storage medium
CN113111745A * 2021-03-30 2021-07-13 四川大学 Eye movement identification method based on the product attention of OpenPose
WO2021169637A1 (en) * 2020-02-28 2021-09-02 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device and storage medium
WO2021217919A1 (en) * 2020-04-29 2021-11-04 深圳壹账通智能科技有限公司 Facial action unit recognition method and apparatus, and electronic device, and storage medium
CN114706484A (en) * 2022-04-18 2022-07-05 Oppo广东移动通信有限公司 Sight line coordinate determination method and device, computer readable medium and electronic equipment
CN114898447A (en) * 2022-07-13 2022-08-12 北京科技大学 Personalized fixation point detection method and device based on self-attention mechanism
CN117132869A (en) * 2023-08-28 2023-11-28 广州视景医疗软件有限公司 Method and device for training sight deviation estimation model and correcting sight deviation value

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115535A * 2021-11-12 2022-03-01 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Eye movement tracking and identification method and system based on the Galaxy Kylin mobile operating system
CN116912924B (en) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 Target image recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978548A (en) * 2014-04-02 2015-10-14 汉王科技股份有限公司 Visual line estimation method and visual line estimation device based on three-dimensional active shape model
CN107748858A * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye locating method based on a cascaded convolutional neural network
CN109492514A * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 A method and system for acquiring the human eye gaze direction with a single camera
US20190110003A1 (en) * 2017-10-11 2019-04-11 Wistron Corporation Image processing method and system for eye-gaze correction
CN109740491A * 2018-12-27 2019-05-10 北京旷视科技有限公司 Human eye gaze recognition method, apparatus, system and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930278A (en) * 2012-10-16 2013-02-13 天津大学 Human eye sight estimation method and device
CN110678873A (en) * 2019-07-30 2020-01-10 珠海全志科技股份有限公司 Attention detection method based on cascade neural network, computer device and computer readable storage medium
CN111310705A (en) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
WO2021169637A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
WO2021169637A1 (en) Image recognition method and apparatus, computer device and storage medium
US10713532B2 (en) Image recognition method and apparatus
US10635890B2 (en) Facial recognition method and apparatus, electronic device, and storage medium
CN109359548B (en) Multi-face recognition monitoring method and device, electronic equipment and storage medium
CA2934514C (en) System and method for identifying faces in unconstrained media
CN109657554B (en) Image identification method and device based on micro expression and related equipment
CN112419170B (en) Training method of shielding detection model and beautifying processing method of face image
US20170140210A1 (en) Image processing apparatus and image processing method
WO2020199611A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN106295591A (en) Gender identification method based on facial image and device
CN111597884A (en) Facial action unit identification method and device, electronic equipment and storage medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
US20230081982A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
EP3685288B1 (en) Apparatus, method and computer program product for biometric recognition
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN107844742A (en) Facial image glasses minimizing technology, device and storage medium
CN113591763B (en) Classification recognition method and device for face shapes, storage medium and computer equipment
CN112699857A (en) Living body verification method and device based on human face posture and electronic equipment
CN115050064A (en) Face living body detection method, device, equipment and medium
CN113298158A (en) Data detection method, device, equipment and storage medium
CN111401192A (en) Model training method based on artificial intelligence and related device
CN116311370A (en) Multi-angle feature-based cow face recognition method and related equipment thereof
CN114861241A (en) Anti-peeping screen method based on intelligent detection and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200619