CN111310705A - Image recognition method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111310705A
CN111310705A
Authority
CN
China
Prior art keywords
image
predicted
face
module
positioning data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010127177.3A
Other languages
Chinese (zh)
Inventor
胡艺飞
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010127177.3A priority Critical patent/CN111310705A/en
Publication of CN111310705A publication Critical patent/CN111310705A/en
Priority to PCT/CN2021/071172 priority patent/WO2021169637A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Ophthalmology & Optometry (AREA)
  • Geometry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image recognition method, an image recognition apparatus, a computer device, and a storage medium, belonging to the field of face recognition. Face detection is performed on an acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain an image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the gaze prediction neural network model occupies little memory and runs quickly.

Description

Image recognition method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of face recognition, and in particular, to an image recognition method, an image recognition apparatus, a computer device, and a storage medium.
Background
Emotion analysis has developed rapidly with the rise of social media (such as comments, forums, blogs, and microblogs); by analyzing what people express, their opinions, evaluations, attitudes, emotions, and tendencies can be assessed. Because psychological changes cause changes in physiological parameters (e.g., skin conductance, heart rate, blood pressure, respiration, brain waves, voice, and gaze), the emotional changes of a subject can be evaluated by detecting those changes. Since emotion analysis is usually needed in non-contact scenarios where data must be convenient to acquire and process, techniques that analyze a subject's emotional changes through image recognition have grown increasingly popular as face recognition technology continues to develop.
Existing image recognition systems fall into two main types. The first collects images with an infrared camera, for example eye-tracking systems such as Tobii Eye Tracking as integrated in Alienware computers; the second collects images with a monocular camera. The infrared-camera approach has two main drawbacks: the equipment is expensive, and each user must be calibrated in advance, so it cannot be used in scenarios such as a bank branch where gaze analysis must work for unspecified users. The monocular-camera approach proceeds by detecting the face, estimating the rotation angle of the head, and locating 68 facial keypoints to crop an eye-region picture from which the gaze direction is recognized. This method also has drawbacks: the recognition pipeline is complex to build and consumes too much computing time and resources in use. Four models are needed to recognize the gaze in a single picture, the models occupy a large amount of storage, and deployment on a mobile phone is difficult; moreover, much of the computation in the facial-keypoint model is irrelevant to gaze judgment, so the existing method is slow and cannot achieve real-time analysis.
In summary, existing image recognition methods are costly and inefficient, occupy a large amount of storage, and are limited in their application scenarios.
Disclosure of Invention
Aiming at the problems of low efficiency and large storage footprint in existing image recognition methods, an image recognition method, an apparatus, a computer device, and a storage medium are provided that improve recognition efficiency while occupying little storage space.
The invention provides an image recognition method, which comprises the following steps:
acquiring an image to be detected;
carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted;
and recognizing the image to be predicted with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes.
Preferably, performing face detection on the image to be detected to obtain the face image and the positioning data of the face image includes:
performing face detection on the image to be detected with a multi-task convolutional neural network to obtain the face image and the positioning data of the face image.
Preferably, the positioning data includes: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
Preferably, correcting the face image based on the positioning data to obtain the image to be predicted includes:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
Preferably, the human-eye gaze prediction neural network model includes a separable convolution module, an attention mechanism module, and a classification module;
recognizing the image to be predicted with the human-eye gaze prediction neural network model and determining the gaze direction of the human eyes includes:
extracting a first facial feature from the image to be predicted through the separable convolution module;
adjusting the weight of the first facial feature through the attention mechanism module to obtain a feature weight that enhances the weight of eye features;
and combining the first facial feature with the feature weight to generate a second facial feature, and processing the second facial feature through the classification module to obtain the gaze direction of the human eyes.
Preferably, the separable convolution module is combined with a forward residual module to extract the first facial feature of the image to be predicted; the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions to complete the first facial feature extraction.
Preferably, the separable convolution module is combined with an inverted residual module to extract the first facial feature of the image to be predicted; the inverted residual module is combined with the separable convolution module, and a 1 × 1 cross-channel convolution is added between every two point-wise convolution channels to fuse information across channels and complete the first facial feature extraction.
The present invention also provides an image recognition apparatus, comprising:
the receiving unit is used for acquiring an image to be detected;
the detection unit is used for carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
the correction unit is used for correcting the face image based on the positioning data to obtain an image to be predicted;
and the recognition unit is used for recognizing the image to be predicted with a human-eye gaze prediction neural network model to determine the gaze direction of the human eyes.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the image recognition method, apparatus, computer device, and storage medium, face detection is performed on the acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain the image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with the human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the gaze prediction neural network model occupies little memory and runs quickly.
Drawings
FIG. 1 is a flowchart of an embodiment of an image recognition method according to the present invention;
FIG. 2 is a flowchart of an embodiment of identifying the image to be predicted by using a human eye gaze prediction neural network model according to the present invention;
FIG. 3 is a block diagram of an embodiment of an image recognition apparatus according to the present invention;
FIG. 4 is a hardware architecture diagram of one embodiment of the computer device of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The image recognition method, apparatus, computer device, and storage medium provided by the invention can be applied in business fields such as banking and insurance. Face detection is performed on the acquired image to be detected to obtain a face image and its positioning data; the face image is corrected according to the positioning data to obtain the image to be predicted, which reduces the amount of computation; and the image to be predicted is recognized with the human-eye gaze prediction neural network model to determine the gaze direction of the human eyes. Recognition is fast and takes little time, and the model occupies little memory and runs quickly.
Example one
Referring to fig. 1, an image recognition method of the present embodiment includes the following steps:
s1, obtaining an image to be detected;
in this embodiment, there is no strict requirement on the image-acquisition equipment: a monocular camera can be used to collect the image to be detected, so the demands on acquisition equipment are low and equipment cost is effectively reduced.
S2, carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
wherein the positioning data may include: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
The positioning data in this embodiment consist of 5 keypoint coordinates: the center points of the two eye ellipses, the nose tip, and the two mouth corners. Compared with the prior art, which requires 68 keypoints for gaze prediction, this greatly reduces the amount of computation and speeds up processing. The image recognition method can therefore be applied in a wide range of scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and obtain each face image and its corresponding positioning data simultaneously.
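As an illustrative sketch (the field names here are hypothetical, not taken from the patent), the five-keypoint positioning data for one detected face could be held in a simple mapping:

```python
# Hypothetical container for the five keypoints described above;
# coordinates are (x, y) pixel positions in the image to be detected.
positioning_data = {
    "left_eye_center": (120.0, 98.0),
    "right_eye_center": (178.0, 96.0),
    "nose_tip": (150.0, 130.0),
    "mouth_left": (128.0, 160.0),
    "mouth_right": (172.0, 158.0),
}

KEYPOINT_COUNT = len(positioning_data)  # 5, versus 68 in the prior art
```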
Further, in step S2, performing face detection on the image to be detected to obtain the face image and the positioning data of the face image includes:
performing face detection on the image to be detected with a Multi-Task Convolutional Neural Network (MTCNN) to obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network uses a three-stage cascade architecture of convolutional networks to detect the face and locate the keypoints (the center points of the two eye ellipses, the nose tip, and the two mouth corners). It comprises three parts: P-Net (proposal network), R-Net (refine network), and O-Net (output network). First, the fully convolutional P-Net processes the image to be detected to produce first candidate windows (windows marking face positions in the image to be detected) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine bounding windows, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows to yield second candidate windows. Because P-Net's detection is coarse, R-Net is used to further refine the second candidate windows: similar in structure to P-Net, R-Net takes the second candidate windows as input, filters out false windows to localize the face region more precisely, and generates third candidate windows. Finally, O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows, removes overlapping windows, confirms the face region, and locates the coordinates of the five facial keypoints within the confirmed face region.
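The non-maximum suppression step used above to remove overlapping windows can be sketched in isolation (a minimal pure-Python version; the (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative assumptions, not taken from the patent):

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # keep the highest-scoring window, drop windows overlapping it
    # beyond the threshold, and repeat on the remainder
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

For example, two heavily overlapping candidate windows collapse to the higher-scoring one, while a distant window survives.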
S3, correcting the face image based on the positioning data to obtain a to-be-predicted image;
in the present embodiment, to facilitate the subsequent gaze recognition of step S4, the face image needs to be converted into an image to be predicted with the head in a canonical, front-facing pose (e.g., eyes looking straight ahead), which improves the accuracy of gaze recognition.
Further, in step S3, correcting the face image based on the positioning data to obtain the image to be predicted includes:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
The standard coordinate data are the pre-stored standard coordinates of the 5 keypoints: the labeled center-point coordinates of the two eye ellipses, the labeled nose-tip coordinates, and the labeled coordinates of the two mouth corners.
In this embodiment, the positioning data are compared with the standard coordinate data to obtain the relative change, and similarity transformations such as rotation, translation, and scaling are applied to the face image based on that change, converting it into the image to be predicted so that it meets the requirements of gaze recognition. Compared with existing correction methods, which require a deep neural network model to estimate the head rotation angle, the correction used here greatly reduces the amount of computation; no head-pose estimation model needs to be trained, so the computational cost drops substantially.
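The keypoint-based correction can be sketched as fitting a least-squares 2D similarity transform (rotation, uniform scale, translation) from the detected keypoints to the stored standard coordinates, then warping the image with it; this is a minimal illustration under that assumption, not the patent's exact procedure:

```python
def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst.
    Returns (a, b, tx, ty) such that (x, y) -> (a*x - b*y + tx, b*x + a*y + ty)."""
    n = len(src)
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    mu = sum(p[0] for p in dst) / n
    mv = sum(p[1] for p in dst) / n
    sa = sb = d = 0.0
    for (x, y), (u, v) in zip(src, dst):
        xc, yc, uc, vc = x - mx, y - my, u - mu, v - mv
        sa += xc * uc + yc * vc        # correlation with the target
        sb += xc * vc - yc * uc        # cross term giving rotation
        d += xc * xc + yc * yc         # spread of the source points
    a, b = sa / d, sb / d
    tx = mu - (a * mx - b * my)
    ty = mv - (b * mx + a * my)
    return a, b, tx, ty

def warp(pt, a, b, tx, ty):
    # apply the fitted similarity transform to one point
    x, y = pt
    return (a * x - b * y + tx, b * x + a * y + ty)
```

In practice the fitted transform would be applied to every pixel of the face image (e.g., with an image-warping routine) to produce the image to be predicted.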
And S4, identifying the image to be predicted by adopting a human eye sight prediction neural network model, and determining the direction of the human eye sight.
It should be noted that the human-eye gaze prediction neural network model includes a separable convolution module, an attention mechanism module, and a classification module;
as shown in fig. 2, further, the identifying the image to be predicted by using the human eye gaze prediction neural network model in step S4, and determining the human eye gaze direction may include:
s41, performing first facial feature extraction on the image to be predicted through the separable convolution module;
in this step, replacing the convolution kernels of a standard convolutional neural network with separable convolutions greatly reduces the amount and complexity of computation. Taking an input image to be predicted of size d × c with m channels, an output first facial feature of size d × c with n channels, and a k × k convolution kernel as an example:
the standard convolution requires approximately d × c × m × n × k × k multiply-accumulate operations;
the separable convolution requires approximately d × c × m × (k × k + n) operations (k × k per channel for the depthwise step, plus n per channel for the 1 × 1 pointwise step);
where d is the width of the image to be predicted, c is its height, m and n are channel counts, and k is the size of the convolution kernel.
The ratio of the two is (k × k + n)/(n × k × k) = 1/n + 1/k², so separable convolution reduces both the number of model parameters and the computation of the convolution process.
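Under these MobileNet-style operation counts, the saving can be checked numerically (the shapes below are arbitrary examples, not values from the patent):

```python
def standard_conv_ops(d, c, m, n, k):
    # multiply-accumulates for a k×k standard convolution on a
    # d×c input with m input channels and n output channels
    return d * c * m * n * k * k

def separable_conv_ops(d, c, m, n, k):
    # depthwise step: d*c*m*k*k, plus pointwise 1×1 step: d*c*m*n
    return d * c * m * (k * k + n)

# example shapes: 112×112 feature map, 32 -> 64 channels, 3×3 kernel
std = standard_conv_ops(112, 112, 32, 64, 3)
sep = separable_conv_ops(112, 112, 32, 64, 3)
ratio = sep / std  # equals 1/n + 1/k^2 = 1/64 + 1/9, roughly 0.13
```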
In step S41, the separable convolution module can be combined with the forward residual module to perform the first facial feature extraction on the image to be predicted.
The forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions, so that the network can learn high-order features without forgetting useful low-order features.
In step S41, the separable convolution module can be combined with the inverse residual module to perform a first facial feature extraction on the image to be predicted.
Each input channel of the image to be predicted is convolved with its own single kernel by the depthwise convolution of the separable convolution module to obtain first feature maps; pointwise convolution (a 1 × 1 convolution) then combines the first feature maps across the depth dimension with learned weights to obtain richer features. The inverted residual module is combined with the separable convolution module, and a 1 × 1 cross-channel convolution is added between every two pointwise convolution channels to fuse information across channels, ensuring that more effective second feature maps are extracted; all second feature maps are concatenated to obtain the first facial feature. Through the inverted residual module, the neural network learns high-order features without forgetting useful low-order features; compared with the forward residual module, it also has fewer parameters, computes faster, and occupies far less memory.
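A minimal pure-Python sketch of the two stages (depthwise convolution per channel, then 1 × 1 pointwise channel mixing) for "valid" padding and stride 1; a real implementation would use an optimized tensor library:

```python
def depthwise_conv(x, kernels):
    # x: [C][H][W] feature maps; kernels: [C][k][k], one kernel per channel
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    out = []
    for c in range(C):
        plane = []
        for i in range(H - k + 1):
            row = []
            for j in range(W - k + 1):
                s = 0.0
                for di in range(k):
                    for dj in range(k):
                        s += x[c][i + di][j + dj] * kernels[c][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def pointwise_conv(x, weights):
    # 1×1 convolution: weights[n][c] mixes the C input channels
    # into N output channels at every spatial position
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[n][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for n in range(len(weights))]
```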
S42, adjusting the weight of the first facial feature through the attention mechanism module to obtain a feature weight for enhancing the eye feature weight;
in this step, the attention mechanism module uses a self-attention mechanism. Self-attention relates different positions of a single sequence to one another when computing a representation of that sequence, and has proved very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include multiple attention mechanism modules, each corresponding to, and placed after, a convolutional layer of the separable convolution module. The attention mechanism modules extract the convolutional features around the eyes; the output of each attention mechanism module serves as the input of the next, and the refined convolutional features of the last module are taken as the feature weight (i.e., the feature weight that enhances eye features). By adjusting weights through the attention mechanism, the extraction of features around the eyes is strengthened on top of the first facial feature, and eye features are then generated from the eyeball and eye-muscle features, yielding a feature weight that enhances the eye features.
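The weight-adjustment idea can be illustrated schematically: a softmax over learned relevance scores yields weights that up-weight eye-region positions before the features are recombined (a toy sketch, not the patent's actual network):

```python
import math

def softmax(scores):
    # numerically stable softmax; the weights sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_attention(features, relevance_scores):
    # relevance_scores would be learned by the attention module; here
    # they stand in for its preference for eye-region positions
    weights = softmax(relevance_scores)
    return [f * w for f, w in zip(features, weights)]
```

A position given a much higher relevance score ends up dominating the weighted features, mimicking how eye-region features are enhanced.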
S43, combining the first facial features and the feature weights to generate second facial features, and processing the second facial features through the classification module to obtain the human eye sight direction.
In this step, the classification module uses a fully connected layer. The first facial feature is multiplied by the feature weight to generate the second facial feature, which is fed into the fully connected layer. The fully connected layer integrates the second facial feature through a weight matrix, computes deviation probabilities from the integrated neurons, obtains the vertical and horizontal gaze deviations corresponding to each probability, and derives the gaze direction of the human eyes from the vertical and horizontal deviations.
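Schematically, the classification module's output can be turned into a gaze direction by taking the expectation over discretized deviation bins for each axis (the bin values and the degree convention below are illustrative assumptions; the text does not specify them):

```python
# hypothetical deviation bins in degrees, one set per axis
V_BINS = [-30.0, -15.0, 0.0, 15.0, 30.0]
H_BINS = [-30.0, -15.0, 0.0, 15.0, 30.0]

def expected_deviation(probs, bins):
    # expectation of the deviation under the per-bin probabilities
    return sum(p * b for p, b in zip(probs, bins))

def gaze_direction(v_probs, h_probs):
    # (vertical, horizontal) gaze deviation in degrees
    return (expected_deviation(v_probs, V_BINS),
            expected_deviation(h_probs, H_BINS))
```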
In step S4, the input to the human-eye gaze prediction neural network model is the whole face, which has two main advantages over the prior art. First, in prediction accuracy: muscle changes around the eyes help judge the gaze direction, whereas existing methods input only eye pictures and cannot use this surrounding information. Second, to obtain the eye picture, existing methods must build a 68-keypoint face detection model to get the eye-box coordinates, which is computationally heavy and costly.
In this embodiment, the image recognition method performs face detection on the acquired image to be detected to obtain a face image and its positioning data, corrects the face image according to the positioning data to obtain the image to be predicted (reducing the amount of computation), and recognizes the image to be predicted with the human-eye gaze prediction neural network model, thereby determining the gaze direction of the human eyes.
In practical applications, compared with gaze recognition systems that use an infrared camera, this image recognition method needs only a single monocular camera to acquire images, reducing equipment cost; it also requires no per-user calibration, so it can be applied widely in scenarios such as bank branches and personal mobile phones. Compared with other gaze recognition systems that use a monocular camera, this method needs only two models, and the human-eye gaze prediction neural network model has fewer parameters than existing gaze recognition models, greatly accelerating each recognition; real-time analysis can be achieved on an NVIDIA 1080 GPU. The gaze prediction neural network model occupies less than 8 MB of memory, whereas existing gaze recognition models usually require more than 100 MB.
The image recognition method of this embodiment can be applied to emotion analysis. For example, a person's gaze tends to become evasive when they are nervous or lying, which can support anti-fraud judgments; the method can also be used to analyze which regions of, say, a billboard interest customers; and it can be applied to gaze-recognition mini-games or game interaction.
Example two
As shown in fig. 3, the present invention also provides an image recognition apparatus 1 including: a receiving unit 11, a detecting unit 12, a correcting unit 13 and a recognizing unit 14, wherein:
a receiving unit 11, configured to acquire an image to be detected;
in this embodiment, there is no strict requirement on the image-acquisition equipment: a monocular camera can be used to collect the image to be detected, so the demands on acquisition equipment are low and equipment cost is effectively reduced.
The detection unit 12 is configured to perform face detection on the image to be detected, and acquire a face image and positioning data of the face image;
wherein the positioning data may include: the center-point coordinates of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the two mouth corners.
The positioning data in this embodiment consist of 5 keypoint coordinates: the center points of the two eye ellipses, the nose tip, and the two mouth corners. Compared with the prior art, which requires 68 keypoints for gaze prediction, this greatly reduces the amount of computation and speeds up processing. The image recognition method can therefore be applied in a wide range of scenarios, such as bank branches, mobile terminals (e.g., mobile phones), and billboards. It should be noted that the face detection network of this embodiment can detect multiple faces at a time and obtain each face image and its corresponding positioning data simultaneously.
Specifically, the detection unit 12 may perform face detection on the image to be detected using a multi-task cascaded convolutional neural network (MTCNN) to obtain the face image and the positioning data of the face image.
The multi-task convolutional neural network detects the face and locates the key points (the elliptical center points of the two eyes, the nose tip, and the two mouth corners) using a three-stage cascade combined with convolutional neural network algorithms. It comprises three networks: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). First, the fully convolutional P-Net scans the image to be detected to obtain first candidate windows (windows marking the positions of faces in the image) and bounding-box regression vectors; the offset of each first candidate window is computed from the regression vectors to determine a refined bounding box, the first candidate windows are calibrated accordingly, and Non-Maximum Suppression (NMS) removes overlapping windows, yielding the second candidate windows. Because P-Net's detection is coarse, R-Net is then used to refine the second candidate windows: similar in structure to P-Net, it takes the second candidate windows as input, filters out false windows to further localize the face region, and generates the third candidate windows. Finally, O-Net, which has one more convolutional layer than R-Net, supervises the third candidate windows, removes the remaining overlapping windows, confirms the face region, and locates the position coordinates of the five face key points within the confirmed region.
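The non-maximum suppression step that prunes overlapping candidate windows at each stage can be sketched in a few lines. The following is a minimal NumPy illustration of greedy NMS; the IoU threshold and the box format are assumptions for illustration, not values taken from the patent:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the surviving boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # candidates by descending score
    keep = []
    while order.size > 0:
        i = order[0]                        # best remaining window
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavy overlaps
    return keep
```

Given two heavily overlapping windows and one distant window, only the higher-scoring member of the overlapping pair survives.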
A correcting unit 13, configured to correct the face image based on the positioning data, and obtain an image to be predicted;
In this embodiment, to facilitate the subsequent gaze recognition, the face image needs to be converted into an easily recognizable image to be predicted in which the head pose is corrected (for example, eyes looking forward), so as to improve the accuracy of gaze recognition.
The correction unit 13 compares the positioning data with standard coordinate data and performs a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
The standard coordinate data are the pre-stored standard coordinates of the 5 key points: the labeled coordinates of the elliptical center points of the two eyes, of the nose tip, and of the two mouth corners.
In this embodiment, the positioning data are compared with the standard coordinate data to obtain the relative change, and a similarity transformation (rotation, translation, scaling, and the like) based on this change converts the face image into the image to be predicted, so that it meets the requirements of gaze recognition. Compared with existing correction methods, which compute the head rotation angle with a deep neural network model, the correction adopted here effectively reduces the amount of computation and requires no trained head-pose estimation model, greatly lowering the computational cost.
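Deriving the rotation, translation, and scaling from the two sets of 5 key points amounts to estimating a least-squares similarity transform between corresponding point sets. A minimal sketch using the classical Umeyama estimate follows; the helper name and the example coordinates are illustrative assumptions, not values from the patent:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points to dst.

    src, dst: (N, 2) arrays of corresponding key points.
    Returns a 2x3 matrix M so that dst ≈ src @ M[:, :2].T + M[:, 2].
    """
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))       # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt                            # rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = dst_mean - scale * R @ src_mean       # translation
    return np.hstack([scale * R, t[:, None]])
```

The returned 2×3 matrix can be handed to a standard affine warp to produce the corrected image to be predicted.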
The identifying unit 14 is configured to identify the image to be predicted using a human eye gaze prediction neural network model and determine the direction of the human eye gaze.
It should be noted that the human eye gaze prediction neural network model includes: a separable convolution module, an attention mechanism module, and a classification module;
the identification unit 14 performs first facial feature extraction on the image to be predicted through the separable convolution module. The separable convolution module may be combined with a forward residual module for this extraction: the forward residual module adds the features produced by the separable convolution module to the initial features at the same positions, so that the network can learn high-order features without forgetting useful low-order features.
Alternatively, the separable convolution module may be combined with an inverted residual module to perform the first facial feature extraction on the image to be predicted. The depthwise convolution of the separable convolution module convolves each input channel of the image to be predicted with its own single convolution kernel to obtain first feature maps; a pointwise (1 × 1) convolution then weights and combines the first feature maps along the depth direction to obtain richer features. When the inverted residual module is combined with the separable convolution module, a 1 × 1 cross-channel convolution is added between every two pointwise convolutions to fuse information across channels, ensuring that more effective second feature maps are extracted; all second feature maps are concatenated to obtain the first facial features. Through the inverted residual module, the network learns high-order features without forgetting useful low-order features, while using fewer parameters than a forward residual module, computing faster, and occupying far less memory.
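The saving from replacing a standard convolution with the depthwise-plus-pointwise pair described above can be checked with a quick parameter count. The kernel size and channel numbers below are illustrative, not the patent's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise (one k x k kernel per input channel) + pointwise 1x1 mix."""
    depthwise = k * k * c_in        # each channel convolved on its own
    pointwise = c_in * c_out        # 1x1 cross-channel combination
    return depthwise + pointwise

# e.g. a 3x3 layer taking 32 channels to 64 channels
standard = conv_params(3, 32, 64)        # 9 * 32 * 64 = 18432
separable = separable_params(3, 32, 64)  # 288 + 2048  = 2336
```

For this example the separable form needs roughly one-eighth of the parameters, which is the source of the speed and memory advantages claimed above.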
The recognition unit 14 adjusts the weights of the first facial features through the attention mechanism module to obtain feature weights that enhance the eye features. The attention mechanism module adopts a self-attention mechanism, which relates different positions of the same sequence when computing a representation of that sequence, and which has proved very effective in machine reading comprehension, abstractive summarization, and image caption generation.
This embodiment may include a plurality of attention mechanism modules, each corresponding to one convolutional layer of the separable convolution module and placed after that layer. The attention mechanism modules extract the convolutional features around the eyes, the output of each module serving as the input of the next, and the refined convolutional features of the last module are taken as the feature weights (i.e., the feature weights that enhance the eye features). By adjusting weights through the attention mechanism, the extraction of features around the eyes is strengthened on top of the first facial features, and the eye features are then generated from the eyeball and eye-muscle features, yielding feature weights capable of enhancing the eye features.
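For reference, the scaled dot-product form of self-attention that such a module builds on can be sketched as follows; the shapes and the random projection matrices are illustrative assumptions, not the patent's actual layer configuration:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a set of feature vectors.

    x: (n, d) feature vectors; wq, wk, wv: (d, d) projection matrices.
    Each output row is a weighted mix of all value vectors, with weights
    derived from query-key similarity -- positions that matter (e.g. the
    eye region) can thus be up-weighted relative to the rest.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ k.T / np.sqrt(k.shape[1])
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ v, weights
```

The attention weights form a row-stochastic matrix, which is what makes them usable as feature weights.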
The recognition unit 14 combines the first facial features with the feature weights to generate second facial features, and the classification module processes the second facial features to obtain the human eye gaze direction.
The classification module adopts a fully connected layer. The first facial features are multiplied by the feature weights to generate the second facial features, which are input into the fully connected layer. The fully connected layer integrates the second facial features through a weight matrix, computes deviation probability information from the integrated neurons, obtains the vertical and horizontal gaze deviations corresponding to each piece of deviation probability information, and derives the human eye gaze direction from the vertical and horizontal deviations.
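One plausible reading of the deviation probabilities described above is a softmax over discrete vertical and horizontal deviation bins, with the expected deviation taken as the gaze angle. The binning scheme below is an assumption for illustration only, not the patent's actual output encoding:

```python
import numpy as np

def gaze_from_logits(v_logits, h_logits, v_bins, h_bins):
    """Expected vertical/horizontal gaze deviations from classifier outputs.

    v_logits / h_logits: raw fully-connected outputs, one per deviation bin.
    v_bins / h_bins: the deviation value (e.g. in degrees) each bin encodes.
    """
    def softmax(z):
        e = np.exp(z - z.max())       # shift for numerical stability
        return e / e.sum()
    v = float(softmax(v_logits) @ v_bins)   # expected vertical deviation
    h = float(softmax(h_logits) @ h_bins)   # expected horizontal deviation
    return v, h
```

A logit that strongly favors one bin drives the expected deviation toward that bin's value.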
The human eye gaze prediction neural network model takes the whole face as its input image to be predicted, which has two main advantages over the prior art. First, in prediction accuracy: muscle changes around the eyes can help determine the gaze direction, whereas existing methods that input only an eye crop cannot exploit this surrounding information. Second, to obtain the eye crop, existing methods must build a 68-key-point face detection model to locate the eye boxes, which is computationally expensive.
In this embodiment, the image recognition apparatus 1 performs face detection on the acquired image to be detected to obtain a face image and its positioning data, corrects the face image according to the positioning data to obtain an image to be predicted that is suitable for recognition, thereby reducing the amount of computation, and identifies the image to be predicted with the human eye gaze prediction neural network model to determine the direction of the human eye gaze.
In practical applications, compared with human eye recognition systems that rely on infrared cameras, this image recognition method needs only a single monocular camera to complete image acquisition, reducing equipment cost; moreover, no per-person calibration is required, so the method can be widely deployed in scenarios such as bank branches and personal mobile phones. Compared with other eye recognition systems using monocular cameras, this method needs only two models, and the human eye gaze prediction neural network model has fewer parameters than existing eye recognition models, greatly accelerating each recognition pass; real-time analysis is achievable on an NVIDIA 1080-class GPU. The gaze prediction model occupies less than 8 MB of memory, whereas existing human eye recognition models typically occupy more than 100 MB.
The image recognition apparatus 1 of this embodiment may be applied to emotion analysis: for example, a person's gaze tends to evade when the person is nervous or lying, which can support anti-fraud judgment. It can also be used to analyze which regions of, for example, a billboard attract a customer's interest, and it can further be applied to mini-games involving eye recognition or game interaction, and the like.
Example three
In order to achieve the above object, the present invention further provides a computer device 2. The computer device 2 may comprise a plurality of computer devices 2, and the components of the image recognition apparatus 1 of the second embodiment may be distributed among different computer devices 2. The computer device 2 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (either an independent server or a cluster of multiple servers) that executes programs. The computer device 2 of this embodiment includes at least, but is not limited to: a memory 21, a processor 23, a network interface 22, and the image recognition apparatus 1, which can be communicatively connected to each other through a system bus (refer to fig. 4).
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, such as a flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, or optical disk. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit and an external storage device of the computer device 2. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the image recognition method of the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data Processing chip in some embodiments. The processor 23 is typically used for controlling the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program codes stored in the memory 21 or process data, for example, run the image recognition apparatus 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 connects the computer device 2 to an external terminal through a network and establishes a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that fig. 4 only shows the computer device 2 with components 21-23, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.
In this embodiment, the image recognition apparatus 1 stored in the memory 21 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete the present invention.
Example four
To achieve the above objects, the present invention further provides a computer-readable storage medium, which includes a plurality of storage media such as a flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, server, or App application store, and on which a computer program is stored that implements the corresponding functions when executed by the processor 23. The computer-readable storage medium of this embodiment is used to store the image recognition apparatus 1 and, when executed by the processor 23, implements the image recognition method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, though in many cases the former is the better implementation.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An image recognition method, comprising:
acquiring an image to be detected;
carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
correcting the face image based on the positioning data to obtain an image to be predicted; and
identifying the image to be predicted using a human eye gaze prediction neural network model to determine the direction of the human eye gaze.
2. The image recognition method of claim 1, wherein the performing of the face detection on the image to be detected to obtain the face image and the positioning data of the face image comprises:
and carrying out face detection on the image to be detected by adopting a multitask convolutional neural network to obtain a face image and positioning data of the face image.
3. The image recognition method according to claim 1 or 2, wherein the positioning data includes: coordinates of the elliptical center points of the two eyes, coordinates of the nose tip, and coordinates of the two mouth corners.
4. The image recognition method according to claim 1, wherein the correcting the face image based on the positioning data to obtain the image to be predicted comprises:
comparing the positioning data with standard coordinate data, and performing a similarity transformation on the face image according to the comparison result to generate the image to be predicted.
5. The image recognition method of claim 1, wherein the human eye gaze prediction neural network model comprises: a separable convolution module, an attention mechanism module, and a classification module;
the identifying the image to be predicted using the human eye gaze prediction neural network model and determining the human eye gaze direction comprises:
performing first facial feature extraction on the image to be predicted through the separable convolution module;
adjusting the weights of the first facial features through the attention mechanism module to obtain feature weights that enhance the eye features; and
combining the first facial features with the feature weights to generate second facial features, and processing the second facial features through the classification module to obtain the human eye gaze direction.
6. The image recognition method of claim 5, wherein the separable convolution module, in combination with a forward residual module, performs the first facial feature extraction on the image to be predicted; the forward residual module adds the features obtained by the separable convolution module to the initial features at the same positions to obtain the first facial features.
7. The image recognition method of claim 5, wherein the separable convolution module, in combination with an inverted residual module, performs the first facial feature extraction on the image to be predicted; the inverted residual module is combined with the separable convolution module by adding a 1 × 1 cross-channel convolution between every two pointwise convolutions for inter-channel information fusion, so as to obtain the first facial features.
8. An image recognition apparatus, comprising:
the receiving unit is used for acquiring an image to be detected;
the detection unit is used for carrying out face detection on the image to be detected to obtain a face image and positioning data of the face image;
the correction unit is used for correcting the face image based on the positioning data to acquire an image to be predicted; and
the identification unit is used for identifying the image to be predicted using a human eye gaze prediction neural network model and determining the direction of the human eye gaze.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when executing the computer program, realizes the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
CN202010127177.3A 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium Pending CN111310705A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010127177.3A CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium
PCT/CN2021/071172 WO2021169637A1 (en) 2020-02-28 2021-01-12 Image recognition method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010127177.3A CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111310705A true CN111310705A (en) 2020-06-19

Family

ID=71149407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010127177.3A Pending CN111310705A (en) 2020-02-28 2020-02-28 Image recognition method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111310705A (en)
WO (1) WO2021169637A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111710109A (en) * 2020-07-01 2020-09-25 中国银行股份有限公司 Withdrawal control method, device and system
CN112464793A (en) * 2020-11-25 2021-03-09 大连东软教育科技集团有限公司 Method, system and storage medium for detecting cheating behaviors in online examination
CN112749655A (en) * 2021-01-05 2021-05-04 风变科技(深圳)有限公司 Sight tracking method, sight tracking device, computer equipment and storage medium
CN112801069A (en) * 2021-04-14 2021-05-14 四川翼飞视科技有限公司 Face key feature point detection device, method and storage medium
CN113111745A * 2021-03-30 2021-07-13 四川大学 Eye movement identification method based on the product attention of OpenPose
WO2021169637A1 (en) * 2020-02-28 2021-09-02 深圳壹账通智能科技有限公司 Image recognition method and apparatus, computer device and storage medium
WO2021217919A1 (en) * 2020-04-29 2021-11-04 深圳壹账通智能科技有限公司 Facial action unit recognition method and apparatus, and electronic device, and storage medium
CN114706484A (en) * 2022-04-18 2022-07-05 Oppo广东移动通信有限公司 Sight line coordinate determination method and device, computer readable medium and electronic equipment
CN114898447A (en) * 2022-07-13 2022-08-12 北京科技大学 Personalized fixation point detection method and device based on self-attention mechanism
CN117132869A (en) * 2023-08-28 2023-11-28 广州视景医疗软件有限公司 Method and device for training sight deviation estimation model and correcting sight deviation value

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114115535A * 2021-11-12 2022-03-01 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Eye movement tracking and identification method and system based on the Galaxy Kylin mobile operating system
CN116912924B (en) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 Target image recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978548A (en) * 2014-04-02 2015-10-14 汉王科技股份有限公司 Visual line estimation method and visual line estimation device based on three-dimensional active shape model
CN107748858A * 2017-06-15 2018-03-02 华南理工大学 A multi-pose eye locating method based on a cascaded convolutional neural network
CN109492514A * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 A method and system for acquiring the human eye gaze direction with a single camera
US20190110003A1 (en) * 2017-10-11 2019-04-11 Wistron Corporation Image processing method and system for eye-gaze correction
CN109740491A * 2018-12-27 2019-05-10 北京旷视科技有限公司 Human eye gaze recognition method, apparatus, system and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930278A (en) * 2012-10-16 2013-02-13 天津大学 Human eye sight estimation method and device
CN110678873A (en) * 2019-07-30 2020-01-10 珠海全志科技股份有限公司 Attention detection method based on cascade neural network, computer device and computer readable storage medium
CN111310705A (en) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
WO2021169637A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
WO2021169637A1 (en) Image recognition method and apparatus, computer device and storage medium
US10713532B2 (en) Image recognition method and apparatus
US10635890B2 (en) Facial recognition method and apparatus, electronic device, and storage medium
CN109359548B (en) Multi-face recognition monitoring method and device, electronic equipment and storage medium
CA2934514C (en) System and method for identifying faces in unconstrained media
CN109657554B (en) Image identification method and device based on micro expression and related equipment
CN112419170B (en) Training method of shielding detection model and beautifying processing method of face image
US20170140210A1 (en) Image processing apparatus and image processing method
WO2020199611A1 (en) Liveness detection method and apparatus, electronic device, and storage medium
EP4099217A1 (en) Image processing model training method and apparatus, device, and storage medium
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN106295591A (en) Gender identification method based on facial image and device
CN111597884A (en) Facial action unit identification method and device, electronic equipment and storage medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
US20230081982A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
EP3685288B1 (en) Apparatus, method and computer program product for biometric recognition
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
CN107844742A (en) Facial image glasses minimizing technology, device and storage medium
CN113591763B (en) Classification recognition method and device for face shapes, storage medium and computer equipment
CN112699857A (en) Living body verification method and device based on human face posture and electronic equipment
CN115050064A (en) Face living body detection method, device, equipment and medium
CN113298158A (en) Data detection method, device, equipment and storage medium
CN111401192A (en) Model training method based on artificial intelligence and related device
CN116311370A (en) Multi-angle feature-based cow face recognition method and related equipment thereof
CN114861241A (en) Anti-peeping screen method based on intelligent detection and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200619